Cache 22

I recently had a discussion with one of the platform programmers (≈DBA) during a visit to a client. While working there, I added an adjustment to a process that I expected would improve accuracy at a minor cost in performance. Instead, the process came to a screeching halt and never returned an answer.

assume nothing

The root cause was an API misunderstanding: the code segment I added included a DB access. The query took the form f(x) = y, where both x and y come from a small, limited set of values with a lot of repetition, so I assumed the results would be cached. They were not.

These repeated DB queries, combined with the fact that the DB sat on a remote server and had performance issues of its own, led to the resounding failure of the process.

cache (almost) Everything*

Caching in general, and local caching in particular, is one of our most powerful tools as architects: it can significantly change the flow of data in the system with little to no visibility to the programmer, and possibly with zero change to the logic, which is nice.

Hopefully, this change in the flow of data limits unwanted, repeated queries while leaving a congestion-free path for rapid responses to new queries, with the performance boost to match. But caching has a dark side of its own: it adds a small overhead, it may be misconfigured when it comes to data staleness, it increases your system’s memory footprint, and worst of all, it is a known source of heisenbugs. It changes the flow of data in a way that is hard for the (feeble) human mind to grasp without training, mainly because it is so easy to forget about.
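A small Python sketch of the staleness side of that dark side. The dictionary `PRICES` is a hypothetical stand-in for the remote source of truth; the cache keeps answering with the old value until someone remembers to invalidate it, which is exactly the kind of behavior that turns into a heisenbug.

```python
import functools

# Hypothetical source of truth standing in for a remote table.
PRICES = {"widget": 10}


# maxsize=None means entries are never evicted: fast, but the cache grows
# without bound, which is the memory-footprint cost mentioned above.
@functools.lru_cache(maxsize=None)
def get_price(item: str) -> int:
    return PRICES[item]


first = get_price("widget")   # 10, read from the source and now cached
PRICES["widget"] = 12         # the source of truth changes...
second = get_price("widget")  # ...but the cache still answers 10 (stale)
get_price.cache_clear()       # explicit invalidation
third = get_price("widget")   # 12, freshly read again
print(first, second, third)   # 10 10 12
```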

Make it Clear

My personal opinion on this subject is that as the world of computing becomes more and more decentralized, and almost any query is remote, the more you can correctly cache, the better. That said, even good things are only good in moderation, writing caching systems for everything takes time, and in many cases we would like to leave the user the choice of not using caching at all.

This means some of your functions will be cached, some will sometimes be cached, and some will not, and there are a few very common tricks for telling the user-programmer which type they are using. In my travels I have seen many who use the func_cached(param) convention. This works very well as long as there is no choice between a cached and a non-cached implementation. Where such a choice is needed, my recommendation is to use func(param, cached); making the parameter optional or giving it a default value is also an option, but needs to be done carefully. This creates an unmistakable change to the function's signature without requiring the user-programmer to be intimately familiar with the documentation.

On the other hand, I also see a lot of func_cached(param), func(param) pairs. THIS IS THE WORST, AND I CAN PROVE IT. By using two functions (or a single logic function and a wrapper function), you rely on the user-programmer to read through all the code of the library/platform/whatever and notice (and remember!) which functions have a cached variant and which do not. Worst of all are the cases where some functions have caching built in by default and others have it as an option, as this forces the user-programmer to read the documentation (or worse, the code) every time a new function is used, just to find out which type it is. Unless told otherwise, the user-programmer assumes the data he gets from a function is always correct and up to date.

Hand over the controls

Which leads me to my last point. I am using the term “correct and up to date” to distinguish the latest data from data that is “close enough to be latest”. A CDN for a news site can refresh its cache once a minute, since it’s OK if people read minute-old news, but it is not OK for it to refresh once a day, or they might as well just print the damn thing. As an architect, you must make sure your data staleness timer is set up right and is configurable, so it can meet the specs and/or user requirements as they change; you may even have to add the ability to set staleness from within the code*.
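A minimal Python sketch of a staleness timer that is configurable both at construction time and per call, assuming a simple read-through cache. `TTLCache`, `loader`, `ttl_seconds`, and `max_age` are hypothetical names, not a specific library's API. The clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Read-through cache whose staleness limit can be tuned at runtime."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader            # the slow, authoritative lookup
        self.ttl_seconds = ttl_seconds   # default maximum age of a cached entry
        self._clock = clock              # injectable for testing
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable, max_age: Optional[float] = None) -> Any:
        """Return the value for key, reloading it if it is too old.

        max_age overrides ttl_seconds for this call only; max_age=0 forces a reload.
        """
        limit = self.ttl_seconds if max_age is None else max_age
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < limit:
            return entry[1]              # fresh enough: serve from the cache
        value = self._loader(key)        # missing or stale: go to the source
        self._entries[key] = (now, value)
        return value
```

For the news CDN example, `ttl_seconds=60` would be the setting. Assigning a new `ttl_seconds` later changes the policy without touching any call site, and `max_age` is the "from within the code" escape hatch for the one caller that really needs fresh data.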

Caching may be the medicine you don’t want to take, but you have to**. When you do take it, be verbose about it, and make sure control of the cache is handed down to those who will need it.

* This is done very well in many places in the Linux C libraries. The most common example is communication with output streams (think flush()), which are a bit of a reverse cache, but whatever.

** You will often find that most of your cache-worthy platforms (DBs etc.) already come with some form of local cache for you to utilize with relative ease.
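To make the first footnote a little more concrete, here is the same "reverse cache" idea sketched in Python rather than C, using the standard `io.BufferedWriter`, with an in-memory `io.BytesIO` standing in for a slow device or socket. Small writes pile up in the buffer, and the caller decides, via `flush()`, when the other side must be up to date.

```python
import io

sink = io.BytesIO()                              # stands in for a slow device or socket
out = io.BufferedWriter(sink, buffer_size=4096)  # small writes are held in memory first

out.write(b"hello, ")
out.write(b"world\n")
pending = sink.getvalue()   # b"": both writes are still sitting in the buffer
out.flush()                 # the caller decides when the data must reach the sink
flushed = sink.getvalue()   # b"hello, world\n"
print(pending, flushed)
```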


Adam Lev-Libfeld

A long-distance runner, a software architect, an HPC nerd (order may change).
