Breaking the monolith – How to design your system for both flexibility and scale – Part 2: The Cathedral & The Bazaar

This post is part of a series. You'd probably like to read it from the beginning. You may also be interested in the next part of the series.

As a paraphrase of Eric Raymond's famous book of the same title (O'Reilly, open format), let us, for a moment, think of the church as a computing system. The service (no pun intended) is led by the priest and has a single, general output. It is very reliable (compared to other medieval services), but is given under very strict terms (start time, end time, location, donation). If you don't like your town's priest, the next best option (if you are a medieval peasant) probably involved some serious walking, and it certainly was not the safest thing you could do on your day off (how's that for a captive audience?).

The Bazaar, on the other hand, is a whole different animal. If you look at it from afar, it has some of the same traits: it is time and space bound (though quite loosely, compared to a church service), and it serves the same people. That said, taking a closer look will expose a major difference: the service provider is not encapsulated in a uniform entity. There is no "market manager"; that bearded man who walks around wearing funny clothes and mumbling in a language you don't understand is not in charge, he's just plain crazy.

Without a unified ceremony that everyone has to perform to the letter (though there are some standards, like currency and a common language), the bazaar gives its clients another kind of service: it is highly distributed, highly parallel, and highly flexible, and thus offers a better-personalized experience. The downside is low reliability at the scope of a single transaction, but you can always try again later, or go to the next potion lady / swordsmith a few steps away.

Basically, the strength of the bazaar comes from the fact that everybody can play a role in it. From child pickpockets to old leper beggars, there is not a single soul in the city that is left out of the marketplace, and all services are supplied exactly as needed, no more, no less (what Adam Smith called the "forces of the market"). This is all nice in market theory (even if I got a bit carried away with the metaphor), but this exact trait, that any needed task can be picked up by any free performer, is the basic idea that enables us to build software that is both scalable and flexible.

DIVIDE and CONQUER

Last time I left you with some homework: some perplexing questions were given, and maybe some paradigms were ever so slightly shifted. Let's revisit those questions:

…In other words, each task should be a self-contained process. Sometimes that's easier said than done, and yet the basic idea is to find where different events/messages do have some interaction and to move that interaction as far from the compute-critical areas as possible.
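To make this concrete, here is a minimal Python sketch of that separation (the Message type and the toy upper-casing "work" are hypothetical stand-ins for your own tasks): the per-message task is pure and self-contained, so any free worker can run it on any message, while anything that requires messages to interact is deferred to a separate stage outside the hot path.

    from dataclasses import dataclass
    from queue import Queue

    @dataclass
    class Message:
        key: str
        payload: bytes

    def process(msg: Message) -> dict:
        # Compute-critical path: pure and self-contained. It touches
        # nothing outside its own arguments, so any free worker can
        # run it on any message, in any order.
        return {"key": msg.key, "result": msg.payload.decode().upper()}

    def aggregate(results: Queue) -> dict:
        # Interaction between messages (grouping, joins, deduplication)
        # is pushed out of the hot path into this separate, slower stage.
        merged = {}
        while not results.empty():
            r = results.get()
            merged.setdefault(r["key"], []).append(r["result"])
        return merged

    q = Queue()
    for m in (Message("a", b"hi"), Message("a", b"bye")):
        q.put(process(m))
    print(aggregate(q))   # {'a': ['HI', 'BYE']}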

In this process, some good leading questions might include:

  • How big is the interaction between two random messages? Does this change if they're closer in time? And if they arrived at the exact same time?
  • What parts of the system have no interaction at all?
  • What is the compute-critical part of the system? (i.e. which tasks are the most computationally complex parts of the system?)
  • Is any part of my system doing more than one task?
  • Should my datastores be fast-in-slow-out, or slow-in-fast-out?
  • Must I have a datastore, or would a cache do just fine? Can the entire data needed on a node fit into its memory? If not, what would it take to make it fit? (A quick check for this is sketched right after this list.)
  • What happens if I fail to answer a small part of these questions? How many can I afford to leave unanswered?
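For the memory-fit question in particular, a quick back-of-envelope estimate usually answers it. Here is one in Python, where every number is a made-up placeholder to be replaced with your own measurements:

    # Can the data a node needs fit in its memory?
    record_count  = 200_000_000   # hypothetical: records the node must hold
    bytes_per_rec = 120           # hypothetical: average serialized record size
    overhead      = 2.0           # safety factor for indexes/fragmentation
    node_ram_gib  = 64            # hypothetical: RAM available on one node

    needed_gib = record_count * bytes_per_rec * overhead / 2**30
    print(f"need ~{needed_gib:.1f} GiB of {node_ram_gib} GiB")  # ~44.7 of 64

    # If it doesn't fit: shrink the records, shard by key across
    # nodes, or keep only the hot subset in a cache.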

If you answered all the questions honestly (and I assume you eventually will), you probably already see how some of the blocks in your diagram are maybe too big, doing too many different things, or could be dissected into two self-operating units in another way. Let's see how this is applicable to NME's data flow:

[Diagram: the revised NME data flow (NME2)]

By asking the right questions we were able to separate many of the tasks into their sub-elements. This enabled us to shift some tasks around, and we were even able to unify both flow diagrams, advancing our understanding of the flow and the stress points we might encounter. The three most useful tricks (in our case) were:

  1. Processing is totally separated from data handling (see the second sketch after this list)
  2. Dividing complex processing tasks into a very basic pre-processing stage and short bursts of processing, each followed by a reevaluation of the work, until your heuristic score (for example, calculation accuracy) is "good enough"* (see the first sketch after this list)
  3. Every DB is written to by no more than one element (also in the second sketch)
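To illustrate trick 2, here is a toy Python sketch: estimating a square root by Newton's method in short, interruptible bursts, re-evaluating an accuracy score between bursts instead of running one long monolithic computation. The computation itself, BURST_STEPS, and GOOD_ENOUGH are all hypothetical stand-ins for your own workload and product standards.

    BURST_STEPS = 3      # how much work a single burst may do
    GOOD_ENOUGH = 1e-6   # the score at which we can say "good, enough"

    def burst(x: float, estimate: float) -> float:
        # One short, interruptible burst of processing.
        for _ in range(BURST_STEPS):
            estimate = (estimate + x / estimate) / 2
        return estimate

    def accuracy(x: float, estimate: float) -> float:
        # The heuristic score re-evaluated between bursts.
        return abs(estimate * estimate - x)

    def compute(x: float) -> float:
        estimate = x / 2 or 1.0   # a very basic pre-processing stage
        while accuracy(x, estimate) > GOOD_ENOUGH:
            # Between bursts the work can be paused, rescheduled,
            # or handed to another worker.
            estimate = burst(x, estimate)
        return estimate

    print(compute(2.0))   # ~1.4142135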
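And here is a sketch of tricks 1 and 3 together: processing workers never touch the datastore directly, they emit results onto a queue, and exactly one writer element drains that queue into the DB. The in-memory SQLite store and the toy payloads stand in for whatever datastore and data you actually use.

    import queue
    import sqlite3
    import threading

    results = queue.Queue()   # the only channel from processing to data handling

    def worker(items):
        # Processing only -- any number of these can run at once (trick 1).
        for key, payload in items:
            results.put((key, payload.upper()))

    def single_writer(db_path):
        # Exactly one element ever writes to this DB (trick 3).
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS out (k TEXT, v TEXT)")
        while True:
            item = results.get()
            if item is None:          # shutdown sentinel
                break
            db.execute("INSERT INTO out VALUES (?, ?)", item)
            db.commit()
        db.close()

    writer = threading.Thread(target=single_writer, args=(":memory:",))
    writer.start()
    worker([("a", "x"), ("b", "y")])
    results.put(None)
    writer.join()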

In most cases, these same three tricks will get you to something that basically resembles a (micro)service-oriented architecture, and that's good, but not yet good enough. For it to be good enough there are another couple of stages; next up, we go parallel.

* This is determined by your product standards. A good rule of thumb is (in the words of Seth Godin): "Good enough is not perfect, it's when you can say: good, enough."

Adam Lev-Libfeld

A long distance runner, a software architect, an HPC nerd (order may change).
