Breaking the monolith – How to design your system for both flexibility and scale – Part 3: The embarrassingly parallel truth

This post is part of a series. You’d probably like to read it from the beginning, or check out the previous post in the series.


We are about to dig a hole. The hole will be 8 meters deep, 1m long and 1m wide. We know, from experience as professional hole diggers, that it will take us about 8 hours to dig all the way down. But this time there’s a difference – we will get some help.

together we stand

So our boss decided to get us some help – 7 people worth of help, to be exact. But of course, as in life, nothing is free – now that we are eight, she wants us to dig the same hole in one hour. Trying to explain why this is not possible, you plead, you beg, you even say that digging that particular hole in an hour is like asking nine women to create a human baby in one month. Alas, the boss is the BOSS, and she will not budge – she wants that earth moved.

on the other side

“Earth, you say…” you ask, and an evil grin starts to creep across your face as a light bulb appears above your head. “What if we dig the hole from the side? 1m deep, 1m wide and 8m long – we can do that in just over an hour. How ’bout that?” By now your boss has little interest in what you have to say, and all she hears is the “we can do that” part. “Yeah, yeah. You go do that,” she says.

    |                 |     |     |     |     |     |     |     |   
    V                 V     V     V     V     V     V     V     V
=========   VS.   ===================================================
 |  1  |           |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
 |_____|           |_____|_____|_____|_____|_____|_____|_____|_____|
 |  2  |             
 |  3  |
 |  4  |                    
 |  5  |
 |  6  |
 |  7  |
 |  8  |

looking sideways

Every service is hole-digging in a way. Some are more like drills – time-consuming, complex and serial in nature; others are like ditch digging – shallow on the computation-time side of things, and can be chopped into well-defined elements that each solve part of the bigger problem at hand (and, of course, can be run simultaneously). If you haven’t guessed it yet, this post is all about the latter kind: the parallel processes and services in our system.

Bear with me for a moment as I dig (no pun intended) into a more formal definition of what a parallel process actually is:

 Let P be a complex process over dataset D, where P can be described as a chain of subprocesses over segments of the total dataset: P(D) = p1(D’1) + p2(D’2) + … + pN(D’N). If there exist X, Y with 1 < X < Y < N such that for each i, X ≤ i ≤ Y, the segment D’i is independent of the subprocess pi-1, then the subprocesses pX..pY can be run in parallel.
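The definition above can be sketched in code. Here is a minimal, illustrative Python example (all names are my own, not from any particular library of tricks): the dataset D is cut into segments D’1..D’N, and since each subprocess here depends only on its own segment, the whole chain can run at once. A thread pool stands in for the separate workers or compute nodes a real deployment would use.

```python
# Sketch of the formal definition: P(D) = p1(D'1) + p2(D'2) + ... + pN(D'N),
# where every p_i touches only its own segment D'_i, so all can run in parallel.
from concurrent.futures import ThreadPoolExecutor

def subprocess_p(segment):
    # stand-in for one subprocess p_i over its own segment D'_i
    return sum(x * x for x in segment)

def run_parallel(dataset, n_segments):
    # chop D into N equal segments (assumes len(dataset) divides evenly)
    size = len(dataset) // n_segments
    segments = [dataset[i * size:(i + 1) * size] for i in range(n_segments)]
    # every segment is independent of the others, so map them all at once
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(subprocess_p, segments))
    return sum(partials)  # recombine the partial results into P(D)
```

For instance, `run_parallel(list(range(800)), 8)` squares and sums eight independent segments concurrently and recombines the partial sums – the side-dug ditch rather than the drilled hole.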

Now that we’ve powered (or skimmed) through this barely readable segment, we can go on to apply these ideas to our problem. Here are the (general) steps:

First of all, a tight, well-described definition of the end result is needed – why are we digging? Are we prospecting for oil, digging for treasure, or just moving earth? Knowing this will allow us to make the compromises needed to take our process from serial to parallel.

Second, we will need to understand the different datasets in the system, their segments and co-dependencies. How do the different data blocks in our system interact with each other? What are the boundaries between them? How do they affect each other and interact across these edges?*

And finally, combine these answers to form a strategy. The common bag of tricks usually includes:

  • Separate data and computation as much as possible – this separation will cost you in the performance of each single process, so make sure the data is just far enough away to maximize the performance of the parallel run.
  • If the subprocesses are read-only (= their datasets are, or can be made, immutable), put them on different compute nodes so that their performance and load will not affect each other.
  • If part of the process can be divided into repeatable iterations of the same action (recursion is a good example), try to preserve as much data as possible between every two consecutive iterations.
  • If a subprocess needs some data from another subprocess’s segment, consider synchronizing just that needed data before (and maybe after) every calculation (or iteration).
  • etc.
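To make the last two tricks concrete, here is a small, hypothetical Python sketch (my own construction, not from any specific system): an iterative 1-D neighbour-averaging computation where each segment keeps its own data between iterations, and only the single boundary value it shares with each neighbour is synchronized before every step – exactly the data that would cross the wire in a distributed run.

```python
# Each worker owns one segment across iterations; only the boundary
# ("ghost") values shared with its neighbours are synchronized per step.
def smooth_step(segment, left_ghost, right_ghost):
    # one iteration over a single segment, using the synchronized boundaries
    padded = [left_ghost] + segment + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]

def smooth(data, n_segments, iterations):
    # chop the data once; each segment then stays with "its" worker
    size = len(data) // n_segments
    segments = [data[i * size:(i + 1) * size] for i in range(n_segments)]
    for _ in range(iterations):
        new_segments = []
        for i, seg in enumerate(segments):
            # synchronize only the one value needed from each neighbour;
            # the outermost edges simply reuse their own boundary value
            left = segments[i - 1][-1] if i > 0 else seg[0]
            right = segments[i + 1][0] if i < n_segments - 1 else seg[-1]
            new_segments.append(smooth_step(seg, left, right))
        segments = new_segments
    return [x for seg in segments for x in seg]
```

Because every ghost value is read from the previous iteration, all segments can be stepped simultaneously, and the result is identical no matter how many segments the data is chopped into – the per-step synchronization cost is two values per segment instead of the whole dataset.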

So now that we have slaved through the last three posts on chopping our process into as many small elements as we can, and now that we have that extra “parallelism” tool on our belt, we can finally get to the exquisite pleasure of process consolidation – our subject of choice for the next post of the series, or the one after it; it depends.


* If there isn’t any interaction between the different parts of the computation, the problem is called “embarrassingly parallel”.

Adam Lev-Libfeld

A long distance runner, a software architect, an HPC nerd (order may change).
