Knowing the computing capacity of your cluster (on-site or not) is a critical part of designing software that is made to last. Getting your math right might just make the difference between a great launch and a PR catastrophe.
The first thing we need to know about a performance calculation is that it is a mathematical tool to help us estimate what our machines, servers, CPUs, or lab-rat-brain-computational-array are capable of.
Basically, what we want to do is figure out all the different elements of this formula:
single machine performance × # of machines × utilization coefficient ≥ software workload per task × # of live tasks per second
This calculation assumes your critical section (where the hard work is done) is embarrassingly parallel and that the tasks are computationally independent of each other. If your program, or its critical section, has some synchronous parts, you need to add the cost of these parts to the right-hand side of the equation.
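As a minimal sketch, the inequality above can be checked directly in Python. The function name, the parameter names, and the numbers in the usage line are illustrative assumptions. All quantities are expressed in cycles per second, and the optional serial_cycles_per_second term is one hypothetical way to account for the synchronous parts mentioned above.

```python
def has_enough_capacity(machine_cycles_per_second, machine_count, utilization,
                        cycles_per_task, tasks_per_second,
                        serial_cycles_per_second=0.0):
    """Return True if the usable cluster capacity covers the workload.

    Left-hand side:  single machine performance x # of machines x utilization.
    Right-hand side: workload per task x # of live tasks per second,
                     plus any synchronous work added on top.
    """
    supply = machine_cycles_per_second * machine_count * utilization
    demand = cycles_per_task * tasks_per_second + serial_cycles_per_second
    return supply >= demand


# Hypothetical cluster: 10 machines at 8e9 cycles/s each, 70% utilization,
# serving 1000 tasks per second that cost 5e7 cycles each.
print(has_enough_capacity(8e9, 10, 0.7, 5e7, 1000))
```

With these made-up numbers the supply is about 5.6e10 cycles per second against a demand of 5e10, so the check passes with little headroom to spare.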
Single machine performance
Off-the-shelf appliances and self-assembled machines
When using an off-the-shelf product, you need to figure out the smallest calculating unit you can control for: it may be the CPU, but it can also be a single core. Don’t forget to make sure the calculating unit is at least as big as the scope in which the task is running. If the task takes at least an entire CPU, there is no use in calculating how many cores it needs, now is there?
To calculate the single machine performance we need to gather some intel (no pun intended) first:
- What is the frequency (in Hz) of a single calculating unit? (Check the hardware vendor's site for that.)
- How many of them are there in each level above that (CPUs are made of cores, GPUs from CPUs, servers from CPUs and/or GPUs, etc.), up to the server scale?
single machine performance ≈ calculating unit frequency × # of cores in CPU × # of CPUs in GPU × # of GPUs in server
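The product above can be sketched in a few lines of Python. The function name and the example hardware figures are assumptions made for illustration. The list holds one unit count per level of the hierarchy, so levels that do not apply to your machine are simply left out.

```python
from math import prod


def single_machine_performance(unit_frequency_hz, units_per_level):
    """Estimate machine performance in cycles per second.

    unit_frequency_hz: frequency of the smallest calculating unit.
    units_per_level:   how many units sit in each level above it,
                       e.g. [cores per CPU, CPUs per server].
    """
    return unit_frequency_hz * prod(units_per_level)


# Hypothetical server: 2.5 GHz cores, 8 cores per CPU, 2 CPUs per server.
perf = single_machine_performance(2.5e9, [8, 2])
print(f"{perf:.2e} cycles per second")
```

This is only the nominal figure; it says nothing yet about how much of it your code actually gets to use.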
Amazon Web Services (AWS) – understanding ECU
Amazon has a bit of a different approach to performance calculation: every server is ranked on a scale of 1 ECU (EC2 Compute Unit) and up. This is a bit like horsepower for engines: one ECU signifies the computing capacity equivalent to the one provided by the CPU of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. See the complete instance list here.
Once you know how many ECUs your machine is rated at, divide that number by the number of cores per CPU and continue on from there as usual.
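Here is a rough sketch of that division, assuming (as a simplification) that one ECU can be treated as a 1.0-1.2 GHz frequency range as described above. The function names and the 8 ECU, 4 core instance are hypothetical and do not describe any particular instance type.

```python
# One ECU is taken as 1.0-1.2 GHz, per the 2007 Opteron/Xeon equivalence.
ECU_GHZ_RANGE = (1.0, 1.2)


def ecu_per_core(instance_ecus, core_count):
    """Split the instance's ECU rating evenly across its cores."""
    return instance_ecus / core_count


def core_frequency_range_ghz(instance_ecus, core_count):
    """Approximate per-core frequency range (GHz) implied by the ECU rating."""
    per_core = ecu_per_core(instance_ecus, core_count)
    low, high = ECU_GHZ_RANGE
    return per_core * low, per_core * high


# Hypothetical instance rated at 8 ECUs with 4 cores.
print(ecu_per_core(8, 4))
print(core_frequency_range_ghz(8, 4))
```

The resulting range can then stand in for the calculating unit frequency in the single machine performance estimate.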
Usually you will find that the frequency noted on the processor box is not as accurate as you might think. This is basically due to the fact that you use an operating system, that you are connected to the network, and that the CPUs are doing more than just running your code.
This leads to lower performance on the machine. How low? A good mean value will be around 70% utilization (so that’s utilization coefficient ≈ 0.7), but to be sure you’d have to measure it yourself. For that there is an open source project called HAL (look for it on GitHub). Since moving there it has been undergoing a major rebuild, but you can adapt the existing (Perl) code for your specific needs.
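To see what the coefficient does to your hardware budget, here is a small sketch that solves the capacity formula for the number of machines. The function name and the workload numbers are made up for illustration, and the 0.7 default is just the rule of thumb from above, to be replaced with your own measurement.

```python
import math


def machines_needed(cycles_per_task, tasks_per_second,
                    machine_cycles_per_second, utilization=0.7):
    """Smallest whole number of machines whose usable capacity covers the load."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    demand = cycles_per_task * tasks_per_second
    usable_per_machine = machine_cycles_per_second * utilization
    # Round up: a fraction of a machine still means buying a whole one.
    return math.ceil(demand / usable_per_machine)


# Hypothetical workload: 1000 tasks per second at 5e7 cycles each,
# on machines that nominally deliver 8e9 cycles per second.
print(machines_needed(5e7, 1000, 8e9))
print(machines_needed(5e7, 1000, 8e9, utilization=1.0))
```

With these numbers, trusting the frequency on the box (utilization 1.0) suggests 7 machines, while the 0.7 coefficient calls for 9.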
I’ll close with my own version of a well-known Irish proverb:
So long my friends, and may the stats come up to meet you.