How to Do Performance Calculations the Right Way

Knowing the computing capacity of your cluster (on-site or not) is a critical part of designing software that is made to last. Getting your math right might just make the difference between a great launch and a PR catastrophe.

the math

The first thing to know about performance calculations is that they are a mathematical tool to help us estimate what our machines, servers, CPUs, or lab-rat-brain-computational-array are capable of.

Basically, what we want to do is figure out all the different elements of this formula:

single machine performance × # of machines × utilization coefficient ≥ software workload per task × # of live tasks per second

This calculation assumes your critical section (where the hard work is done) is embarrassingly parallel and that the tasks are computationally independent of each other. If your program has synchronous parts, or parts of your critical section cannot run in parallel, you need to add those parts to the right-hand side of the equation.
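
To make the inequality concrete, here is a minimal Python sketch that plugs numbers into both sides and also solves for the smallest number of machines that satisfies it. It assumes single machine performance is expressed in cycles per second (Hz) and workload in cycles per task, so both sides come out in cycles per second. The function names and all example figures are hypothetical, chosen only for illustration.

```python
import math


def has_enough_capacity(machine_hz: float, machines: int, utilization: float,
                        cycles_per_task: float, tasks_per_second: float) -> bool:
    """Left-hand side (usable cycles/s) >= right-hand side (required cycles/s)."""
    available = machine_hz * machines * utilization
    required = cycles_per_task * tasks_per_second
    return available >= required


def machines_needed(machine_hz: float, utilization: float,
                    cycles_per_task: float, tasks_per_second: float) -> int:
    """Smallest whole number of machines for which the inequality holds."""
    required = cycles_per_task * tasks_per_second
    usable_per_machine = machine_hz * utilization
    return math.ceil(required / usable_per_machine)


# Hypothetical figures: 40 GHz of aggregate capacity per machine (e.g. 16 cores
# at 2.5 GHz), a 0.7 utilization coefficient, 50 million cycles per task and
# 2,000 live tasks per second.
n = machines_needed(40e9, 0.7, 50e6, 2_000)
print(n)                                                   # 4
print(has_enough_capacity(40e9, n, 0.7, 50e6, 2_000))      # True
print(has_enough_capacity(40e9, n - 1, 0.7, 50e6, 2_000))  # False
```

Because this uses floating-point arithmetic, a requirement that lands exactly on a whole number of machines may occasionally be rounded up by one, which errs on the safe side for capacity planning.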

single machine performance

off-the-shelf appliances and self-assembled machines

When using an off-the-shelf product, you need to figure out the smallest calculating unit (core, CPU, etc.) you can control for: it may be the whole CPU, but it can also be a single core. Don’t forget to make sure the calculating unit is at least as big as the scope in which the task runs: if the task takes at least an entire CPU, there is no use calculating how many cores it needs, now is there?

To calculate single machine performance, we need to gather some intel (no pun intended) first:

  • the frequency (in Hz) of a single calculating unit (check the hardware vendor’s site for that)
  • how many of them there are in each level above it (a CPU is made of cores, GPUs from CPUs, servers from CPUs and/or GPUs, etc.), up to the server scale

single machine performance ≈ calculating unit frequency × # of cores in CPU × # of CPUs in GPU × # of GPUs in server
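
The product above can be sketched in a few lines of Python. Rather than hard-coding one particular chain of cores, CPUs and GPUs, this hypothetical helper takes the unit frequency plus a list with one count per level of the hierarchy, so the same function covers a plain CPU-only server. The example server (2.5 GHz cores, 16 cores per CPU, 2 CPUs per server) is illustrative, not taken from any vendor sheet.

```python
import math
from typing import Sequence


def single_machine_performance(unit_hz: float, counts_per_level: Sequence[int]) -> float:
    """Approximate cycles per second for one machine.

    unit_hz: frequency of the smallest calculating unit you control (e.g. a core).
    counts_per_level: how many units make up each level above it, up to the
    server, e.g. [cores per CPU, CPUs per server].
    """
    return unit_hz * math.prod(counts_per_level)


# Hypothetical server: 2.5 GHz cores, 16 cores per CPU, 2 CPUs per server.
perf = single_machine_performance(2.5e9, [16, 2])
print(f"{perf:.2e} Hz")  # 8.00e+10 Hz
```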

 

Amazon Web Services (AWS) – understanding ECU

Amazon has a bit of a different approach to performance calculation: every server is ranked on a scale of 1 ECU (EC2 Compute Unit) and up. This is a bit like horsepower for engines: one ECU signifies computing capacity equivalent to that provided by a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. See Amazon’s complete instance list for the rating of each instance type.

Once you know how many ECUs your machine is rated at, divide that by the number of cores per CPU and continue from there as usual.
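
As a rough sketch of that conversion: divide the instance’s ECU rating by its core count, then, if you want a figure in hertz to plug into the single machine performance product, multiply by a per-ECU frequency. The 1.0 GHz default below is an assumption that takes the conservative end of Amazon’s 1.0-1.2 GHz reference range; the function names and the 8 ECU / 4 core instance are hypothetical.

```python
def ecu_per_core(ecus: float, cores: int) -> float:
    """ECU rating of a single core."""
    return ecus / cores


def ecu_core_hz(ecus: float, cores: int, ghz_per_ecu: float = 1.0) -> float:
    """Approximate per-core frequency in Hz, treating 1 ECU as `ghz_per_ecu` GHz.

    The 1.0 default is an assumption (low end of the 1.0-1.2 GHz definition).
    """
    return ecu_per_core(ecus, cores) * ghz_per_ecu * 1e9


# Hypothetical instance rated at 8 ECUs with 4 cores.
print(ecu_per_core(8, 4))  # 2.0
print(ecu_core_hz(8, 4))   # 2000000000.0
```

The per-core figure can then be treated like the frequency of a physical core in the rest of the calculation.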

 

utilization coefficient

Usually you will find that the frequency noted on the processor box overstates what your code actually gets. This is basically due to the fact that you run an operating system, that you are connected to the network, and that the CPUs are doing more than just running your code.

This leads to lower effective performance on the machine. How low? A good average value is around 70% utilization (so utilization coefficient ≈ 0.7), but to be sure you’d have to measure it yourself. For that, there is an open-source project called HAL (look for it on GitHub). Since moving there it has been going through a major rebuild, but you can adapt the existing (Perl) code for your specific needs.
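
As a much cruder first impression than a proper HAL measurement, you can check how much of one core your code actually receives by running a single-threaded busy loop and comparing the CPU time it was granted with the wall-clock time that passed. This Python sketch relies only on the standard library’s time.process_time and time.perf_counter, and the function name is hypothetical. It is an assumption-laden proxy: it captures time lost to the OS scheduler and competing processes, but not effects such as frequency scaling or cache contention, so an idle machine will report close to 1.0.

```python
import time


def cpu_share(duration_s: float = 1.0) -> float:
    """Fraction of wall-clock time a single-threaded busy loop spent on a CPU.

    A rough proxy for the utilization coefficient of one core: values well
    below 1.0 suggest the OS or other processes are taking a share of it.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    counter = 0
    while time.perf_counter() - wall_start < duration_s:
        counter += 1  # trivial busy work to keep the core occupied
    wall_elapsed = time.perf_counter() - wall_start
    cpu_elapsed = time.process_time() - cpu_start
    return cpu_elapsed / wall_elapsed


if __name__ == "__main__":
    print(f"approximate utilization coefficient: {cpu_share(1.0):.2f}")
```

To get a meaningful number, run it on the target machine type under a realistic background load and repeat it several times.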

 

I’ll close with my own version of a well-known Irish proverb:

So long, my friends, and may the stats come up to meet you.

Adam Lev-Libfeld

A long distance runner, a software architect, an HPC nerd (order may change).
