The last couple of weeks I slaved over refactoring some of the code-base of one of our costumers. The task at hand was to take the elaborate spaghetti code, build with way (waaaay) too many multiprocessing queues and turn it into a functioning, debugable, decently performing piece of software.
As it turns out, most of the work is embarrassingly parallel1 but with all the different processes handling all the different tasks the code is unmaintainable. So off we went, trying to find more subtle routs to multiprocess our way out of the mess.
It was clear. we needed a map. Not the of the paper kind, not even of the google kind2, but of the parallel kind.
Very well, a map. the code is in Python and Python has one. Should be pretty simple, doesn’t it?
The following gist is the skeleton of the code we used to move the existing code to its (hopefully) finale map-using form. It is, as you can see less than trivial, which leads me to think the whole concept of “mapable” code not as solid as you might expect in a mature language as Python.
Feel free to use, abuse, share or disregard this. I wish I found something like this before we ever started on that task, and now it’s here, but the damage was done. The major lesson is that refactoring into a parallel code is incredibly more complex than starting from a clean slate and reusing only what you must. Next time I’ll make sure to try that.
1. if you don’t know what this term mean read this
2. depends how you define “google map”