JoelBlog | Category | Linux/Ubuntu

Using Celery+Python to make a Distributed Genetic Algorithm

Note: If you'd like to follow along with the complete source, feel free to grab this project on github.

Celery is a distributed task queue that integrates smoothly with python. I stumbled upon it and immediately saw it as a cool technology looking for an application. You define a task in python by just adding a decorator (@task) to a function, and hungry worker compute nodes can then perform it asynchronously in the background. For example, here is a simple task gratuitously lifted from the basic celery tutorial:

@task
def add(x, y):
    return x + y

You can then call the task with its “delay” method and it will be serialized, whisked away to Celery’s backend queue, unserialized by a worker, computed, and then the result will be ferried back to the python caller:

>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.get()
8

What is especially nice about this is how seamless it is. If your prior exposure to parallelization is something like MPI in C++, the simplicity and elegance of celery are refreshing. It just works, and for expensive parallel tasks it is a nice way of circumventing the GIL in python.

My PhD research often involves genetic algorithms. The idea is that you have an array of artificial genomes (which are themselves often lists of numbers) and a way of judging these genomes (an error function of some kind). Iteratively you apply the error function to weed out the worst of these artificial genomes, and replace those worst genomes with slight variations of the better ones. The hope is that over time you can find a genome that minimizes the error function (usually called the fitness function in GA parlance).
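The loop described above can be sketched in a few lines of plain python. This is only a toy illustration, not the code from the project: the error function (squared distance from the all-zeros genome), the Gaussian mutation, and the names error, mutate, and evolve are all made up for the example.

```python
import random


def error(genome):
    # Hypothetical error function: squared distance from the all-zeros genome.
    return sum(g * g for g in genome)


def mutate(genome, rng, scale=0.1):
    # A "slight variation": add a little Gaussian noise to every gene.
    return [g + rng.gauss(0.0, scale) for g in genome]


def evolve(pop_size=20, genome_len=5, generations=50, seed=0):
    rng = random.Random(seed)
    # Start from random genomes (lists of numbers).
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    initial_best = min(error(g) for g in population)

    for _ in range(generations):
        # Rank genomes from lowest error (best) to highest (worst).
        population.sort(key=error)
        survivors = population[:pop_size // 2]
        # Replace the worst half with mutated copies of the better half.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children

    final_best = min(error(g) for g in population)
    return initial_best, final_best


initial_best, final_best = evolve()
```

Because the better half always survives unchanged, the best error in this sketch can never get worse from one generation to the next. Real GAs usually add crossover and fancier selection, but the evaluate, rank, and replace cycle is the same.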

Usually the performance bottleneck in GAs is evaluating each genome through the fitness function. For example, sometimes this evaluation can consist of expensive physics simulations (e.g. simulations of virtual creatures).

What is nice from a parallelization standpoint is that each simulation is done on a per-genome basis and is (usually) independent of every other genome. So if you have a population of 1000 genomes to evaluate, you can create a celery task for each of them that can be offloaded either to spare cores on the same computer or to worker instances on other computers.

Of course, celery will be handy not only for GAs but wherever you have many expensive tasks that are independent of each other, for example rendering individual frames of a ray-tracing animation or multiplying many large matrices. If you can decompose your computational bottleneck into significant independent chunks, Celery may be a good fit for easily reaping the benefits of parallelization.

For the genetic algorithm application, I just had to wrap the fitness function as a celery task:

@task
def evaltask(domain, genome):
    fit = domain.evaluate(genome)
    return fit

And then I augmented the serial evaluation of the genomes (shown here):

# evaluate sequentially
if not eval_concurrent:
    for k in population:
        k.fitness = dom.evaluate(k)

With the parallel evaluation that uses the celery server:

# else evaluate concurrently
else:
    tasks = []
    for ind in population:
        tasks.append(evaltask.subtask((dom, ind)))

    job = TaskSet(tasks=tasks)

    result = job.apply_async()
    fits = result.join()

What this code does is make a list of celery subtasks, one for each genome to be evaluated. It composes this into a larger “taskset,” which is run asynchronously, and the results are gathered (in order) by the join call once they all complete. If each task is non-trivial (i.e. it takes significantly longer to run than the overhead associated with serializing it and throwing it to celery), then the speed-up gained from parallelization can be very nice.
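If you want to play with the same fan-out and gather-in-order pattern without a running celery server, here is a rough local stand-in using python's standard concurrent.futures module. To be clear, this is not Celery and not the project's code: the Individual class, the evaluate function, and evaluate_population are hypothetical, and a thread pool will not get around the GIL for CPU-bound work (ProcessPoolExecutor would be the closer analogue there). It only shows how results that come back in submission order can be matched up with the population.

```python
from concurrent.futures import ThreadPoolExecutor


class Individual:
    # Hypothetical genome wrapper with a fitness slot, for illustration only.
    def __init__(self, genes):
        self.genes = genes
        self.fitness = None


def evaluate(ind):
    # Stand-in for dom.evaluate(ind): sum of squared genes.
    return sum(g * g for g in ind.genes)


def evaluate_population(population, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the inputs,
        # so each fitness lines up with the individual that produced it.
        fits = list(pool.map(evaluate, population))
    # Write each result back onto its individual.
    for ind, fit in zip(population, fits):
        ind.fitness = fit
    return fits


population = [Individual([i, i + 1]) for i in range(5)]
fits = evaluate_population(population)
```

The important part is the last step inside evaluate_population: because the results arrive in order, a plain zip is enough to assign every fitness back to the right genome.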

Feel free to check out the complete source on github (including the celeryconfig.py file).

Using concurrency requires a working celery install, which is fairly simple to set up (see http://www.celeryproject.org/tutorials/first-steps-with-celery/ for how to run celery):

You can install Celery either via the Python Package Index (PyPI) or from source.

To install using pip:

$ pip install Celery

To install using easy_install:

$ easy_install Celery

Celery also needs a message broker as its backend in order to run. The default is RabbitMQ, available on Ubuntu as the package rabbitmq-server.

Email’s not sexy

I wanted to host an email server for a project I am working on. Due to the simplicity of installing a web server in modern linux distributions I mistakenly assumed I was a few “apt-get install”s away from success. This was stupid of me.

Don’t get me wrong, I am sure that sendmail and postfix are wonderful pieces of powerful software, and as someone searching for free software to do complex things I have no right to complain, but I simply don’t care enough about email to invest time in learning. It’s just not fun or sexy to me. For whatever reason, web servers are sexier. Web servers are still tied to the future and are going interesting places. But email seems bland and tired, no longer tied to progress, like some sort of dead end, waiting to be replaced by some cooler communication system.

Basically, my search for a simple solution led me here: https://help.ubuntu.com/community/MailServer , which tells me exactly what I don’t want to hear:

Setting up an email server is a difficult process involving a number of different programs, each of which needs to be properly configured. The best approach is to install and configure each individual component one by one, ensuring that each one works, and gradually build your mail server.

Great. Perhaps my technical genius will allow me to plow through this in a few minutes?

A Mail Transfer Agent (MTA) is the program which receives and sends out the email from your server, and is therefore the key part. The default MTA in Ubuntu is Postfix, but exim4 is also fully supported and in the main repository.

Postfix – this guide explains how to set up Postfix.

But the extent of my technical genius is vastly (well, entirely) overstated and my attention span is near its limit. I hope this is relatively simple.

[...]

Configure Postfix to do SMTP AUTH using SASL (saslauthd):

sudo postconf -e 'smtpd_sasl_local_domain ='
sudo postconf -e 'smtpd_sasl_auth_enable = yes'
sudo postconf -e 'smtpd_sasl_security_options = noanonymous'
sudo postconf -e 'broken_sasl_auth_clients = yes'
sudo postconf -e 'smtpd_recipient_restrictions = permit_sasl_authenticated,permit_mynetworks,reject_unauth_destination'
sudo postconf -e 'inet_interfaces = all'

Next edit /etc/postfix/sasl/smtpd.conf and add the following lines:

pwcheck_method: saslauthd
mech_list: plain login

Generate certificates to be used for TLS encryption and/or certificate Authentication:

touch smtpd.key
chmod 600 smtpd.key
openssl genrsa 1024 > smtpd.key
openssl req -new -key smtpd.key -x509 -days 3650 -out smtpd.crt # has prompts
openssl req -new -x509 -extensions v3_ca -keyout cakey.pem -out cacert.pem -days 3650 # has prompts
sudo mv smtpd.key /etc/ssl/private/
sudo mv smtpd.crt /etc/ssl/certs/
sudo mv cakey.pem /etc/ssl/private/
sudo mv cacert.pem /etc/ssl/certs/

Configure Postfix to do TLS encryption for both incoming and outgoing mail:

[...]

Oh my god I don’t care anymore. I don’t understand why this sensible configuration isn’t built in or automated. There are probably wonderful technical reasons, but I am physically unable to force myself to type in these commands because I am bored just reading them.

If this were Node.js or clojure, the compelling nature of what they enable and how they sit at the edge of innovation might entice me to dig through config files, and yet ironically I don’t need to because the default installs for those programs are fairly straightforward.

Anyways, moral of the story: The amount of garbage you are willing to put up with to get something working reflects how inherently interesting you deem that endeavor. Keep that in mind the next time you feel your temperature rising when dealing with config files. Perhaps some things are worth outsourcing instead of buckling down and working through. I feel like an intermediate knowledge of web servers will be handy in future projects, whereas becoming intimate with IMAP, SASL, MTA, and MX DNS records probably does not pave a path towards the next big thing.

Ubuntu/Linux: What to do when a window or dialog won’t fit on-screen

On my laptop, sometimes an application throws up a pop-up menu or form that is too long to fit on the screen. If the form has no scroll-bar to get to the bottom, which often holds an OK button that needs to be pressed to move on, it is a very frustrating experience.

I’ve had this problem with cinelerra, a powerful open-source video editor, as well as with the android SDK. The solution is to press ALT+F7, which lets you ‘grab’ the window and move it upwards until you can click the OK button and move on!