Archive for the ‘Code’ Category

March 25th, 2010 @ 01:46

Today I’ve been looking at opencv-python for a quick project (I’d like to practice OpenCV a little bit). Installed the opencv-python package on Fedora 13, headed to the samples directory (/usr/share/opencv/samples/python/), started running one of them and… boum, segfault. Tried another one (the inpainting one), and it worked. A third one… segfault. Most of the samples in there segfaulted, mostly with SWIG errors about wrong parameters, always mentioning int64 (I’m using a x86_64 kernel & distribution).

After half an hour of failure on trying to get opencv.i686 work alongside my x86_64 python, I went back to the OpenCV website to check if there was some known heavy problems with x86_64 systems and… I discovered that :

Starting with release 2.0, OpenCV has a new Python interface. (The previous Python interface is described in SwigPythonInterface.)

Wait wait wait, you mean that during all this time I was running the OLD, pre-2.0 Python interface ? Why the hell does the opencv-python 2.0 package provides both the new and the old interfaces ? (well, I know the answer : backwards compatibility). Meh :( Anyway, I wish the old samples would get ported to the new interface… At the moment there’s no sample using it at all :/

March 22nd, 2010 @ 01:56

For a project midterm presentation, we were asked to produce a bunch of slides explaining our project architecture and implementation choices. Apart of the obvious things (libraries in use, network protocol…), I had no real clue on what I could put in, so I thought I’d just throw some UML-like diagrams and that it would be fine. The only detail was : how to produce these diagrams ?

Since the project code was written in Python, all the inheritance relations were already held by the code and could be introspected, so that it was theoretically possible to automatically produce the inheritance diagram. And it actually is, and is implemented by things like the Epydoc (a documentation generator for Python code) parser, as well as the diagram generation, which Epydoc also implements. The only thing is that I wasn’t satisfied by the Epydoc diagrams since they were limited to the inheritance relationships, while I was also willing to include usage relationships and display only the main methods and variables of my objects.

I thus wrote Umlpy, a UML-like class diagram generator for Python code, which depends on Epydoc (for the parser) and python-graphviz (for the graph generation, it produces nicely spaced graphs and can output jpg, png or pdf files, and probably more). It handles the aforementioned requirements through docstrings parsing and introspection. Check the Umlpy README file for more documentation on how to use it. It took me about 10 minutes to get to the result I was expecting (it’s basically about adding a little docstring for usage relationship, and copy-pasting a docstring on methods or variables you want to see on the diagram.

This wouldn’t be complete without the mandatory screenshot, and this example code results into this diagram :
Umlpy result example

March 22nd, 2010 @ 00:27

Last autumn I had to do some particularly cpu-time consuming but completely independant computations on thousands of pictures (namely computing SIFT keypoints on high resolution images). Using my laptop, it would have taken hundreds of hours to complete (it took about 20 minutes per image), but it could be easily done in a distributed manner on dozens of machines.

I thus started writing a little cloud-computing alike framework to handle the distributed computations in a safe manner : the machines I was intending to run my computations on were public-access machines and could get rebooted at any time, so the computation (let’s call it task from now on) that was started on it would never end.

There’s no trick here, the user defines his tasks, an entry point which sets off the initial tasks, and that’s it. The tasks are stored in a workqueue, which is handled by a master node, which assigns tasks to the other nodes (actually, the master server is split in two, the “heartbeat” server which monitors slaves status, and a taskqueue server from which the slaves pull their tasks). If a node does not complete a task within a user-defined amount of time, the master assigns it to another node. Yet, if the initial node manages to complete the task before the second one, the task gets completed anyway (this prevents infinite task respawning in case of an unexpectedly long task).

The tasks may use data stored in either a file-like storage or in a database. Since the whole thing is in Python without any runtime restriction, any arbitrary code or binary can be ran through this thing, which feels like a great pro versus frameworks where only a few languages are supported. A nice feature is that each task can be associated to a file pack (holding e.g. binaries which will be ran by the task) which is automatically deployed or cleaned when the task is started on the node.

Let’s also mention that the slaves are automatically deployed based on a simple configuration file (through ssh/scp) ; that applications are easy to deploy as well (the initial files, the entry points and al are defined in a single config file, form which the deploy tool automagically creates the file packs, updates the pack data to the heartbeat server and plugs the entry point into the taskqueue) and that there are only a few deps for the slaves (python and python-yaml), and not much awful ones for the various servers (mostly add sqlalchemy to the slaves deps).

There are still some points I’d like to improve. I’d first like to make the master node dynamic, and not static like it is now. Nodes which may elect as master (i.e. those who may directly connect to the various storages) would use a distributed protocol to actually elect the master (I’ve worked on such stuff last year during a complexity course, see my (French) summary about the subject : Calculabilité Distribuée), and then broadcast to the other nodes the winner of the election. If the current master is unreachable at any time, the electable slaves would start doing the election process right away, but would still try to attempt to reach the master until the election is completed.
I’d also like to add a way to handle in a nicer way mixes of 32 and 64 bits architectures (i.e. provide a way to define 32-bit/64-bit specific initial packages).
Security is another matter, I guess I should probably add some chrooting to the runtime environment, and secure the transmissions between the nodes (note that there’s absolutely no notion of authentication in any part of the software at the moment).

The only problem is that whan I got to the point I had finished the framework, I had no time to write the little bit of script to actually test the whole thing, and a friend had thrown the pictures to his high-end quadcore computer, so I had no chance to test it on a real life scenario. I’ll try someday, though. If you want to give a try to the latest version I pushed (it lacks the database backend and documentation at the moment, though I intend to finish/test them and upload them someday soon), check the Cloudpy git repo.

March 20th, 2010 @ 01:40

Last summer, I went to INRIA Sophia Antipolis (INRIA is the French research institute for computer science, basically ; Sophia Antipolis names a group of villages near Nice and Antibes (on the Côte d’Azur) on which are installed quite a lot of technology businesses, among which HP or Amadeus) for a 3 months internship in PLANETE project, under the advisory of Mathieu Lacage, a great software engineer.

Before any extra details, let’s point to the code, the report and the slides. There are also some slides somewhere in the wild which were made for a quick talk given by Mathieu at ns-3 workshop at SIMUtools 2010.

The goal of the internship was to enable the seamless use of multicore parallelization in ns-3 network simulator to speed up simulations.
The obvious idea behind this project is that with all the trends on going more and more towards highly multicore architectures even for personal computers, such a change would possibly allow large speedups without having to directly throw a lot of hardware in the balance and without requiring most likely complicated configurations from the user.

Our goal was to make theses changes entirely transparent for both developers (i.e. those working on improving the ns-3 framework — routing stuff, devices handling…) and users (i.e. those using ns-3 to simulate their protocols…).

Before going into the optimization details, let’s mention the basic idea of the algorithms that we used for the synchronisation of the execution units (the cores, in our case), algorithms which are part of the conservative synchronization algorithms class (as opposed to optimistic algorithms class).
We first tried a message-passing based approach, namely the Chandy, Misra, Bryant algorithm. The whole idea in this algorithm is that each node knows how far in the future it can avance without any risk of simulation desynchronization. This is based on knowing the physical transfer delays, plus the current date of the neighboring nodes ; date which is passed through recurrent messages. This way, under certain conditions (namely that no link has a null transfer time), the simulation will safely advance.
We found out that this approach would lead to no performance gain due to the amount of messages which have to be passed compared to the actual amount of stuff done per time unit.
We then used a barrier-based synchronization algorithm : at each iteration, a global maximum date is determined, based on the current date of each node and the physical transfer delays (again). All the node are thus allowed to process events up to this date, then wait at the barrier, and then go to the next iteration when everyone has reached the barrier. The actual barrier implementation is a fun part, which is described in the report. From a trivial semaphor implementation to a fully optimized, tree-shaped implementation, performance gains of over a 100 fold can be observed.

One of the usual challenges when doing parallel computing is to define a proper partition of what each execution unit does. For instance, when working on a network of nodes (which are computers, routers…), which nodes should be handled by which execution unit ? Since we don’t want to ask the user about what he thinks would be the best, and since we can’t just use topology clues to make a good partitionning (because it also depends on the actual dynamics of the network, which we can’t predict), we had to use a dumb solution and make it smart. The solution we found there was to have both dedicated and global nodes : a given percentage of nodes is dedicated to each execution unit, while a global pool of nodes remains undedicated. When an unit has finished processing it’s own dedicated nodes, it starts picking nodes from the global pool and process them. This way, these units won’t waste their time waiting for other units which have to handle more complicated nodes in their dedicated set, thus requiring more processing time just for the local set.

Now you are going to ask, why not just use a global pool and let all the units directly pick nodes to process from there ? The whole problem is that this global list must be protected, two threads just can’t pick stuff from it at the same time — this would lead to awful memory corruptions, and probably more. Consequently, we need to use either mutexes (i.e. locks) or atomic operations to modify the list. Atomic operations are cheap compared to locks, and yet they are still at least 10 times slower than normal operations. This justifies the whole dedicated + global pools approach.

Now, there were more thread safety problems. While most of them were easily disarmed for minor performances loss (such as disabling free list or caches), there was still a major issue : reference counting. Indeed, in order to be as user-friendly as possible, ns-3 provides (and heavily uses) a reference counting system to ease memory handling for users. The whole problem there is that the counters might be accessed concurrently by two threads at once. What happens when a thread references an objet at microseconds before another thread unreferences it ?
An usual i++ operation is usualy done by taking i from memory to a register, adding 1 to it and putting it back to memory. Now, if the original reference count was 1, we could have the first thread taking i = 1 from memory, the second one taking i = 1 from memory, remov 1, putting i = 0 back into memory, then figuring i == 0 and trashing the objet, then the first threading adding 1 locally, and putting i = 2 back into memo… wait, this memory has been unallocated. Game over.
Again, a possible solution is to use atomic operations. Oh right, they are awfully expensive. Don’t care, let’s try, there must not be that much ref/unrefs going around there. And then… ow, actually, we just catched a plain 100% runtime increase.
While we fixed some of the stupidities that lead to such horrible increases, the problem was still on.
We tried to use thread local storage to maintain a per-thread reference count, but this did not really solve the problem, just helped.
We finally tried to use thread local buffers of Ref/Unref operations, which would be batch-ran by a single thread. This lead to nice performance improvements over the previous solutions, but if I remember correctly under certain unknown conditions the code would badly crash or eat all the memory. Urgh.

Still, in some conditions, I still managed to get over 20% performance improvement with 8 threads (iirc).

The conclusion of all this is that multithreaded programming is definitely not a trivial problem and that converting an existing software to gain from multithreading is not that easy. For more information, numeric data and all, check the report.

Now, I have had no time since middle September to work on the issue, but I know Mathieu has. It’d be awesome to have some news from him here… To be continued, hopefully.

April 13th, 2008 @ 08:45

For a project of mine (involving LEJOS-OSEK, mostly), I’m forced (well, almost) to use Windows, which my dear VirtualBox virtualizes fairly well on my clean Linux host. I obviously setup vim for my coding needs, but its default behavior isn’t quite what we are used to on other platforms. For instance visual mode selection with arrows requires you to hold Shift, while it doesn’t elsewhere and backup files are created upon write.

So, let’s edit the .vimrc file, which actually is at VIM_INSTALL_PATH\_vimrc (for instance C:\Program Files\vim\_vimrc with the default install options).
Disabling backup files is as simple as adding

set nobackup

Getting visual mode arrows selection is a bit different, as explained in vim tip 864 comments, you can either add

set keymodel-=stopsel

around the end of the file
or drop the

behave mswin

line from the default vimrc.

Here are two other (non windows-specific) tips:
Bash like filename completion:

set wildmode=longest:full
set wildmenu

Pythonic smart indent:

autocmd BufRead *.py set ai et ts=4 sw=4 sts=4
autocmd FileType python set ai et ts=4 sw=4 sts=4
autocmd BufRead *.pyx set ai et ts=4 sw=4 sts=4
autocmd BufRead *.py set smartindent cinwords=if,elif,else,for,while,try,except,finally,def,class

See also this pretty Tango color scheme for vim.

April 12th, 2008 @ 08:24

Ever heard of Gogh? This is a really nice drawing tool for graphic tablets owner (such as those by Wacom). However it stopped working on my Hardy system after some recent update. After some investigation, it appeared that it was due to python-xml being moved away of Python path because it interfered with stuff in Python core ; the whole python-xml package being scheduled for deprecation as soon as the reverse depends would be cleared. I figured it out and dropped the reference to python-xml package in gogh/, using a method provided by Python core xml stuff instead. It seems that quite a bunch of people are currently using xml.doc.ext.PrettyPrint from pyxml to output an XML document to a file, so I figured that posting this little tip might help someone :)

So, let’s say you currently have something like this to output your xml document “doc” to a file at path “path”, doc being an xml.dom.minidom.Document object (or similar):

        f = open(path, "w")
        xml.dom.ext.PrettyPrint(doc, f)

All you need to change is PrettyPrint call to make use of the .toxml() method of your document object instead:

        f = open(path, "w")

Here you go, hope this might save someone’s day someday :)

April 6th, 2008 @ 22:03

Building dynamic forms with Django newforms module is quite undocumented, though it’s quite easy to do. All you need to do is to hook up the __init__ function of the form, raise the __init__ to the parent forms.Form class and then add your dynamically generated fields to self.fields dict.

Here is a quick snippet demonstrating it, which will create a form with n integer fields named from 0 to (n – 1), but you will easily be able to heavily extend it.

from django import newforms as forms
class MyForm (forms.Form):
    def __init__ (self, n, *args, **kwargs):
        forms.Form.__init__ (self, *args, **kwargs)
        for i in range (0, n):
            field = forms.IntegerField (label = "%d" % i, required = True,
                                        min_value = 0, max_value = 200)
            self.fields["%d" % i] = field
March 31st, 2008 @ 23:00

Bah, it took so much time to get them, while it was just a matter of finding one or two dozens of hours to make them all :)

Well, I’m too lazy to do any screenshot now, so just check them out by yourselves :)

March 9th, 2008 @ 21:01

Months ago, we were setting up‘s VPN and we were willing to produce a good looking map of the network status. We quickly chose weathermap4rrd, the software which is usually used for that kind of purpose. However the result looked graphically a bit rough, with aliases lines and so on, so that I had to fix it. I rewrote the rendering bits of the PHP version so that they use one of the PHP Cairo wrappers.

The usual sceenshot :

weathermap4rrd+cairo sample

I bundled the whole stuff you might need to deploy it with a howto and additional infos (sorry, the docs are written in French, I will translate them when I’ll have the time to) into this tarball, or you might just grab the weathermap4rrd patch.

March 6th, 2008 @ 21:29

It’s now official, Enso, a really awesome tool that integrates smoothly into your desktop and enables you to perform several otherwise boring actions by just using your keyboard in a very simple way (check Humanized website for a better description), has just been open sourced under a BSD-like license. It’s written in Python and uses Cairo for the rendering, and the end result is really neat.
One of the great news of this is that it’s going to work on Linux (and probably even BSD’s, though it requires compositing & a compositing manager for full functionnality, which is (afaik) only available on FreeBSD) very soon :). I’ve got it working on my laptop right now, after some python-xlib/pycairo/pygtk love ; it just now needs some more testing and polish before it goes live, but it’s gonna be great :)

Enso on Linux

Long life Enso!