Archive for the ‘Code’ Category

August 16th, 2013 @ 15:47

Kliment, of Printrun/Sprinter/4pi/… fame, has struck again. Sponsored by Joaz (jglauche@#reprap, who runs RepRapSource), he developed simarrange, a smart STL plater. Basically, you throw it a bunch of STL files or a directory with STL files, and it’ll attempt to fit them into plates of the given size (200x200mm by default, circular plates are also supported).
The algorithmic bits behind this are basically a greedy search, where the software tries to fit the items from biggest to smallest, and attempts to place each item closer to the bottom left corner first (or closer to the center) in any possible rotation. The spatial & angular rotation search granularity can be easily controlled using command line parameters. To decide whether an object can be placed at a given position or not, the object geometry is projected onto the 2D plate surface and bitmap-based logical tests are used to detect overlap.
This method seems to produce tightly packed plates, as can be seen in these examples (the software outputs both the plate STL and these small visualizations):

Example plates produced by simarrange
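
To make the greedy, bitmap-based idea a bit more concrete, here is a minimal pure-Python sketch. It is not simarrange's code: the function names, the boolean-grid bitmaps and the restriction to 0 and 90 degree rotations are all assumptions made for illustration. Parts are sorted from biggest to smallest, candidate positions are tried from the closest to the bottom left corner outwards, and overlap is detected with a logical test on the bitmaps.

```python
def footprint(width, height):
    """Return a filled width x height bitmap (list of rows of booleans)."""
    return [[True] * width for _ in range(height)]


def rotate90(bitmap):
    """Rotate a bitmap by 90 degrees."""
    return [list(row) for row in zip(*bitmap[::-1])]


def fits(plate, bitmap, x, y):
    """True if the bitmap placed at (x, y) stays on the plate and overlaps nothing."""
    h, w = len(bitmap), len(bitmap[0])
    if y + h > len(plate) or x + w > len(plate[0]):
        return False
    return not any(plate[y + j][x + i] and bitmap[j][i]
                   for j in range(h) for i in range(w))


def place(plate, bitmap, x, y):
    """Mark the cells covered by the bitmap as occupied."""
    for j, row in enumerate(bitmap):
        for i, filled in enumerate(row):
            if filled:
                plate[y + j][x + i] = True


def greedy_pack(parts, plate_w, plate_h):
    """parts: dict name -> bitmap. Returns (placements, leftovers)."""
    plate = [[False] * plate_w for _ in range(plate_h)]
    placements, leftovers = {}, []

    def area(bitmap):
        return sum(map(sum, bitmap))

    # Candidate positions, closest to the bottom left corner (0, 0) first.
    positions = sorted(((x, y) for y in range(plate_h) for x in range(plate_w)),
                       key=lambda p: p[0] ** 2 + p[1] ** 2)
    # Biggest parts first.
    for name, bitmap in sorted(parts.items(), key=lambda kv: -area(kv[1])):
        for x, y in positions:
            for rot, candidate in enumerate((bitmap, rotate90(bitmap))):
                if fits(plate, candidate, x, y):
                    place(plate, candidate, x, y)
                    placements[name] = (x, y, rot * 90)
                    break
            else:
                continue  # no rotation fits here, try the next position
            break         # placed, move on to the next part
        else:
            leftovers.append(name)  # would go to the next plate
    return placements, leftovers


placements, leftovers = greedy_pack(
    {"big": footprint(3, 2), "small": footprint(1, 1), "huge": footprint(5, 5)},
    plate_w=4, plate_h=3)
```

A real plater would rasterize the projected STL geometry instead of using rectangles, and would search over finer rotation steps, but the control flow stays the same.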

I quickly jumped in to hack a few speedups, namely to avoid trying to fit objects which we already failed to fit on the current plate (in the case of making copies) and to detect if it is possible to make copies of each produced plate (which can save a lot of time if you’re plating dozens of printer kits for instance), and I contributed part of the code to make centered packing work. Kliment then added multicore parallelism yesterday (using OpenMP for full cross-platform support) to search over rotations in parallel. With all these optimizations, plating jobs which would have initially taken entire days now run in less than 10 minutes :)

Oh, and it’s already integrated into Printrun’s plater :)

Pronterface’s integration of simarrange

August 9th, 2013 @ 14:22

I’ve been a Deezer Premium (paid) user for almost two years, and while the service (and when I say “service” I mostly mean “the smartphone app”) was initially great, it has been getting worse and worse over the past year. Some songs started to get randomly corrupted (turning into awful noise) while they were playing alright the day before, some other songs started getting unsynchronized because Deezer renamed them, and lately the app has become a memory and CPU hog, visibly aimed at new high-end devices and not at my poor 2-year-old Nexus S. Some behaviors are quite striking: while the app can handle playlists of 20 songs alright, it starts hanging forever just a few minutes after starting to browse or listen to a 180-track playlist. Last, it seems the app drains the battery a lot, which is not quite okay for something as simple as a music player.

All in all, when Google launched Google Play Music in France, I decided to give it a shot (especially since the special early bird offer is 2€ less than Deezer Premium+). The only issue is that I have carefully crafted my playlists over these 2 years and did not want to have to manually rebuild them, crawling through Google’s library one track after the other. I thus decided to script it, which clearly took me less time (less than 2 hours) than I’d have needed to do the manual work, with the extra bonus that others will be able to use it as well :)

I came up with two complementary pieces of Python code: deezerexport and gmusicimport. The two communicate through simple files representing playlists using a simple JSON format:

{
  "playlists": [{"title": "My Playlist 1",
                 "tracks": [{"title": "My Song 1",
                             "artist": "My Artist 12",
                             "album": "My Album 42"},
                            {"title": "Meh Song 2",
                             ...},
                            ...]},
                {"title": "Mah Playlist 2",
                 "tracks": ...},
                ...]
}

deezerexport is standalone (no extra dependency except Python). It takes your Deezer user ID as a command line parameter and saves your playlists to the file specified by the --export CLI parameter. To find out your Deezer ID, the easiest way is to look at the source of a Deezer page when you’re logged in and look for “USER_ID :” followed by an integer, which is your ID.
Getting the playlists exported is thus as simple as running:
python 2529 --export playlists.json (2529 is the example ID from Deezer’s API documentation)

Once you’ve produced the playlists file, you just need to feed it to gmusicimport. This one depends on gmusicapi, a set of Python helpers for the unofficial Google Play Music API.
gmusicimport will go through each playlist track list and will try to find a good match for each song based on its title and artist name from the Google Play Music All Access library. Based on the score reported by the search facilities from the API, it will select the best match or warn that no good match has been found. After this search pass, it will proceed to create a new playlist in your Play Music account with all the matches it found.
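
If you want to write your own export or import script, the playlists file is easy to produce and consume with the standard json module. The snippet below is only a sketch: it assumes the top-level value is an object with a "playlists" key as in the excerpt above, and the summarize helper is a made-up name, not something taken from deezerexport or gmusicimport.

```python
import json

# A tiny document following the playlist format described above.
document = {
    "playlists": [
        {"title": "My Playlist 1",
         "tracks": [{"title": "My Song 1",
                     "artist": "My Artist 12",
                     "album": "My Album 42"}]},
    ]
}


def summarize(doc):
    """Return a list of (playlist title, number of tracks) pairs."""
    return [(pl["title"], len(pl["tracks"])) for pl in doc["playlists"]]


text = json.dumps(document, indent=2)   # what an export script would write
summary = summarize(json.loads(text))   # what an import script would read back
```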

Usage is super simple again:
python2 -u USERNAME playlists.json where USERNAME is your Google username, and playlists.json the file you produced with deezerexport (or with your own export script).

You can also use the --dry-run flag to test the script without actually creating anything but only check the importability of the playlists (i.e. if there are good matches for all the tracks from your playlists in the All Access library), and the -v flag to increase verbosity (moar messages!).

Feel free to report any bug or feature request on GitHub, and have a safe migration ^^

June 25th, 2013 @ 15:30

One of the main side effects of my recent work on Printrun is that we now store the parsed G-code instead of the plain text one. I initially wrote and tested these changes on machines which were not memory tight (with either 8 or 16GB of RAM) and did not initially foresee that I was creating a huge memory hog. Indeed, instead of just storing the G-code commands as a list of strings, we now store a list of Python objects with various attributes, the raw G-code string being just one of them. Loading a 3.8MB file with about 140000 G-code lines led to about 244MB of memory usage just for the parsed G-code, which is definitely not right (especially since some users have very large G-code files, more than 20 times bigger than the one I’m using).

Let’s look at how we can find out where we are using memory and what we can do to improve the memory usage.

Understanding the data structures

We basically have a large batch of Line objects. Each Line object is basically a storage hub for a single G-code command and the attributes we parsed from it: the command (as a string), the X/Y/Z/E/F G-code arguments, the absolute X/Y/Z position and a few pieces of meta-information, such as whether the coordinates were imperial, whether the move was relative or whether the command is going to involve the extruder.

These lines are accessed through three main paths: a list which references all the lines, a dictionary of Layer objects referencing all the G-codes started from a given Z (these Z are the layer indices) and a list of Layer objects, with one Layer object for each Z change (so a head lift during a move would create two new layers, the following commands going into the last created layer).

Profiling memory usage of Python code

I used two main approaches to profile the code: the first one is simply to use the sys.getsizeof function from Python standard library, which gives you the size of the object you provide it (not including the size of objects it references but including the size of the garbage collector metadata overhead), while the second one (which is actually the one I used the most, I only used the first one at a very late stage) is the memory inspector Heapy.
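
The "not including the size of objects it references" part is worth a quick illustration. The Line class below is a hypothetical, much simplified stand-in for Printrun's, and the exact byte counts depend on the Python version and platform, so none are quoted here:

```python
import sys


class Line(object):
    """Hypothetical stand-in for a parsed G-code line."""

    def __init__(self, raw):
        self.raw = raw


line = Line("G1 X10.0 Y20.0 E1.5 F3000" * 10)

# Shallow size: the instance itself, without anything it points to.
shallow = sys.getsizeof(line)

# The attribute dictionary and the string it references are separate
# objects, so they have to be measured and added by hand.
deep = shallow + sys.getsizeof(line.__dict__) + sys.getsizeof(line.raw)
```

This is why a tool which walks the whole object graph is much more convenient for anything bigger than a toy example.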

Heapy is very easy to use: just install guppy (the programming environment that provides Heapy), and then:

from guppy import hpy
h = hpy()
# the code you want to memory-profile
print h.heap()

Which should print something like (and this report is the starting point of our adventure, based on commit de02427):

Partition of a set of 2030760 objects. Total size = 256042336 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536   7 146214528  57 146214528  57 dict of printrun.gcoder.Line
     1 838522  41 43666288  17 189880816  74 str
     2 140075   7 23991064   9 213871880  84 list
     3 279047  14 21207472   8 235079352  92 tuple
     4 419904  21 10077696   4 245157048  96 float
     5 139536   7  8930304   3 254087352  99 printrun.gcoder.Line
     6  73040   4  1752960   1 255840312 100 int
     7    536   0   150080   0 255990392 100 dict of printrun.gcoder.Layer
     8    536   0    34304   0 256024696 100 printrun.gcoder.Layer
     9      4   0    13408   0 256038104 100 dict (no owner)

What we can see here is that 57% of the memory is used for the dictionaries that hold the Line objects’ attributes, 17% for storing strings, 9% for lists, 8% for tuples, 4% for floats, 3% for the actual Line objects and 1% for integer objects.

Getting rid of temporary data

My first shot (which was even before using Heapy, so it might sound a little silly) was to drop data which had a limited lifetime. We were indeed storing the result of splitting the G-code string using a complex regular expression until the point where we consumed all the information from this split to parse the attributes of the command. Thus in 86bf8ca we started dropping this subresult (which was basically a tuple holding a bunch of strings) after using it, and reduced memory consumption by about 18%, down to 200MB:

Partition of a set of 1331825 objects. Total size = 210155496 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  10 146214528  70 146214528  70 dict of printrun.gcoder.Line
     1 279047  21 21207472  10 167422000  80 tuple
     2 279103  21 16753624   8 184175624  88 str
     3 419904  32 10077696   5 194253320  92 float
     4 139536  10  8930304   4 203183624  97 printrun.gcoder.Line
     5    559   0  5016888   2 208200512  99 list
     6  73040   5  1752960   1 209953472 100 int
     7    536   0   150080   0 210103552 100 dict of printrun.gcoder.Layer
     8    536   0    34304   0 210137856 100 printrun.gcoder.Layer
     9      4   0    13408   0 210151264 100 dict (no owner)

The __slots__ magic

From there, since I was not very sure where to cut costs, I started using Heapy and figured out that most of our memory (70%) was being used by the dynamic dictionaries used to store Line objects’ attributes. This is an expected downside of Python’s flexibility: as you can add new attributes on the fly at any time, it can’t just allocate a static amount of memory at object creation to store all the attributes. This is where the __slots__ class variable comes to the rescue: it lets you specify which attributes will ever be used on this object. We thus implemented this trick in 2a6eec7, further reducing memory consumption on our test file to 77MB, which is 62% less memory than before.

Partition of a set of 1192289 objects. Total size = 80685288 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  12 25674624  32  25674624  32 printrun.gcoder.Line
     1 279047  23 21207472  26  46882096  58 tuple
     2 279103  23 16753624  21  63635720  79 str
     3 419904  35 10077696  12  73713416  91 float
     4    559   0  5016888   6  78730304  98 list
     5  73040   6  1752960   2  80483264 100 int
     6    536   0   150080   0  80633344 100 dict of printrun.gcoder.Layer
     7    536   0    34304   0  80667648 100 printrun.gcoder.Layer
     8      4   0    13408   0  80681056 100 dict (no owner)
     9      4   0     1120   0  80682176 100 dict of guppy.etc.Glue.Interface
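
For readers who have never used __slots__, here is a minimal illustration of the trick. The two classes are hypothetical and far smaller than Printrun's Line; the point is only that a slotted instance carries no per-instance dictionary and rejects attributes which were not declared.

```python
class DictLine(object):
    """Regular class: every instance owns a dictionary for its attributes."""

    def __init__(self, raw):
        self.raw = raw
        self.x = None


class SlotLine(object):
    """Slotted class: attributes live in fixed slots, no per-instance dict."""

    __slots__ = ("raw", "x")

    def __init__(self, raw):
        self.raw = raw
        self.x = None


plain = DictLine("G1 X1")
slotted = SlotLine("G1 X1")

plain.comment = "fine"  # a per-instance dict accepts any new attribute
try:
    slotted.comment = "nope"  # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True
```

The price to pay is the loss of on-the-fly attributes, which is why every attribute the parser may ever set has to be listed up front.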

Replacing lists of tuples with arrays

At this step the memory usage of Line objects felt pretty much right: still too high, but back to a reasonable ratio compared to the other data structures. However, the cost of tuple objects now looked a little bit too high. We use tuples in two places, the first one being to store the current absolute X/Y/Z position at the end of each G-code line, the other being in a big list which is used to map between the line number and the layer number & position in this layer’s line list. The latter can be easily replaced by two lists holding the same information without the tuples, or by two Python arrays, which are very compact storage arrays for standard types (characters/integers/floating point numbers) and thus have much less overhead than a list holding integer objects. Switching to integer arrays in 5e1a0be brought us down to 65.5MB, a new 15% improvement.
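
As a rough sketch of that replacement (the variable names and the fake data are assumptions, not Printrun's), a list of (layer index, index in layer) tuples can be split into two unsigned integer arrays which answer the same lookup:

```python
import sys
from array import array

# Fake mapping: line number -> (layer index, index of the line in that layer).
pairs = [(i // 100, i % 100) for i in range(10000)]

# Same information, stored as two compact arrays of unsigned integers.
layer_idxs = array("I", (layer for layer, _ in pairs))
line_idxs = array("I", (idx for _, idx in pairs))


def lookup(line_number):
    """Return (layer index, index in layer) for a given line number."""
    return layer_idxs[line_number], line_idxs[line_number]


# Rough cost comparison: the list plus one tuple object per entry,
# versus the two arrays (the small integers themselves are ignored).
list_cost = sys.getsizeof(pairs) + sum(sys.getsizeof(p) for p in pairs)
array_cost = sys.getsizeof(layer_idxs) + sys.getsizeof(line_idxs)
```

The arrays store raw machine integers back to back, so there is no per-element object header at all.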

Removing more tuples

As explained in the previous part, we also had tuples storing the destination absolute position of each G-code line, which we simply split into three different properties in b27cb37, reducing memory consumption down to 57MB, a new 13% improvement.

Going C-style

So, here we are:

Partition of a set of 840204 objects. Total size = 59763064 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  17 27907200  47  27907200  47 printrun.gcoder.Line
     1 279103  33 16753624  28  44660824  75 str
     2 419904  50 10077696  17  54738520  92 float
     3    558   0  3706096   6  58444616  98 list
     4      2   0  1116400   2  59561016 100 array.array
     5    536   0   150080   0  59711096 100 dict of printrun.gcoder.Layer
     6    536   0    34304   0  59745400 100 printrun.gcoder.Layer
     7      4   0    13408   0  59758808 100 dict (no owner)
     8      4   0     1120   0  59759928 100 dict of guppy.etc.Glue.Interface
     9      1   0     1048   0  59760976 100 dict of printrun.gcoder.GCode

47% of the memory is used by the Line objects, 28% by string objects and 17% by float objects. The strings take 16MB in memory (while we only have 3.8MB of input data, that’s a x4 factor, which is probably due to the fact that sys.getsizeof("") reports 40 bytes of overhead per str object), and each float takes 24 bytes, while a double should take no more than 8 bytes.
The solution we used for getting rid of all the overheads is to provide an extension module written in C, using Cython, which would store all the properties in a C structure. However, we had to take care of two issues: the first one was the need for a fallback mode if the Cython module isn’t available (so we had to keep the previous Python-based Line implementation and derive from either this PyLine or the Cython-based gcoder_line.Line object), while the second one was that Python properties could be either None or some value of the type we wanted, and that we use this behavior to detect whether an attribute was present in the G-code line or not (that is, if after parsing the command of an object called line the line.x attribute is None, then there was no X argument in the command). The solution to this second requirement is to store for each property an extra boolean telling whether the property was ever set or not, so that when the property is accessed we can return None if no value was set.

Let’s talk about implementation now. I first tried to implement argument access in a Pythonic way, hacking __setattr__ and __getattr__ to do the extra None-checking, as can be seen in a3b27af, but this solution was much slower (3 times slower) than the original Python code, though much lighter, down to 26MB, a new 54% improvement (though we lost track of some memory here, see below):

Partition of a set of 141504 objects. Total size = 27358912 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  99 22325760  82  22325760  82 printrun.gcoder.Line

The solution to this slowdown, implemented in a511f20, was simply to use the properties mechanism from Cython, which lets you write getters and setters for a given attribute, which get compiled into very fast C code, while using the same amount of memory per-instance.

The next improvement was to compress all the boolean properties specifying whether a variable had been set into a single bitarray stored as a 4 bytes (32 bits) integer, which reduced the per-instance memory usage from 160 bytes to 144 (1c41072). We extended this trick to all the boolean properties in fc43d55, further reducing per-instance cost to 136 bytes.
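
The "was this property ever set" bitfield is easier to picture with code. The class below is a pure-Python illustration only: in the real implementation the fields live in a C struct generated by Cython, and the names MaskedLine, _set_mask and the bit constants are made up for this sketch.

```python
class MaskedLine(object):
    """Illustration of None-emulation with one integer used as a bitfield."""

    __slots__ = ("_x", "_y", "_set_mask")

    _X_BIT = 1 << 0  # bit 0: x was set
    _Y_BIT = 1 << 1  # bit 1: y was set

    def __init__(self):
        self._x = 0.0
        self._y = 0.0
        self._set_mask = 0  # no property set yet

    @property
    def x(self):
        # Report None as long as the corresponding bit is not raised.
        return self._x if self._set_mask & self._X_BIT else None

    @x.setter
    def x(self, value):
        self._x = value
        self._set_mask |= self._X_BIT

    @property
    def y(self):
        return self._y if self._set_mask & self._Y_BIT else None

    @y.setter
    def y(self, value):
        self._y = value
        self._set_mask |= self._Y_BIT


line = MaskedLine()
line.x = 12.5  # y is left untouched, so line.y still reads as None
```

One 32-bit integer has room for 32 such flags, which is why packing them is cheaper than keeping one boolean per property.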

Getting rid of the cyclic garbage collector

Though memory consumption had already been trimmed down a lot, I was a little bit confused because the Line objects were still taking 136 bytes, while the actual properties I had declared should have been taking about 97 bytes, and doing a manual sizeof on the C struct which was being compiled announced a size of 104 bytes (the 7-byte difference is just a matter of memory padding). After looking around a lot and using the standard “get rid of one line of code at a time until things start making sense”, I figured out the 32 extra bytes were brought by the fact that one of my properties was a Python object (the tuple of strings from the G-code splitter regexp), and that referencing a Python object triggered the addition of cyclic garbage collection metadata. Fine, I refactored the code to not store this information at all. The Cython-based Line objects then started being the right size, but the final Line objects (the ones which inherited from either the PyLine or Cython-based gcoder_line.Line) were still 32 bytes too large, even though they had no more __slots__ entries than their ancestors (so no other property than the parent’s could reference other Python objects, and as the Cython parent could not, the child class should not be able to reference other Python objects).

After investigating a lot more, I figured out that while the same derivation worked fine with new-style Python objects (the ones which derive from object class), the Cython objects were a little bit different and that deriving them would always pull in this GC metadata. I thus rewrote the code to not require the derivation (by moving the previous class methods to module methods) and after 77888d9 we had memory consumption down to 17.6MB on our test object.

After a few more tricks and fine tuning of the properties list (we got rid of super rare properties), the size of Line objects was reduced to 80 bytes, and the global memory consumption down to 15.3MB:

Partition of a set of 140384 objects. Total size = 16011448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  99 11162880  70  11162880  70 printrun.gcoder_line.GLine
     1    536   0  2394240  15  13557120  85 printrun.gcoder.Layer
     2      2   0  1313424   8  14870544  93 list
     3      2   0  1116400   7  15986944 100 array.array
     4      4   0    13408   0  16000352 100 dict (no owner)
     5    275   0     6600   0  16006952 100 float
     6      4   0     1120   0  16008072 100 dict of guppy.etc.Glue.Interface
     7      1   0     1048   0  16009120 100 dict of printrun.gcoder.GCode
     8      8   0      704   0  16009824 100 __builtin__.weakref
     9      1   0      448   0  16010272 100 types.FrameType

Adding Heapy support for Cython objects

However, the numbers I announced in the previous section are wrong. Indeed, one big issue with using Cython is that Heapy can’t automagically follow C pointers, and as we store the raw G-code string and G-code command (the “G1” or “M80” part for instance) as C strings, Heapy lost track of them, and we are using more memory than Heapy claims.
Luckily enough, Heapy provides a mechanism to specify a function which will take an object of the given type as an argument and will return the space used by the object and its properties, as described in its documentation.
The only trick there was that the C code is only a subproduct of the Cython extension build process, and that I couldn’t add raw C-code to the Cython source, so that I had to write a patch which can be applied to the C source after it has been initially generated (the patch is available in 040b89f).

After this patch, we now have a correct estimation of the memory usage, and we have effectively reduced memory consumption from 244MB down to 19.4MB, which is a reduction factor of 12.5. We still use 5 times more memory than the initial file we load, but given how much metadata we store, it feels quite decent. I don’t expect anyone to load a G-code file so big that 5 times its size would go over the memory limits of the person’s system.

Partition of a set of 140383 objects. Total size = 20369977 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 139536  99 15521497  76  15521497  76 printrun.gcoder_line.GLine
     1    536   0  2394240  12  17915737  88 printrun.gcoder.Layer
     2      2   0  1313424   6  19229161  94 list
     3      2   0  1116400   5  20345561 100 array.array
     4      4   0    13408   0  20358969 100 dict (no owner)
     5    275   0     6600   0  20365569 100 float
     6      4   0     1120   0  20366689 100 dict of guppy.etc.Glue.Interface
     7      1   0     1048   0  20367737 100 dict of printrun.gcoder.GCode
     8      7   0      616   0  20368353 100 __builtin__.weakref
     9      1   0      448   0  20368801 100 types.FrameType

So, long story short: Heapy is a very cool tool to profile the memory usage of Python code, Cython can help reduce your memory usage by letting you write extension modules very easily, and you should not forget about Python’s cyclic garbage collector when investigating your objects’ memory usage.

June 24th, 2013 @ 16:30

For more than a month now, I’ve been working on a daily basis on Printrun, one of the most used 3D printer host software packages (the thing that sends G-code to your printer and lets you monitor how things are going). I had already given it a bit of love in the past, doing some internationalization and translation work, splitting the GUI code into a more manageable class-based patchwork and reworking the serial connection bits, but this batch of changes has much more influence on the user side.

Just to boast a little bit, let me quote Kliment, of RepRap/Sprinter/4pi/Printrun fame, who created and maintains Printrun:

13:52:12 < Kliment> On the sixth day, Kliment created the pronterface and the pronsole, and saw that it was…pretty crap. Then iXce came and helped out.

So, here are the main changes:

Cleanup of G-Code parsing

I initially came back to hacking Printrun because I thought loading G-code files was really too slow, and I quickly figured out that it was mostly due to the fact that we were parsing the G-code several times (there were at least 6 different G-code parsers in the code, and loaded G-code was parsed at least three times immediately upon loading). By rewriting and merging all those parsers, I was able to reduce the startup plus file loading plus visualization preparation time from about 13 seconds to 4.5 seconds on average, using a 3.8MB G-code file of about 140000 lines, while the new parser is also doing more things than the old ones.

Improvement of the remaining print time estimation

The previous remaining print time estimation was simply doing a cross-multiplication between elapsed time, number of sent G-code lines and number of remaining G-code lines, which happened to be very wrong in many cases: the early estimations were completely biased by the fact that we usually print the first few layers at a much slower speed than the other ones, and average G-code time means pretty much nothing (a long line can be 1 single G-code, while a tiny circle can be 50, while printing the line can take 20 seconds and the tiny circle less than 2, so you can have some G-codes which take 500 times longer than some others). Thanks to the fact that we now have a complete understanding of the G-code (or actually just because we store more meta-information), we can now compute a much smarter estimate, based on the estimated total duration (the one you see after loading the file) computed on a per-layer basis, which we correct by incorporating a bias computed between this estimation and the actual print time for the elapsed layers. This way at any time the ETA should be within 10% of the actual ETA (unless you have a very, very weird machine), and should get closer and closer over time.
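
The corrected estimate can be sketched in a few lines. The exact formula used in Printrun is not given here, so the multiplicative bias below (actual elapsed time divided by the time predicted for the layers already printed) is an assumption, as are the function and parameter names.

```python
def remaining_time(estimated_layer_durations, layers_done, elapsed):
    """Estimate the remaining print time in seconds.

    estimated_layer_durations: per-layer durations predicted from the G-code
    layers_done: number of layers already printed
    elapsed: actual time (seconds) spent printing those layers
    """
    estimated_done = sum(estimated_layer_durations[:layers_done])
    estimated_left = sum(estimated_layer_durations[layers_done:])
    # Bias between prediction and reality on the layers printed so far.
    bias = elapsed / estimated_done if estimated_done > 0 else 1.0
    return estimated_left * bias


# First layer predicted at 100s but actually took 120s: the remaining
# 100s of predicted time are scaled by the same 1.2 factor.
eta = remaining_time([100, 50, 50], layers_done=1, elapsed=120)
```

Because the bias is recomputed as more layers complete, a slow first layer stops dominating the estimate once it represents a small share of the elapsed time.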

Optimization of the 2D print visualization

The 2D print visualization was really slow, and was even known for slowing down prints because screen refreshes were taking too much time. The drawing code was indeed redrawing every single line on each refresh, which would happen once for each sent G-code. We now buffer drawings as much as possible: complete redraws only occur when changing layers or resizing the canvas, every other refresh is done by updating a buffered bitmap with the newly added lines and drawing it to the screen. Visually, not much has changed except that the print bed extents should appear in a cleaner way (the yellowish background color does not leak anymore), that the grid should correctly start in the bottom left corner, and that thanks to other interface changes the resizing behavior should be much nicer.

The optimized 2D viewer, with proper resizing behavior, correct grid and different colors for background and build area

Addition of a 3D viewer

Even better than optimizing the 2D viewer, we added a proper and very fast 3D viewer, based on tatlin, a standalone G-code and STL viewer originally built on GtkGLExt and PyOpenGL, which we ported to Pyglet so as not to add another dependency, and adapted it to use our already parsed G-code. As it uses the GPU to do most of the drawing (we store the model to display in GPU memory if possible, so that the CPU only has to tell “print from this vertex to this vertex with this color or GPU-stored color array”), it’s super light for the CPU, and as it’s done in 3D we can zoom (and zoom to point of interest), pan and rotate in 3D almost for free.

The new 3D viewer

Addition of a clean, tabbed options dialog

Another nice improvement was the addition of a clean options dialog, with options grouped in different tabs and being displayed with appropriate widgets: checkboxes for boolean options, combo boxes for multiple-choices options, spin buttons for numbers, text fields for text options.

Old vs. new option dialogs

Addition of a part excluder tool

This is probably the most original addition: a G-code based part excluder. How many times have you had a single part fail on a big build plate and had to stop everything just because the plastic was not being correctly deposited on the failed part and started messing up the whole print? Well, this tool is designed to stop this kind of situation from happening. Just hit the Excluder entry of the File menu and draw a few rectangles over the area you want to stop printing, and you’re done (you can also do that to exclude parts of already sliced plates you don’t want to fully print nor reslice without the extra objects). It basically avoids going into the defined rectangles, resetting E values if needed instead of extruding, and seems to work fairly well with Slic3r-generated G-codes (I haven’t tested with any other slicing engine, and it could break because layer changes have to happen independently of X/Y moves for now).

Failed part excluder tool (right) and print simulation (left) where I stopped printing two of the four objects after a few layers
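
Here is a deliberately simplified sketch of the exclusion idea, not Printrun's actual implementation. It assumes absolute E coordinates, works on already-parsed (x, y, e) move targets instead of real G-code lines, and ignores everything else (travel moves, retractions, layer changes). Moves ending inside an excluded rectangle are dropped, and a G92 is emitted before the next kept move so that the plastic of the skipped moves is not extruded.

```python
def inside(x, y, rect):
    """True if (x, y) lies within rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1


def exclude(moves, rects):
    """moves: list of (x, y, e) absolute targets. Returns G-code strings."""
    out = []
    skipped_e = None  # E value reached by the last skipped move, if any
    for x, y, e in moves:
        if any(inside(x, y, r) for r in rects):
            skipped_e = e
            continue  # drop the move entirely
        if skipped_e is not None:
            # Pretend the extruder already is where the skipped moves
            # would have left it, instead of extruding that plastic.
            out.append("G92 E%.2f" % skipped_e)
            skipped_e = None
        out.append("G1 X%.2f Y%.2f E%.2f" % (x, y, e))
    return out


gcode = exclude(
    [(0, 0, 0.0), (10, 10, 1.0), (12, 12, 2.0), (30, 30, 3.0)],
    rects=[(5, 5, 20, 20)])
```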

Addition of new interface layouts

Based on the GUI code modularization from last year, we added two new interface layouts. The first one differs from the default one by placing the log viewer below the printer controls on the left side of the program, freeing more space for the print visualization:


Default (left) vs. compact (right) interface modes

The second mode is quite different. It splits the interface into two tabs, the first one holding all the printer controls, while the second one holds the print viewer and the log box. This makes it super compact and very well fit for tiny screens (it should fit in 700×600 pixels), especially when using wxWidgets 2.9 where it will use an appropriate widget for the main toolbar which wraps around automatically.


The two tabs of the Tabbed interface mode

Addition of a bigger temperature graph visualization

Last, you can now click on the small temperature graph (which can also optionally be disabled and complemented or replaced by the old temperature gauges) to open a new window showing a much larger version of the temperature graph for easier monitoring. Not the biggest thing ever, but still handy :)

The new large temperature graph

July 22nd, 2010 @ 01:18

A school friend, namely p4bl0, mentioned the idea of maintaining blog posts with git and a set of hooks which would produce the blog html from the contents of the repo. I loved the idea, but thought I could push it a little further : a blog engine which would use no other storage than git, with the post subject and contents being the commit message subject and contents. A post-commit or post-receive hook then produces the html. As simple as that !

You can find the source in BloGit git repo, and see an example at BloGit example. To use the source, you first have to pack it (using the pack script), which will merge the raw_post and raw_produce, producing a single post script (which I also included at the end of this post), which you can simply put in an empty directory and run it. It will unpack the other script (produce), initialize the git repo, and set the hooks. It’ll then prompt you for your post title and then open an editor for you to set your post contents. Save the file, and you’re done with your first post : check the index.html file which has been produced in the same directory. You can write your own stylesheet in the blogit-style.css file. Further posts can be done with the same post script.

Yet, the best way is probably just to use the usual git workflow. To initialize the repo and all, run post --unpack, and to post, run post --raw or git commit --allow-empty (when using git commit, leave a blank line between the subject line and the rest of the post). You can also amend existing commits (using git commit --amend), use the GIT_AUTHOR_* environment variables to change the author, and so on. Since merge commits are skipped by the html generator, it should work just great for multi author blogging !
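
To show why the blank line after the subject matters, here is a small Python illustration of turning one commit message into an HTML fragment. This is not BloGit's code: the render_post name and the h2/p markup are assumptions chosen for the example.

```python
from html import escape


def render_post(commit_message):
    """Render a commit message: subject as title, body as paragraphs."""
    # The first blank line separates the subject from the post contents.
    subject, _, body = commit_message.partition("\n\n")
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    html = ["<h2>%s</h2>" % escape(subject.strip())]
    html += ["<p>%s</p>" % escape(p) for p in paragraphs]
    return "\n".join(html)


fragment = render_post("Hello world\n\nFirst post.\n\nSecond <b> paragraph.")
```

Feeding it the messages of all non-merge commits, newest first, is enough to get a complete blog index.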

PS : I know this is JUST a git log pretty printer, and that the whole thing is pretty much trivial. I also know that using versioned files to store the posts would allow a lot of extra bonuses (such as automatically adding “Updated on …” mentions based on the commit log of each single file). I just thought the idea was fun :p There are probably a lot of things to improve, or a lot of smart git features to use there that I overlooked. Feel free to leave a line :)


April 21st, 2010 @ 20:54

After sorting out my clutter issues and finally producing a video of a clutter animation, I thought I’d use it on the initial goal, that animation I had written ages ago. What I had sadly forecast occurred, the video dumping awfully slowed down the animation.

The whole problem is that, for now at least, I don’t think it’s possible to run the animation frame by frame rather than time-based. So I thought “let’s just defer the whole video generation to after the end of the animation, and buffer the frames meanwhile”. Well, this worked… until oomkiller jumped in and killed my process. Urgh.

So, I can’t buffer the whole video, but I can’t push the frames to gstreamer in real time directly from the animation either. Well, all I need is parallelism then ! Push frames to a queue which is consumed by another execution unit (which pushes the frames to gstreamer). And since threading pretty much sucks in Python (well, it definitely would here, since we need real parallelism), let’s use the new multiprocessing framework from Python 2.6. Using it is pretty straightforward : create some parallel structures (queues, pipes), spawn a new process with its own main function, push to the structures from one process, read from another, and you’re done. The only thing I’m still wondering is why there is a close() function on Queues when there is no obvious way to detect from the other end that the queue has been closed (which I worked around by pushing a None message).

Well, now I have a smooth animation and a smooth video dump, my two cores being nicely fully used :)

The code is available below, with the interesting parts being StageRecorder.create_pipeline, StageRecorder.dump_frame, StageRecorder.stop_recording and StageRecorder.process_runner.


April 20th, 2010 @ 19:15

A while back, I used clutter (a very nice and simple animation toolkit that basically lets you easily work in a 3D environment with 2D objects) to do a little photo slideshow with a lot of customisations, but I never even showed it to the person it was aimed at because the whole thing was not satisfying enough (it either took ages to start or was not smooth, and it is not easy to add a decent soundtrack when you can’t synchronize video & audio).

A simple solution would have been to do the rendering once and then just do the postproduction. I had quickly looked for a way to send the output of the animation directly to gstreamer (since there is gstreamer input support for clutter, this pretty much made sense), but there was none. Another option would have been to use capture software, like Xvidcap, but this stuff is too heavy for my poor laptop. Consequently, I just gave up back then.

What I had completely overlooked is that clutter uses OpenGL for the rendering, so that all I had to do was to dump each frame myself using glReadPixels or using things like Yukon to do the dirty stuff. After a quick googling, I found this clutter mailing list thread about capturing the clutter output to a video file, which mentions the clutter_stage_read_pixels function, which does all the glReadPixels magic and even puts it in a more standard format. It also points to gnome-shell’s recorder stuff, which does the glReadPixels stuff and outputs it to a gstreamer pipeline, plus some extra fancy things (since they are doing screencasts of gnome-shell features, they draw the mouse cursor on top of each frame). So all I have to do now is put things together :)

One of the bad things I figured out is that clutter_stage_read_pixels calls clutter_stage_paint, so mixing the gnome-shell recorder approach with clutter_stage_read_pixels results in a bad infinite loop if you don’t protect the whole thing. Even though this means painting things twice, I guess this is a much easier approach than having to use python-opengl or something along those lines.
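
One way to "protect the whole thing" is a simple re-entrancy flag. The sketch below does not use clutter at all : FakeStage is a hypothetical stand-in whose read_pixels repaints, mimicking the behaviour described above, and Recorder is an equally hypothetical paint handler. It only shows the guard, not the actual recorder.

```python
class FakeStage:
    """Hypothetical stand-in for a stage whose read_pixels() repaints."""

    def __init__(self):
        self.paint_handlers = []
        self.paint_count = 0

    def paint(self):
        self.paint_count += 1
        for handler in list(self.paint_handlers):
            handler(self)

    def read_pixels(self, x, y, width, height):
        # Mimics clutter_stage_read_pixels calling clutter_stage_paint.
        self.paint()
        return bytes(width * height * 4)  # dummy RGBA data


class Recorder:
    """Grabs one frame per paint, ignoring paints it triggered itself."""

    def __init__(self, stage):
        self.frames = []
        self._capturing = False
        stage.paint_handlers.append(self.on_paint)

    def on_paint(self, stage):
        if self._capturing:
            return  # nested paint caused by our own read_pixels call
        self._capturing = True
        try:
            self.frames.append(stage.read_pixels(0, 0, 4, 2))
        finally:
            self._capturing = False
```

Without the flag, on_paint would call read_pixels, which would paint, which would call on_paint again, and so on. With it, each real paint costs exactly one extra paint.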

Another bad thing I encountered was that the Python bindings for clutter_stage_read_pixels are broken at the moment (pyclutter 1.0.2). The first problem is that the argument parsing part seems to be broken. Changing the PyArg_ParseTupleAndKeywords to a simple PyArg_ParseTuple gets things “working”, and gdb indicates a segfault in a PyDict_Check of the keywords argument :

Program received signal SIGSEGV, Segmentation fault.
0x00000032d34ecd9c in _PyArg_ParseTupleAndKeywords_SizeT (args=(0, 0, 500, 200), keywords=, format=
0x7ffff000d9ac "dddd:ClutterStage.read_pixels", kwlist=0x7ffff022f6c0) at Python/getargs.c:1409
1409 (keywords != NULL && !PyDict_Check(keywords)) ||
(gdb) bt
#0 0x00000032d34ecd9c in _PyArg_ParseTupleAndKeywords_SizeT (args=(0, 0, 500, 200), keywords=
, format=
0x7ffff000d9ac "iiii:ClutterStage.read_pixels", kwlist=0x7ffff022f6c0) at Python/getargs.c:1409

After asking on #clutter, ebassi immediately caught the problem : there was a missing “kwargs” bit in the python binding override definition, so that the kwargs were never actually passed to the C wrapper, which was getting garbage instead.

The other problem was that the returned data was empty. This was simply due to the fact that the buffer returned by the C function was interpreted as a NULL-terminated string, which is wrong when you get such binary data. The fix was simply to specify the length to read to fill the string.
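
The same class of bug can be reproduced from Python with ctypes, as an analogy only (this is not the pyclutter fix itself). The helper names are made up for the illustration; ctypes.string_at reads up to the first NUL byte unless an explicit size is given.

```python
import ctypes


def read_as_c_string(buffer):
    """Read the buffer as a NUL-terminated string (the buggy behaviour)."""
    return ctypes.string_at(ctypes.addressof(buffer))


def read_with_length(buffer, length):
    """Read exactly `length` bytes, NUL bytes included (the fix)."""
    return ctypes.string_at(ctypes.addressof(buffer), length)


# Two dummy RGBA pixels; the very first byte is a NUL, as is common
# in pixel data.
PIXEL_BYTES = b"\x00\x00\x00\xff\x10\x20\x30\xff"
pixels = ctypes.create_string_buffer(PIXEL_BYTES, len(PIXEL_BYTES))
```

Here read_as_c_string(pixels) comes back empty because the data starts with a zero byte, while read_with_length(pixels, 8) returns all eight bytes.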

Both issues are now fixed in pyclutter git, and should be available on the next stable release.

The remainder of the port was pretty straightforward. The only problem was that I had no experience with gstreamer, which cost me quite a lot of time. Here are a few things I discovered :

  • The --gst-debug-level command line argument is really really useful, especially on levels 3 and 4, it outputs a lot of valuable information on what’s going on and what’s not working.
  • The whole caps story is really important. After spending an hour trying to figure out why my pads wouldn’t negotiate their caps, I found out that they couldn’t because I had a wrong cap (the endianness one), and after a few more hours I figured out that I had to set the caps on each buffer, and that I actually only had to set caps on buffers.
  • Timestamps are not magically inferred (at least not without extra gstreamer elements) and should be set by hand using the buffer.timestamp python property (this is not quite well documented in the Python bindings documentation imho).
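
For the timestamps, here is a rough idea of the arithmetic involved, assuming a constant frame rate (an assumption made for the example; the recorder may just as well stamp frames from a clock). GStreamer timestamps are expressed in nanoseconds. frame_timestamps is a hypothetical helper, and the sketch does not touch gstreamer itself : the values it computes are what you would assign to each buffer.

```python
NANOSECONDS_PER_SECOND = 1000000000  # GStreamer time unit is the nanosecond


def frame_timestamps(frame_count, fps):
    """Return a (timestamp, duration) pair in ns for each frame.

    Assumes a constant frame rate. Durations are derived from consecutive
    timestamps so that rounding errors do not accumulate over time.
    """
    stamps = []
    for index in range(frame_count):
        start = index * NANOSECONDS_PER_SECOND // fps
        end = (index + 1) * NANOSECONDS_PER_SECOND // fps
        stamps.append((start, end - start))
    return stamps
```

Computing each timestamp from the frame index, rather than adding a rounded duration over and over, keeps the video from slowly drifting at frame rates such as 30 fps which do not divide a second evenly.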

Well, that’s pretty much it. I used a Clutter python demo from fedora-tour and here is the result : Clutter Stage Recorder demo. The whole source is available below :)


March 31st, 2010 @ 20:01

Keyboard shortcuts are always a great matter of debate, and the whole problem is that most often they are chosen based on assumptions of the end user layout.

For instance, take this metacity commit : Change default cycle_group keybinding to Alt-grave. This change looks perfectly harmless, right ? Well, not quite. It’s most likely based on the assumption that the end user has a qwerty keyboard layout (and it makes perfect sense there). But let’s take an azerty layout. Grave is on the é/7 key, which is even farther from alt or tab than F6 is (well, not by much, I agree, but it might be even worse on other layouts). Is it really worth making such a change then ?

Note that this also triggers a bad bug which gets alt+7 and alt+shift+7 to trigger the binding as well, while alt+grave is actually alt+altgr+7. This has been keeping me from nicely switching to my window n°7 in irssi for months (great thing that this window holds a really low traffic channel…).

All in all, I guess that the real problem is not that this change was made, but rather that we might need a system to have layout-dependent keybindings, or maybe hardware-location-based keybindings (i.e. that the key above the Tab key would trigger this keybinding independently of the layout).

Initially published on Mar 24, 2010 @ 8:22

Update : this change has been reverted for the GNOME 2.30 release. Even though I’m happy that the problem is “fixed”, it’s sad that the underlying problem (Alt+Shift+7 triggering Alt+`) is still there.

March 26th, 2010 @ 01:28

After migrating a bunch of stuff from one (about to expire) server which ran lenny to a new one running squeeze, a friend’s blog, which is powered by Dotclear, appeared heavily broken. His posts appeared empty, though they were still there and the titles were right, but nothing else (no url, no author, no date, no content). After a little bit of investigation, I figured that the problem was that squeeze is running PHP 5.3, and that my friend’s version of Dotclear obviously didn’t support it. Checked the Dotclear website, found out that since PHP 5.3 support is planned for the upcoming Dotclear 2.2, the latest version (Dotclear 2.1.6 — which my friend already had, actually) did not support it. Checked the PHP website to find the 5.3 release date : 30 June 2009. Wow.

Looked a little deeper in the Dotclear forums, and I found a patch which is actually a workaround for the problem. This workaround has been available since the 20th of July, 2009, and the Dotclear developers won’t include it even in the 2.1 branch because it’s a workaround and not a real fix :

Le patch n’est pas appliqué parce que, tu le dis toi-même, c’est une rustine. Ça peut paraître vieux jeu, mais nous préférons garder un code propre et régler les problèmes correctement.

Which translates to “The patch hasn’t been applied because, as you say yourself, it’s only a workaround. It might sound old-fashioned, but we’d rather keep the code clean and fix the problems correctly”. Well, it sounds like a great plan, which would be fine if it did not take them ages to produce that clean code :) (arguably, since the patch is easily available, my point might be pretty much void, but still, it’s not official : the average user grabbing the latest official release and installing it on hosting which provides PHP 5.3 might easily get confused).

March 25th, 2010 @ 23:45

There’s an outstanding bug right now which makes the cvCanny edge detector function in OpenCV segfault on x86_64 systems. This post is an open attempt to track my debugging process :)

  • Bug encountered. I know it’s x86_64 specific since I ran the same code on an i686 machine a few hours ago (with a home compiled OpenCV, though).
  • Googled it : found reports on both OpenCV and RedHat bugtrackers.
  • Installed debug symbols, ran under gdb : all values I may need are optimized out.
  • Fetched OpenCV source, compiled it in debug mode.
  • cvCanny works great in debug mode.
  • Recompiled in release (optimized) mode to check if it is a distro-specific bug (both reports are from Fedora users).
  • Whoa, release mode compilation is so slow :( But bug confirmed : it segfaults again. Time to instrument the code.
  • Filled the cvCanny function with printfs and fflushs to track the function execution. Looks like it tries to access an element at index -514. Ugh. What’s even more frightening is that it successfully achieves that on another array.
  • After running the same instrumented code on my i686 machine, it appears that the indexes are right and that the same indexes are accessed without any problem in optimized mode in the i686 build.
  • Reading the code tells me that the accesses at negative indexes are legit since the array origin is shifted from the actual allocated memory blob start. Well, that’s good, since it explains why it works well in debug mode or on i686 setups, but that’s pretty bad because it’s going to be awful to narrow down.
  • Ok, doing the access by hand (i.e. doing _map[-514] instead of _map[j - mapstep]) works. This is getting crazy. Doing k = j - mapstep and accessing _map[k] segfaults as well. Huh.
  • After an hour of heavy fprintfs, I figured out that long k = j - mapstep; gave me a k which wasn’t the int value (-514) but rather the unsigned int value (4294966782), while doing int k = -514; long k2 = k; printf ("%d %ld\n", k, k2); in a very simple program prints -514 -514, even with -O3 or -O5 and all the options used for the OpenCV release build. Since we are working with 64-bit pointers (i.e. of the size of long integers), this is probably the issue : when accessing _map[k], it dereferences the value at _map + k, which fails since it dereferences _map + 4294966782 instead of _map - 514.
  • Doing volatile int k = j - mapstep; and accessing _map[k] works, and cvCanny runs great now. Though this isn’t a real fix, just a workaround. There’s most likely a compiler bug underneath.
  • Posted a summary of my findings and the workaround on the bug report on the OpenCV tracker.
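
To make the two numbers above less mysterious : 4294966782 is exactly 2^32 - 514, i.e. the same 32-bit pattern as -514, widened to 64 bits by padding with zeros instead of copying the sign bit. The small sketch below only illustrates that arithmetic with ctypes (the function names are made up for the example); it says nothing about what the compiler actually emits.

```python
import ctypes


def sign_extend_32_to_64(value):
    """Widen a 32-bit signed value the way int -> long should behave."""
    return ctypes.c_int64(ctypes.c_int32(value).value).value


def zero_extend_32_to_64(value):
    """Widen the same 32 bits as if they were unsigned (the observed result)."""
    return ctypes.c_int64(ctypes.c_uint32(value).value).value


# The index computed in cvCanny, as reported by the instrumented build.
INDEX = -514
```

sign_extend_32_to_64(INDEX) gives -514, the expected index, while zero_extend_32_to_64(INDEX) gives 4294966782, the offset the optimized x86_64 build ended up using.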

Patch against latest svn (it should apply nicely to the 2.0.0 release as well) :

Index: cvcanny.cpp
--- cvcanny.cpp	(revision 2908)
+++ cvcanny.cpp	(working copy)
@@ -239,7 +239,8 @@
                     if( m > _mag[j-1] && m >= _mag[j+1] )
-                        if( m > high && !prev_flag && _map[j-mapstep] != 2 )
+                        volatile int k = j - mapstep;
+                        if( m > high && !prev_flag && _map[k] != 2 )
                             CANNY_PUSH( _map + j );
                             prev_flag = 1;
@@ -253,7 +254,8 @@
                     if( m > _mag[j+magstep2] && m >= _mag[j+magstep1] )
-                        if( m > high && !prev_flag && _map[j-mapstep] != 2 )
+                        volatile int k = j - mapstep;
+                        if( m > high && !prev_flag && _map[k] != 2 )
                             CANNY_PUSH( _map + j );
                             prev_flag = 1;
@@ -268,7 +270,8 @@
                     s = s < 0 ? -1 : 1;
                     if( m > _mag[j+magstep2-s] && m > _mag[j+magstep1+s] )
-                        if( m > high && !prev_flag && _map[j-mapstep] != 2 )
+                        volatile int k = j - mapstep;
+                        if( m > high && !prev_flag && _map[k] != 2 )
                             CANNY_PUSH( _map + j );
                             prev_flag = 1;