Archive for the ‘~home’ Category

May 17th, 2013 @ 14:37

Whoa. It’s been such a long time since my last post. More than two years have passed since then; I’m now a PhD student in computer vision at Inria and I’ve hacked on quite a bunch of projects.

Weirdly enough I did not do much pure software stuff (probably because I already code quite a lot for my research), but spent a lot (I mean, most) of my spare time on hardware or hardware-oriented things, mostly within hackEns, the ENS hacking club we founded in October 2010.

Among other projects, we built a RepRap 3D printer, a Mendel, in early 2011, with a first successful print on April 22nd, 2011.

hackEns’s Mendel 3D printer

A friend and I quickly started the replication process, building two Prusa i1 printers in August-September 2011 using plastic parts printed just in time with the Mendel. We continued the replication process with Jacques-Henri Jourdan, who slowly built a Prusa i2 starting in early 2012, with a first successful print in late 2012, using plastic parts printed with my Prusa i1. That’s a beautiful chain of three generations of 3D printers. Over the course of these builds, we have been amazed by the progressive technological improvements and by the machine’s real capacity to improve itself. During summer 2012 I reprinted almost all of my printer’s plastic parts (the original ones were a little dirty and had clear defects that affected print quality), progressively transforming my Prusa i1 into a Prusa i2. One of the most striking things is that a 3D printer can produce better parts than the ones it is made of (and I’m not just talking about upgrades, but also about print quality).

My Prusa 3D printer

What’s also amazing is how fast the software side of 3D printing is evolving. Slicing, which is the operation of transforming a 3D object description into a printer-friendly list of commands, is definitely a non-trivial task, both on the fundamental algorithmic side and on the engineering side. It is done by cutting the object into successive layers along the Z axis and then computing where plastic should be deposited on each 2D layer, which requires a lot of geometrical computation and careful coding to correctly handle edge cases. Skeinforge was the first software I used to slice my objects; it was later improved upon in a fork known as SFACT. However these Python-based approaches were quite slow overall (partly because of the language itself, partly because of design decisions such as complete modularity, which might not be the best fit for this kind of software), and a few months later a new candidate appeared: Slic3r, developed from scratch in Perl with speed and efficiency in mind, and now regularly evolving with better core algorithms and GUI features.
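To give an idea of the kind of geometry involved, here is a tiny Python sketch of the most basic slicing step: intersecting the triangles of a mesh with a horizontal plane to get the 2D segments outlining one layer. This is of course not how Skeinforge or Slic3r are actually implemented, just an illustration of the core operation (and it conveniently ignores the nasty edge cases, like vertices lying exactly on the plane).

```python
# Toy illustration of the core slicing step: intersect a triangle mesh with
# the plane z = z0 to get the 2D segments outlining one layer.
# NOT how Skeinforge or Slic3r work internally, just the basic geometry.

def slice_triangle(tri, z0):
    """Return the ((x, y), (x, y)) segment where the triangle crosses the
    plane z = z0, or None if it does not cross it."""
    points = []
    for i in range(3):
        (x1, y1, z1), (x2, y2, z2) = tri[i], tri[(i + 1) % 3]
        if (z1 - z0) * (z2 - z0) < 0:          # this edge crosses the plane
            t = (z0 - z1) / (z2 - z1)          # linear interpolation factor
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    # Vertices lying exactly on the plane are ignored here; handling them
    # properly is one of the edge cases mentioned above.
    return tuple(points) if len(points) == 2 else None

def slice_mesh(triangles, z0):
    """Collect all the layer segments. A real slicer would then chain them
    into closed loops, offset them for the extrusion width, generate infill
    and finally emit G-code."""
    return [s for s in (slice_triangle(t, z0) for t in triangles) if s]

# A single triangle standing across the plane z = 0.5
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(slice_mesh([tri], 0.5))    # -> [((0.5, 0.0), (0.0, 0.0))]
```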

The possibility to print objects and the availability of 3D sensors such as the Kinect (and software such as Kinect Fusion) have pushed replication possibilities way further than what we could have dreamed of 10 years ago. And for hackers, it is now possible to build complex 3D objects from scratch, without having to painfully machine wood or metal by hand or with super-expensive CNC tools. While Thingiverse (or should we call it Makerbot Thingiverse now?) is full of cute or pretty designs, it has also become a home for DIY hobbyists and hackers. Coming from a pure CS background with little engineering knowledge, we were able to develop a hexacopter (a multirotor unmanned aircraft with six motors) frame from scratch, print it with our 3D printers and fly it (but I’ll tell you more about this project soon).

hackEns’s hexacopter being assembled

The only sad thing overall is that some people are trying to leverage the interest in 3D printing technologies to make a lot of money by selling crappy printers. The number of 3D printer crowdfunding campaigns started on Kickstarter is just unbelievable, often promising things that cannot be achieved with the technology they are using (like printing huge objects which are not hollow without a proper heated chamber) or just trying to sell existing designs at crazy prices. It’s also sad how Makerbot is getting all the good press while at the same time moving further and further away from the open source approach, but that’s another story.

April 23rd, 2010 @ 08:08

While surfing and jumping from link to link, I stumbled upon The Real Costs of “Free” Search Site Services. While the overall idea of the article has a point, I think the emphasis is definitely not put on the right parts.

The bold parts are all about the money that gets wasted by ads on a free web service. Let’s review the initial assertions and the conclusion:

For a company employing 10,000 people with an average loaded cost of $50/hour

Users would thus waste 1 hour and 34 minutes

In total, the cost of having employees look at and occasionally click on distractions costs the company $1.2 million per year.

What’s wrong here? Well, the main problem is the stunning effect of that $1.2 million per year figure, which looks huge at first sight. But look a little closer: this $1.2 million a year is for a 10,000-employee company, so it is actually $120 per employee per year, which is, as specified, about two and a half hours of work per employee over the whole year.

Now let’s consider the fact that employees sometimes have to go pee. I’d say that’s easily 5 minutes a day lost on non-productive hygiene stuff, at the very least, which amounts to about 20 hours a year per employee, which gives an awful $1012 per year per employee, or $10.1 million a year for our test company. Urgh! Guess it’s about time to move the WC outside of the work area and have your employees swipe out and back in when they go, to claw back this illegitimate grand you are giving each employee every year for nothing but pee!

If we also account for the fact that employees won’t be 100% productive all the time anyway, which wastes even more money, that $1.2 million a year starts looking a little bit ridiculous. That’s the whole problem with the article: they used that 10,000-fold factor to produce some stunning figures, while completely omitting to mention what they scale to.

Based on some rough estimates (about 3 weeks of holidays, working 5 days a week, 8 hours a day), our company is spending about $100,000 a year on each employee, or a total of $1 billion a year on human resources (just for those working in front of a computer screen). Two conclusions: that “wasted” time amounts to just 0.12% of the company’s HR expenses in that area, and that company is huge. 10,000 employees working on computers means either a large IT company or a really large non-IT company, and in both cases, even though they probably don’t rely on free services (that would be silly), $1.2 million is probably just pocket money for them.
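For the record, here is the back-of-the-envelope arithmetic behind all the figures above, as a few lines of Python (the exact results depend a bit on the number of working days you assume):

```python
# Back-of-the-envelope check of the figures above.
employees = 10000
hourly_cost = 50.0                   # "average loaded cost" from the article, $/hour

ad_waste_total = 1.2e6               # $/year claimed by the article
print(ad_waste_total / employees)    # 120.0 $/year per employee, i.e. ~2.4 hours of work

work_days = 49 * 5                   # ~3 weeks of holidays, 5 days a week
pee_hours = work_days * 5 / 60.0     # 5 minutes a day -> ~20 hours a year
print(pee_hours * hourly_cost)       # ~1020 $/year, close to the ~$1012 above
                                     # (depends on the exact day count)

hr_cost = work_days * 8 * hourly_cost          # 8 hours a day -> ~98,000 $/year
print(hr_cost * employees)                     # ~1 billion $/year on salaries
print(ad_waste_total / (hr_cost * employees))  # ~0.0012, i.e. about 0.12%
```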

Don’t get me wrong, stupid waste should be avoided as much as possible, but please, pretty please, don’t try to make your point using huge scale factors. Numbers lie, and when the lies are unveiled, the liars just look like fools. It just doesn’t help and could be avoided, especially when there are other much more robust arguments (here, against ad-supported free services) around.

March 26th, 2010 @ 18:12
[youtube]http://www.youtube.com/watch?v=ymuV4Cql7lY[/youtube]
The ramp between the new Gates Hillman Center and the Purnell Center for the Arts at night
March 26th, 2010 @ 03:43
Sense: this picture makes none
No clue where this image first appeared, so I can’t give proper credit.

This image makes me think of the video clip of Takin’ Back My Love by Enrique Iglesias and Ciara. I fail to see the logic behind the different attacks: she drops his paintings and clothes into the swimming pool, while he barely spills milk on the floor. WTF?

By the way, I had been wondering for a while how to make links that point directly to a specific time in a YouTube video. Well, now I know (appending something like #t=1m30s to the video URL does the trick).

March 24th, 2010 @ 00:53

Since Sunday I have been testing Picasa Web face recognition on my set of pictures. After an hour of initial processing, I was presented with an interface showing a list of clusters of faces, which I have to check, clean of false positives and name. While it seemed to work great for the very first clusters (which correctly grouped about 50 to 100 faces of the same person), it quickly appeared that the whole thing was not that great.

Here are a few rants :

  • It seems to be heavily influenced by facial expression and angle (i.e. it will often make two clusters for the same person depending on whether the face is tilted to the left or to the right).
  • It doesn’t reconsider the clustering after the initial processing: I’m pretty sure that after tagging a bunch of clusters of the same person, it could easily merge the remaining clusters into a single one.
  • It keeps giving me “communication errors”. I’m used to the “click and it’s immediately there” experience with Google services, and I have to say that this service is definitely not a good example. About two thirds of my actions result in such an error, which takes about 10 seconds to show up; successful actions also take several seconds (5 to 10) to complete. That is not really efficient when you have pictures of about 800 different persons, which gives lots of clusters of one or two faces you just want to ignore, and it takes about 20 seconds per cluster to actually get one ignored.

I know this is still in development, so the actual recognition problems are OK, but meh, the communication errors are really, really annoying…

Update (03/25/10): I don’t know if they fixed the problem or if it was just pure luck, but I was able to tag 1500 faces without a single problem in about half an hour. Yippee!

March 22nd, 2010 @ 00:27

Last autumn I had to do some particularly CPU-time-consuming but completely independent computations on thousands of pictures (namely computing SIFT keypoints on high-resolution images). Using my laptop, it would have taken hundreds of hours to complete (about 20 minutes per image), but it could easily be done in a distributed manner on dozens of machines.

I thus started writing a little cloud-computing-like framework to handle the distributed computations in a safe manner: the machines I intended to run my computations on were public-access machines and could get rebooted at any time, in which case the computation (let’s call it a task from now on) that was started on them would never end.

There’s no trick here: the user defines his tasks and an entry point which sets off the initial tasks, and that’s it. The tasks are stored in a workqueue, which is handled by a master node that assigns tasks to the other nodes (actually, the master server is split in two: the “heartbeat” server, which monitors slave status, and a taskqueue server from which the slaves pull their tasks). If a node does not complete a task within a user-defined amount of time, the master assigns it to another node. Yet, if the initial node manages to complete the task before the second one, the task gets completed anyway (this prevents infinite task respawning in case of an unexpectedly long task).
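To make the reassignment policy concrete, here is a rough Python sketch of what the taskqueue side boils down to. The names are mine and this is not the actual Cloudpy code, just the idea: a task whose node missed its deadline gets handed to another node, and whichever node finishes first wins.

```python
# Rough sketch of the master-side policy described above (NOT the actual
# Cloudpy code): a task whose node missed its deadline is re-queued for
# another node, but whichever node completes it first wins.
import time

class TaskQueue(object):
    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = []        # (task_id, task) not assigned yet
        self.running = {}        # task_id -> (task, node, deadline)
        self.done = set()        # ids of completed tasks

    def add(self, task_id, task):
        self.pending.append((task_id, task))

    def pull(self, node):
        """Called by a slave asking for work; returns (task_id, task) or None."""
        now = time.time()
        # Re-queue tasks whose node missed its deadline.
        for task_id, (task, _, deadline) in list(self.running.items()):
            if now > deadline:
                del self.running[task_id]
                self.pending.append((task_id, task))
        while self.pending:
            task_id, task = self.pending.pop(0)
            if task_id in self.done:
                continue                    # already finished by its first node
            self.running[task_id] = (task, node, now + self.timeout)
            return task_id, task
        return None

    def complete(self, task_id):
        """First completion wins; late duplicate completions are ignored."""
        if task_id not in self.done:
            self.done.add(task_id)
            self.running.pop(task_id, None)
```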

The tasks may use data stored in either a file-like storage or a database. Since the whole thing is in Python without any runtime restriction, any arbitrary code or binary can be run through it, which feels like a great advantage over frameworks where only a few languages are supported. A nice feature is that each task can be associated with a file pack (holding e.g. binaries which will be run by the task) which is automatically deployed or cleaned up when the task is started on the node.
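As an illustration (with completely made-up names, not the actual Cloudpy API), a user task running a binary shipped in its file pack could look like something along these lines:

```python
# Completely hypothetical task definition (invented names, not the Cloudpy
# API): the framework would deploy the "sift-binaries" file pack on the slave
# before calling run(), and clean it up afterwards.
import subprocess

class ExtractKeypoints(object):
    pack = "sift-binaries"                 # file pack shipped with the task

    def __init__(self, image_path):
        self.image_path = image_path

    def run(self, pack_dir, storage):
        # Run the packed binary on the input image and store its output.
        keys = subprocess.check_output([pack_dir + "/sift", self.image_path])
        storage.put(self.image_path + ".key", keys)
```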

Let’s also mention that the slaves are automatically deployed based on a simple configuration file (through ssh/scp) ; that applications are easy to deploy as well (the initial files, the entry points and al are defined in a single config file, form which the deploy tool automagically creates the file packs, updates the pack data to the heartbeat server and plugs the entry point into the taskqueue) and that there are only a few deps for the slaves (python and python-yaml), and not much awful ones for the various servers (mostly add sqlalchemy to the slaves deps).

There are still some points I’d like to improve. I’d first like to make the master node dynamic, rather than static as it is now. Nodes which may be elected as master (i.e. those which can directly connect to the various storages) would use a distributed protocol to actually elect the master (I worked on such things last year during a complexity course; see my French summary on the subject: Calculabilité Distribuée), and then broadcast the winner of the election to the other nodes. If the current master becomes unreachable at any time, the electable slaves would start the election process right away, but would still attempt to reach the master until the election is completed (see the sketch below).
I’d also like to add a nicer way to handle mixes of 32-bit and 64-bit architectures (i.e. provide a way to define 32-bit/64-bit specific initial packages).
Security is another matter: I should probably add some chrooting to the runtime environment and secure the transmissions between the nodes (note that there is absolutely no notion of authentication in any part of the software at the moment).
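About the election part: one classic option would be a ring election à la Chang-Roberts, where each candidate sends its id around a ring, forwards only ids larger than its own, and the id that makes it all the way around wins. The toy simulation below just illustrates the idea; it is not a protocol Cloudpy currently implements.

```python
# Toy simulation of a Chang-Roberts-style election on a unidirectional ring:
# every node sends its id to its successor; a node forwards ids larger than
# its own and drops the rest; the id that travels all the way around belongs
# to the winner.  Ids are assumed distinct.
def ring_election(ring_ids):
    n = len(ring_ids)
    in_flight = [(i, ring_ids[i]) for i in range(n)]   # (position, candidate id)
    while True:
        forwarded = []
        for pos, cand in in_flight:
            pos = (pos + 1) % n                # deliver to the successor
            if cand == ring_ids[pos]:
                return cand                    # came back to its emitter: elected
            if cand > ring_ids[pos]:
                forwarded.append((pos, cand))  # only larger ids keep travelling
        in_flight = forwarded

print(ring_election([3, 7, 2, 5]))             # -> 7
```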

The only problem is that by the time I had finished the framework, I had no time left to write the little script needed to actually test the whole thing, and a friend had already thrown the pictures at his high-end quad-core computer, so I had no chance to test it in a real-life scenario. I’ll try someday, though. If you want to give a try to the latest version I pushed (it lacks the database backend and documentation at the moment, though I intend to finish, test and upload them someday soon), check the Cloudpy git repo.

March 22nd, 2010 @ 00:13

A few days ago, I discovered I had a mini-USB (or something along those lines) plug on my beloved EOS 450D camera. Fun stuff: I could read pictures directly from the camera without having to swap my SDHC card back and forth, at a pretty decent transfer rate (yet quite a bit slower than through my laptop’s card reader).

I then discovered libgphoto2, which is the library behind the Linux support for this feature. After a little bit of googling, I stumbled upon the gphoto2 website, where they mentioned that it is possible to trigger shooting from the computer and directly get the resulting image on the computer. I immediately tried and… it works great! (Though I haven’t figured out yet how to get the picture onto the SD card instead of directly on the computer.)
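The quickest way to script this from Python is simply to shell out to the gphoto2 command-line tool; if I recall correctly, --capture-image-and-download is the relevant option. Something like:

```python
# Minimal tethered-shooting sketch: shell out to the gphoto2 CLI.
# --capture-image-and-download triggers the shutter and downloads the shot
# (as far as I recall); --filename says where to put it.
import subprocess

def shoot(filename="capture.jpg"):
    subprocess.check_call([
        "gphoto2",
        "--capture-image-and-download",
        "--filename", filename,
        "--force-overwrite",
    ])
    return filename

if __name__ == "__main__":
    print(shoot())
```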

Not sure what I’m going to do with this, but I’m confident there are a bunch of hacking possibilities here. Maybe I should write a simple GUI on top of it someday, to begin with. Or I could try building a motorized base and remote-control the whole thing from my N900 (and attempt to do things like Paris 26 Gigapixels). Hmmmmm…

March 20th, 2010 @ 01:40

Last summer, I went to INRIA Sophia Antipolis (INRIA is, basically, the French research institute for computer science; Sophia Antipolis is a group of villages near Nice and Antibes, on the Côte d’Azur, that hosts quite a lot of technology businesses, including HP and Amadeus) for a 3-month internship in the PLANETE project, under the supervision of Mathieu Lacage, a great software engineer.

Before any extra details, let’s point to the code, the report and the slides. There are also some slides somewhere in the wild which were made for a quick talk given by Mathieu at the ns-3 workshop at SIMUtools 2010.

The goal of the internship was to enable the seamless use of multicore parallelization in the ns-3 network simulator to speed up simulations.
The obvious idea behind this project is that with the trend towards increasingly multicore architectures, even for personal computers, such a change could allow large speedups without throwing a lot of extra hardware in the balance and without requiring complicated configuration from the user.

Our goal was to make these changes entirely transparent for both developers (i.e. those working on improving the ns-3 framework: routing, device handling…) and users (i.e. those using ns-3 to simulate their protocols…).

Before going into the optimization details, let’s mention the basic idea of the algorithms we used for the synchronization of the execution units (the cores, in our case), which belong to the class of conservative synchronization algorithms (as opposed to optimistic algorithms).
We first tried a message-passing-based approach, namely the Chandy-Misra-Bryant algorithm. The whole idea of this algorithm is that each node knows how far into the future it can advance without any risk of desynchronizing the simulation. This is based on knowing the physical transfer delays plus the current date of the neighboring nodes, a date which is passed around through recurring messages. This way, under certain conditions (namely that no link has a null transfer time), the simulation advances safely.
We found out that this approach would lead to no performance gain, because of the number of messages which have to be passed compared to the actual amount of work done per time unit.
We then used a barrier-based synchronization algorithm: at each iteration, a global maximum date is determined, based on the current date of each node and the physical transfer delays (again). All the nodes are then allowed to process events up to this date, wait at the barrier, and move on to the next iteration when everyone has reached the barrier. The actual barrier implementation is a fun part, which is described in the report. From a trivial semaphore implementation to a fully optimized, tree-shaped implementation, performance gains of over a hundredfold can be observed.
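Stripped of everything ns-3-specific, the barrier-based scheme looks roughly like the following Python sketch (the partition objects and their two methods are hypothetical, and threading.Barrier stands in for the custom tree-shaped barrier discussed in the report):

```python
# Stripped-down sketch of the barrier-based scheme (nothing ns-3 specific;
# the partition objects and their methods are hypothetical).  Each round, a
# global safe date is derived from every partition's next event date plus the
# minimum link delay; each thread then processes its local events up to that
# date and meets the others at the barrier.
import threading

def run_partition(part, partitions, barrier, min_link_delay, end_time):
    while True:
        # Everyone is idle between rounds, so these reads see stable dates
        # and all threads compute the same bound.
        safe_until = min(p.next_event_time() for p in partitions) + min_link_delay
        barrier.wait()                          # everyone has taken its snapshot
        if safe_until > end_time:
            return
        part.process_events_until(safe_until)   # purely local work
        barrier.wait()                          # wait for the slowest partition

def simulate(partitions, min_link_delay, end_time):
    barrier = threading.Barrier(len(partitions))
    threads = [threading.Thread(target=run_partition,
                                args=(p, partitions, barrier,
                                      min_link_delay, end_time))
               for p in partitions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```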

One of the usual challenges when doing parallel computing is to define a proper partition of what each execution unit does. For instance, when working on a network of nodes (computers, routers…), which nodes should be handled by which execution unit? Since we don’t want to ask the user what he thinks would be best, and since we can’t just use topology clues to make a good partitioning (because it also depends on the actual dynamics of the network, which we can’t predict), we had to use a dumb solution and make it smart. The solution we found was to have both dedicated and global nodes: a given percentage of the nodes is dedicated to each execution unit, while a global pool of nodes remains undedicated. When a unit has finished processing its own dedicated nodes, it starts picking nodes from the global pool and processes them. This way, units won’t waste their time waiting for other units which happen to have more complicated nodes in their dedicated set and thus need more processing time for their local set alone.

Now you are going to ask, why not just use a global pool and let all the units pick nodes to process directly from there? The whole problem is that this global list must be protected: two threads just can’t pick stuff from it at the same time, as this would lead to awful memory corruption, and probably worse. Consequently, we need to use either mutexes (i.e. locks) or atomic operations to modify the list. Atomic operations are cheap compared to locks, yet they are still at least 10 times slower than normal operations. This justifies the whole dedicated + global pools approach, sketched below.
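Here is a small Python sketch of that scheduling policy (hypothetical names again): each thread drains its own dedicated list with no synchronization at all, and only the shared pool is protected, here with a plain lock where the real implementation would rather use atomic operations.

```python
# Sketch of the dedicated + global pool policy (hypothetical names).  Each
# thread drains its own dedicated list first, which requires no
# synchronization; only the shared pool is protected, here with a plain lock
# where the real code would rather use cheaper atomic operations.
import threading

class NodeScheduler(object):
    def __init__(self, dedicated_lists, shared_pool):
        self.dedicated = dedicated_lists       # one private list per thread
        self.shared = shared_pool              # leftover nodes, taken lazily
        self.lock = threading.Lock()

    def next_node(self, thread_id):
        if self.dedicated[thread_id]:
            return self.dedicated[thread_id].pop()     # no contention here
        with self.lock:                                # contention only here
            return self.shared.pop() if self.shared else None

def worker(scheduler, thread_id, process_node):
    node = scheduler.next_node(thread_id)
    while node is not None:
        process_node(node)                     # simulate this node's events
        node = scheduler.next_node(thread_id)
```

The dedicated/global split percentage then becomes a knob trading contention on the shared pool against load balancing between the units.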

Now, there were more thread-safety problems. While most of them were easily defused at the cost of minor performance losses (such as disabling free lists or caches), there was still a major issue: reference counting. Indeed, in order to be as user-friendly as possible, ns-3 provides (and heavily uses) a reference-counting system to ease memory handling for users. The whole problem is that the counters might be accessed concurrently by two threads at once. What happens when a thread references an object microseconds before another thread unreferences it?
A typical i++ operation is usually done by fetching i from memory into a register, adding 1 to it and writing it back to memory. Now, if the original reference count was 1, we could have the first thread reading i = 1 from memory, the second one reading i = 1 from memory, subtracting 1, writing i = 0 back to memory, then seeing i == 0 and trashing the object, then the first thread adding 1 locally and writing i = 2 back to memo… wait, this memory has been deallocated. Game over.
Again, a possible solution is to use atomic operations. Oh right, they are awfully expensive. Never mind, let’s try, there can’t be that many ref/unrefs going on anyway. And then… ow, we actually just got a plain 100% runtime increase.
While we fixed some of the stupidities that led to such horrible increases, the problem remained.
We tried to use thread-local storage to maintain a per-thread reference count, but this did not really solve the problem, it just helped.
We finally tried using thread-local buffers of Ref/Unref operations, which would be batch-run by a single thread. This led to nice performance improvements over the previous solutions, but if I remember correctly, under certain unknown conditions the code would badly crash or eat all the memory. Urgh.
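The idea of that last attempt can be sketched as follows (in Python, purely for illustration, since the GIL makes the problem moot there): worker threads only append +1/-1 deltas to a private buffer, and a single reclaimer thread applies them to the real counters, so the counters themselves are never touched concurrently. The deferred destruction is exactly where things get delicate.

```python
# Sketch of the buffered Ref/Unref idea (illustration only; Python's GIL makes
# it pointless in Python itself).  Worker threads append +1/-1 deltas to a
# thread-local buffer; full buffers are handed to a single reclaimer thread,
# which owns the real counters, so the counters are never shared.
import queue
import threading

flush_queue = queue.Queue()       # full buffers waiting for the reclaimer
local = threading.local()         # per-thread buffer of (object id, delta)

def ref(obj_id, delta, flush_at=1024):
    buf = getattr(local, "buf", None)
    if buf is None:
        buf = local.buf = []
    buf.append((obj_id, delta))
    if len(buf) >= flush_at:      # hand the whole batch over in one shot
        flush_queue.put(buf)
        local.buf = []

def reclaimer(refcounts, destroy):
    """Single thread applying the deltas; refcounts maps object id -> count."""
    while True:
        for obj_id, delta in flush_queue.get():
            refcounts[obj_id] = refcounts.get(obj_id, 0) + delta
            if refcounts[obj_id] == 0:
                # Destruction is deferred until the deltas arrive, which is
                # exactly the delicate part: a late +1 for an object already
                # destroyed here is the kind of thing that made our version
                # crash or hoard memory.
                del refcounts[obj_id]
                destroy(obj_id)
```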

Still, under some conditions, I managed to get over a 20% performance improvement with 8 threads (iirc).

The conclusion of all this is that multithreaded programming is definitely not a trivial problem and that converting existing software to benefit from multithreading is not that easy. For more information, numbers and all, check the report.

Now, I have had no time since mid-September to work on the issue, but I know Mathieu has. It’d be awesome to have some news from him here… To be continued, hopefully.

March 20th, 2010 @ 00:42

A friend of mine arrived in the US one week before me, and when he told me his department had to pay $8 a month to the university (namely PSU) computer services for his computing account (email, computer login…), I was like, wow, this is theft.

One week later I arrived in Pittsburgh and at CMU, and quickly got a computer account. While browsing the various docs about their computer systems (I was trying to log in to a “general purpose Linux machine” which I supposedly had access to; I actually didn’t, since they had forgotten to create the account on that specific machine), I discovered that the monthly fee for a user account at CMU was $105. Still stunned.

Let’s mention that activating an ethernet outlet is also a paid service, just like registering a machine on the network (so yes, you, or your department, have to pay both the ethernet outlet activation fee and the machine registration fee). Isn’t this whole system crazy? Isn’t it normal work to enable an outlet in a switch configuration, plug an ethernet wire into said switch, or hit the validation button of a probably automated script to add a machine to the network?

March 19th, 2010 @ 04:20

I seriously wonder what’s wrong with French names here in the USA.

  • My landlady has my email saved as Guilluame <guillaume@…>
  • The guy at the bank managed to swap my first name and last name; does Seguin really sound like a first name? (Well, he also managed to make a typo in my PIN code while entering my file into the computer system, which led me to great lengths of stress when I figured out that the damn ATM definitely wouldn’t let me use my long-awaited debit card. Let’s concede that it was an uncommon operation, since usually everything is done directly on the computer (and the PIN code is typed by the user and double-checked), but they were experiencing a computer systems upgrade at the exact time I came in to get my account opened.)
  • Most people can’t pronounce Guillaume right; even if I tell them to just say William and that it sounds about the same, they won’t quit trying on their own, producing pronunciations I wouldn’t have thought possible (not that I really care, it’s more fun than anything else :p)