Internship long-term goals
- Recognize actions recorded with the Kinect in 3D movies
- Recognize actions recorded with the Kinect in 2D movies
- Query databases of 2D movies for actions recorded with the Kinect
Ongoing goals
- Evaluate skeleton pose estimation & tracking algorithms (both 2D and 3D)
- Look up skeleton matching algorithms (maybe create a simple one?)
- Figure out how to store skeletons in an efficient way for querying
- Understand the format of 3D Blu-ray movies and create an easy-to-use pipeline to extract them to something we can use (like pairs of videos for left/right eye images)
- Build depth maps from 3D movies (using stereovision)
- Extract skeletons from Kinect-recorded sequences and from 2D and 3D movies
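For the depth-map goal above, a minimal sketch of the standard triangulation step for a rectified stereo pair, Z = f * B / d. The focal length, baseline and disparity values are made-up placeholders, not measurements from any movie; the function name is hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (meters).

    Assumes a rectified pair, so depth Z = focal * baseline / disparity.
    Pixels with no valid disparity (<= 0) are mapped to infinity.
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_px * baseline_m / d if d > 0 else float("inf")
            for d in row
        ])
    return depth


# Placeholder values (assumptions): 800 px focal length, 6.5 cm baseline.
disparity = [[8.0, 16.0],
             [0.0, 32.0]]
depth = disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.065)
for row in depth:
    print(row)
```

Larger disparities give smaller depths, so noise in the disparity map hurts most on far-away points.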
Notes
- The current skeleton pose estimation algorithm we have cannot realistically be used for our purpose because of its initial calibration sequence, which requires the user to hold a Ψ-like position for ~3 seconds to calibrate the skeleton detection. It also easily loses track of the user when occlusion occurs, and requires the calibration step again afterwards.
- 3D movies are encoded in a (left frame, delta between left and right frame) structure, which we can probably use to our advantage for the depth map computation (we could add some prior to the stereovision algorithm we will run on the movie)
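One way a prior could enter a stereo matcher is by narrowing the disparity search range around a rough per-pixel guess. The sketch below is only an illustration of that idea on a single scanline, using a plain sum-of-absolute-differences (SAD) cost. It is not the algorithm we evaluated, and how to derive the guess from the encoded left/right delta is left open; the function name and the test rows are hypothetical.

```python
def match_pixel(left_row, right_row, x, half_window, d_min, d_max):
    """Return the disparity in [d_min, d_max] with the lowest SAD cost at column x.

    Rows are lists of gray levels from a rectified pair. Candidates whose
    window falls outside either row are skipped; returns None if none fit.
    """
    best_d, best_cost = None, float("inf")
    for d in range(d_min, d_max + 1):
        cost = 0
        valid = True
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                valid = False
                break
            cost += abs(left_row[xl] - right_row[xr])
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic scanlines: the left row is the right row shifted by 3 pixels.
right_row = [0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0]
left_row = [0, 0, 0, 0, 0, 10, 50, 90, 50, 10, 0, 0]

prior = 3   # hypothetical rough disparity guess for this pixel
margin = 1  # only search within prior +/- margin
estimated = match_pixel(left_row, right_row, 7, 1, prior - margin, prior + margin)
print(estimated)
```

Restricting the range both speeds up matching and removes far-off false matches, at the cost of trusting the prior.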
Self-deadlines
- 04/16 Have a working 3D Blu-ray movie rip pipeline
- 04/16 Have a working 2D pose query algorithm
- 04/01 Have a working simple 3D-to-2D pose lookup tool: OK
Work log
Week 3 (03/28 - 04/01)
- Tue.-Fri. Worked on both the pose query descriptors and distances and the Kinect-to-2D pose lookup tool. The tool is almost done, but we have some severe issues with the pose query framework. Next goal is to evaluate our descriptors and query code on the ETHZ Buffy Pose Classes dataset (9), to check whether we implemented them correctly.
- Mon. Evaluated the ETHZ code on the Buffy & ETHZ PASCAL datasets. Studied how OpenNI-based apps work, aiming to produce a simple program to store poses. The evaluation of the ETHZ code showed very good performance, though we need to find a way to filter out wrong detections (i.e. pose estimation works pretty well, but the person detector without additional filtering isn't reliable). Prepared a basic OpenNI app, based on NiUserTracker.
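A minimal sketch of what a 3D-to-2D pose lookup can look like, under strong simplifying assumptions: orthographic projection (just dropping depth), a three-joint skeleton, and a plain sum of per-joint Euclidean distances. The joint names, the tiny database and the descriptor are hypothetical and are not the descriptors or distances used in our tool.

```python
import math


def project_to_2d(joints_3d):
    """Orthographic projection: keep (x, y), drop the depth coordinate."""
    return {name: (x, y) for name, (x, y, z) in joints_3d.items()}


def normalize_2d(joints_2d, root="neck"):
    """Center on the root joint and scale so the mean joint-to-root distance is 1."""
    rx, ry = joints_2d[root]
    centered = {n: (x - rx, y - ry) for n, (x, y) in joints_2d.items()}
    scale = sum(math.hypot(x, y) for x, y in centered.values()) / len(centered)
    if scale == 0:
        raise ValueError("degenerate pose: all joints coincide with the root")
    return {n: (x / scale, y / scale) for n, (x, y) in centered.items()}


def pose_distance(a, b):
    """Sum of Euclidean distances between corresponding joints."""
    return sum(math.hypot(a[n][0] - b[n][0], a[n][1] - b[n][1]) for n in a)


def lookup(query_3d, database_2d):
    """Return the name of the 2D pose closest to the projected 3D query."""
    q = normalize_2d(project_to_2d(query_3d))
    return min(database_2d,
               key=lambda name: pose_distance(q, normalize_2d(database_2d[name])))


# Made-up data: a Kinect-like pose in meters and two 2D poses in pixels.
query = {"neck": (0.0, 0.0, 2.0),
         "left_hand": (-0.5, 0.5, 2.0),
         "right_hand": (0.5, 0.5, 2.0)}
database = {
    "arms_up": {"neck": (100, 100), "left_hand": (60, 140), "right_hand": (140, 140)},
    "arms_down": {"neck": (100, 100), "left_hand": (60, 60), "right_hand": (140, 60)},
}
best = lookup(query, database)
print(best)
```

The normalization makes the comparison independent of where the person stands in the frame and of the image scale, but not of the camera viewpoint.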
Week 2 (03/21 - 03/25)
- Fri. Worked on the skeleton estimation stuff, read articles again, looked at the ETHZ code. Saint Louis classes.
- Thu. Improved wrapper code for the stereo algorithm. Ran some more tests using the current stereo algorithm and looked at the competition. Succeeded in installing OpenNI, SensorKinect and NITE on the lab computer. Worked again on the stereovision parameters and tried smaller images. No real success, though a slight improvement was noticed with images about 400px in width/height, which is about the size of the provided test images. Looked at getting test data from (4); going to work on it during the weekend. Tried a competing algorithm found by chance on the CMU cvonline page, which does about as well as (or maybe even better than) the top-ranked algorithm we've been evaluating: there's probably an issue with how we use or parameterize the top-ranked one. Thinking about how to represent and store skeletons.
- Wed. Tried to adjust the parameters for the stereovision algorithm. Our frames seem to be correctly rectified, but the motion blur and maybe some extra artifacts from the video sequence might be responsible for the noise we observed. Going to try to improve my wrapper code for the algorithm so that we can do more. Based on a suggestion from Jean, I'm going to also benchmark another algorithm on the same frames. I might need some pointers there... Tried installing OpenNI, SensorKinect and NITE on the lab computer. Fetching the deps was rather easy but getting the whole thing to work without root access is a PITA for now.
- Tue. Experimented with (5) and (6). These initial tests suggest the algorithm is quite good. Going to try to run the software on the whole ETHZ stickmen/Buffy datasets. Tried a stereovision algorithm on the provided datasets and on the Street Dance 3D trailer. The results were very good on the provided images but really noisy on the trailer frames. Maybe we have a rectification issue on the trailer frames?
- Mon. Thought more deeply about skeleton matching issues. Our context is not generic object recognition but action recognition, where the skeletons we will match are all human ones. Thus we could probably just normalize the skeletons (by fixing one or some of the axes to be in a specified configuration and by normalizing the lengths of the bones). Matching is still non-trivial, especially since the literature is more about graph matching than 3D skeleton matching. Meeting with Josef. We are refocusing on 2D matching for now, while still looking at our prerequisite steps for the 3D matching stuff in spare time. Going to experiment with (5) and (6) as a first step. Olivier mentioned using [7] instead for mapping the 3D silhouette to the videos because he believes (5) and (6) do not have good performance.
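The bone-length part of the normalization idea from Monday can be sketched as follows. This is an illustration only: it re-roots the skeleton at the torso and rescales every bone to unit length while keeping its direction, but does not handle the axis alignment (orientation) part. The joint names and the tiny kinematic tree are hypothetical, not the OpenNI joint set.

```python
import math

# Hypothetical minimal kinematic tree: child joint -> parent joint.
PARENT = {"neck": "torso", "head": "neck",
          "left_elbow": "neck", "left_hand": "left_elbow"}
# Traversal order in which every parent comes before its children.
ORDER = ["neck", "head", "left_elbow", "left_hand"]


def normalize_skeleton(joints, bone_length=1.0):
    """Place the torso at the origin and give every bone the same length.

    Bone directions are preserved, so two people of different sizes
    holding the same pose map to the same normalized skeleton.
    """
    out = {"torso": (0.0, 0.0, 0.0)}
    for child in ORDER:
        parent = PARENT[child]
        bone = tuple(c - p for c, p in zip(joints[child], joints[parent]))
        length = math.sqrt(sum(v * v for v in bone))
        if length == 0:
            raise ValueError("zero-length bone: " + child)
        out[child] = tuple(o + bone_length * v / length
                           for o, v in zip(out[parent], bone))
    return out


# Made-up skeleton, plus a scaled and translated copy of the same pose.
small = {"torso": (1.0, 1.0, 1.0), "neck": (1.0, 3.0, 1.0), "head": (1.0, 4.0, 1.0),
         "left_elbow": (0.0, 3.0, 1.0), "left_hand": (0.0, 1.0, 1.0)}
large = {n: (3 * x + 5.0, 3 * y, 3 * z - 2.0) for n, (x, y, z) in small.items()}

norm_small = normalize_skeleton(small)
norm_large = normalize_skeleton(large)
print(norm_small["left_hand"], norm_large["left_hand"])
```

After this step, any remaining difference between two skeletons comes from joint angles and global orientation rather than body size or position.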
Week 1 (03/14 - 03/18)
- Fri. Saint Louis classes
- Thu. Looked at 3D movie formats, looked up software to rip 3D Blu-ray movies. Started writing this awesome recap page. Looked at skeleton graph matching [5].
- Wed. Read [2], [3] and [4]. [3] and [4] could be implemented, but we need a dataset on which we can train the algorithms. CVPR recap meeting. Looked up some stereovision algorithms from Stefano Mattoccia and Middlebury.
- Tue. Read [1], set up the Kinect with both libfreenect and OpenNI, evaluated the current closed-source (proprietary) skeleton pose detection algorithm from NITE (it took a while to figure out how the OpenNI modular architecture works and that the pose detection module is closed source), looked up a few references on other skeleton detection algorithms ([2], [3], [4])
References
Here is a list of articles I read.
- [1] Real-Time Human Pose Recognition in Parts from Single Depth Images by Shotton, Blake & al., CVPR 2011
- [2] Nonlinear Body Pose Estimation from Depth Images by Daniel Grest, Jan Woetzel and Reinhard Koch, DAGM 2005
- [3] Real Time Motion Capture Using a Single Time-Of-Flight Camera by Ganapathi, Plagemann & al., CVPR 2010
- [4] Real-time identification and localization of body parts from depth images by Ganapathi, Plagemann & al., ICRA 2010
- [5] Path Similarity Skeleton Graph Matching by X. Bai and L. J. Latecki, PAMI 2008
- [6] Articulated Human Pose Estimation and Search in (Almost) Unconstrained Still Images by Marcin Eichner, Manuel J. Marín-Jiménez, Andrew Zisserman and Vittorio Ferrari
- [7] Recognizing objects by piecing together the Segmentation Puzzle by Timothée Cour and Jianbo Shi, CVPR 2007
Links
Here is a list of links pointing to stuff I tried or looked at.
- (1) OpenNI, a framework for natural interaction, providing a nice way to access and interact with the Kinect, also providing basic proprietary skeleton estimation and tracking
- (2) OpenKinect, an open framework for accessing the Kinect, still very young and raw
- (3) bino, a movie player for stereoscopic 3D videos
- (4) Stereobank, a database of 3D sequences
- (5) 2D articulated human pose estimation software
- (6) calvin upper-body detector, an upper-body detector from single images
- (7) ETHZ Pascal Stickmen dataset, an annotated dataset of 2D human poses
- (8) Buffy Stickmen dataset, an annotated dataset of 2D human poses from Buffy
- (9) Buffy pose classes dataset, an annotated dataset of 2D human pose classes from Buffy