Tuesday, September 4, 2012

Paper Reading #2 - KinectFusion: Real-Time 3D Reconstruction and Interaction Using a Moving Depth Camera

Intro:
     Title: KinectFusion: Real-Time 3D Reconstruction and Interaction Using a Moving Depth Camera
     Author Bios
    1. Shahram Izadi 
      • Research focus: the technical end of HCI
      • http://research.microsoft.com/en-us/people/shahrami/
    2. David Kim
      • Research focus: novel input and output technologies for 3D spatial and natural interaction
      • http://di.ncl.ac.uk/blog/author/xdk/
    3. Otmar Hilliges
      • Research focus: the intersection of input sensing technologies, display technologies, computer graphics, and human computer interaction
      • http://research.microsoft.com/en-us/people/otmarh/
    4. David Molyneaux
      • Research focus: human computer interaction, augmented reality, ubiquitous computing, interactive multitouch tabletops, handheld and steerable projected interfaces, mobile smart objects, computer vision, and machine learning
      • eis.comp.lancs.ac.uk/people/david/
    5. Richard Newcombe
      • PhD candidate in cognitive robotics and computer vision
      • http://www.doc.ic.ac.uk/~rnewcomb/
    6. Pushmeet Kohli
      • Research focus: the development of intelligent machines
      • http://research.microsoft.com/en-us/um/people/pkohli/
    7. Jamie Shotton
      • Research focus: computer vision and machine learning
      • http://jamie.shotton.org/work/
    8. Steve Hodges
      • Research focus: rapid prototyping, novel sensors, embedded camera systems, flexible electronics, display technologies, wireless communications, and ubiquitous and mobile devices
      • http://research.microsoft.com/en-us/people/shodges/
    9. Dustin Freeman
      • http://dustinfreeman.org/
      • http://dustinfreeman.org/files/DustinFreemanCV_academic.pdf
    10. Andrew Davison
      • Works in computer vision and robotics
      • http://www.doc.ic.ac.uk/~ajd/
    11. Andrew Fitzgibbon
      • Research focus: computer vision
      • http://research.microsoft.com/en-us/um/people/awf/
Richard Newcombe and Andrew Davison work for Imperial College London, and Dustin Freeman works for the University of Toronto. The rest work for Microsoft Research Cambridge.

Summary:

This paper covers KinectFusion. Using just a standard Kinect camera, the system recreates indoor 3D scenes: as the camera is moved around a scene, its live depth data is fused into a 3D model that is accurate and detailed, all in real time.
Figure 1. (A) A user moving the camera around. (B) The Phong-shaded reconstructed 3D model. (C) Real-time particles simulated on the 3D model, which is textured using the Kinect RGB information. (D) Multi-touch interactions performed by users on the reconstructed scene. (E) Segmentation and tracking of a 3D object. [1]
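To make the "live depth data" part of that summary concrete, here is a minimal CUDA sketch (my own illustration, not the authors' code) of the first thing such a pipeline has to do: back-project each raw Kinect depth pixel into a 3D point using the pinhole camera model. The intrinsic parameters fx, fy, cx, and cy are placeholders, not the real Kinect calibration.

    #include <cuda_runtime.h>

    // Back-project a raw depth map (millimeters) into a vertex map (meters).
    // One thread per pixel; a zero depth value means the sensor had no reading.
    __global__ void depthToVertexMap(const unsigned short* depth, // raw depth, mm
                                     float3* vertices,            // output 3D points
                                     int width, int height,
                                     float fx, float fy,          // focal lengths (px)
                                     float cx, float cy)          // principal point (px)
    {
        int u = blockIdx.x * blockDim.x + threadIdx.x;
        int v = blockIdx.y * blockDim.y + threadIdx.y;
        if (u >= width || v >= height) return;

        int idx = v * width + u;
        float z = depth[idx] * 0.001f;  // millimeters -> meters
        vertices[idx] = (z > 0.0f)
            ? make_float3((u - cx) * z / fx, (v - cy) * z / fy, z)
            : make_float3(0.0f, 0.0f, 0.0f);
    }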
The authors claim that their system and its novel GPU pipeline are unique for several reasons: the system allows real-time reconstruction and tracking, does not rely on any explicit detection step so it can be used in a variety of settings, provides high-quality surfaces with real-world geometry, allows for user interaction, does not require expensive equipment or an augmented room, and can reproduce a whole room.

This system has several main features, some of which can be seen in Figure 1. The first is that it can scan an object as the camera is moved around it; the resulting model can be imported into CAD software or 3D printed. Another is that it can segment an object out of a scene: the system first reconstructs the scene, then lets the user move the object, which it segments out in real time. The system can also be used for augmented reality, since reconstructed 3D models can be added to a simulation that accounts for physics. Users are even able to move around in the scene while the camera tracks them and registers their touch interactions with the reconstructed background (Figure 1D).
Figure 2. The left shows raw data from a Kinect and the right shows the point cloud after reconstruction.

The GPU implementation has four main stages: depth map conversion, camera tracking, volumetric integration, and raycasting. These steps are executed in parallel and are written in the CUDA language. The algorithmic breakdown of each step is given in the paper. [1]
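As a rough illustration of the volumetric integration stage, below is a sketch of a CUDA kernel that fuses one depth frame into a truncated signed distance function (TSDF) voxel grid using a running weighted average. This is my own simplified version, not the authors' code: it assumes an identity camera pose (the real system applies the pose estimated by the camera tracking stage), and all names and parameters are placeholders.

    // Fuse one depth frame into a TSDF voxel grid (identity pose assumed).
    // Each thread sweeps one column of voxels along the z axis.
    __global__ void integrateTSDF(float* tsdf, float* weights,  // voxel grid
                                  const unsigned short* depth,  // raw depth, mm
                                  int dim,                      // voxels per axis
                                  float voxelSize,              // meters per voxel
                                  float truncation,             // truncation band (m)
                                  int width, int height,
                                  float fx, float fy, float cx, float cy)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= dim || y >= dim) return;

        for (int z = 0; z < dim; ++z) {
            // Voxel center in camera coordinates.
            float3 p = make_float3((x + 0.5f) * voxelSize,
                                   (y + 0.5f) * voxelSize,
                                   (z + 0.5f) * voxelSize);
            // Project the voxel into the current depth image.
            int u = __float2int_rn(p.x / p.z * fx + cx);
            int v = __float2int_rn(p.y / p.z * fy + cy);
            if (u < 0 || u >= width || v < 0 || v >= height) continue;

            float d = depth[v * width + u] * 0.001f;  // mm -> m
            if (d <= 0.0f) continue;                  // no sensor reading

            float sdf = d - p.z;                      // distance along the ray
            if (sdf < -truncation) continue;          // hidden behind the surface
            float val = fminf(1.0f, sdf / truncation);

            // Weighted running average fuses this frame with all earlier ones.
            int idx = (z * dim + y) * dim + x;
            float w = weights[idx];
            tsdf[idx]    = (tsdf[idx] * w + val) / (w + 1.0f);
            weights[idx] = w + 1.0f;
        }
    }

The raycasting stage then marches rays through this grid and finds the zero crossing of the TSDF, which is the reconstructed surface.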

Related work not referenced in the paper:
  1. Efficient Model-based 3D Tracking of Hand Articulations using Kinect
    • http://www.ics.forth.gr/~argyros/mypapers/2011_09_bmvc_kinect_hand_tracking.pdf
    • This paper builds a 3D model as this article does, but focuses on the hand and not on the rest of the scene.
  2. 3D shape scanning with a Kinect 
    • http://dl.acm.org/citation.cfm?id=2037780
    • This one focuses on 3D object scanning using the Kinect, which is just one aspect of this article's system.
  3. Real-time 3D visual SLAM with a hand-held RGB-D camera
    • http://ias.in.tum.de/_media/events/rgbd2011/03_engelhard.pdf
    • It focuses on an RGB-based solution, which this paper avoids.
  4. Human Detection Using Depth Information by Kinect
    • http://www.nattee.net/sites/default/files/Human%20Detection%20Using%20Depth%20Information%20by%20Kinect.pdf
    • This is very similar in that it performs segmentation and tracking using the Kinect, but it focuses on a human and not on a scene.
  5. The Kinect Sensor in Robotics Education
    • http://www.innoc.at/fileadmin/user_upload/_temp_/RiE/Proceedings/69.pdf
    • This one also uses 3D modeling, but focuses on robotics and education.
  6. Accuracy Analysis of Kinect Depth Data
    • https://wiki.rit.edu/download/attachments/52806003/ls2011_submission_40.pdf
    • This is an analysis of the depth data, which is actually used by this paper's system.
  7. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environment 
    • http://ijr.sagepub.com/content/early/2012/02/10/0278364911434148.abstract?rss=1&patientinform-links=yes&legid=spijr;0278364911434148v1
    • This one again focuses on RGB-D mapping.
  8. Gesture Recognition based on 2D and 3D Feature by using Kinect Device
    • http://onlinepresent.org/proceedings/vol1_2012/26.pdf
    • This article is different because it focuses on users and not scenes, and because it uses an RGB color solution.
  9. Using Kinect for hand tracking and rendering in wearable haptics
    • http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5945505&tag=1
    • This article focuses on tracking, but not on modeling.
  10. Incremental 3D Body Reconstruction Framework for Robotic Telepresence applications 
    • http://mail.isr.uc.pt/~mrl/admin/upload/752-068.pdf
    • This one focuses on modeling the body, not a scene.
Overall, most of these have a lot in common with this paper's solution, but none covers the wide array of areas that this one does. This paper combines almost everything these other papers cover, which is unique, and it also takes new approaches to reconstructing scenes and segmenting objects.

Evaluation:
Each part of this system was tested separately, so the evaluation was not systematic. The paper does not really go into detail on its testing methods, so I can only assume that the system is still in the 'lab' test phase and not ready for real-world testing. Once that point is reached, I believe open-ended questions that are qualitative and subjective would be the best method, since this is a system that should be judged partly on how much people enjoy using it and how useful they feel it is. It could also be tested quantitatively, for example by measuring how long it takes to reconstruct a scene.
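If reconstruction time were the quantitative metric, CUDA events are one simple way to measure a GPU pipeline pass. This is only a sketch; runReconstructionPass is a hypothetical stand-in for the system's actual pipeline stages.

    #include <cstdio>
    #include <cuda_runtime.h>

    void timeReconstructionPass() {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        // runReconstructionPass();  // hypothetical: one full pipeline iteration
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time in milliseconds
        printf("Reconstruction pass took %.2f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }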

Discussion:
I could see this being a very fun addition to video games in the near future. It could also be used in many other situations, such as training exercises for the military or police. This was not a well-written paper, though. It was highly technical and was not written in a way that someone without a background in the area could pick up quickly, which really reduced my enjoyment of the paper.

Reference Information:
[1] KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera: http://dl.acm.org/citation.cfm?id=2047270
[2] All papers listed were found using http://scholar.google.com/
