3D body measurements with a single Kinect sensor
Nikolai Bickel
Supervisor: Dr. Bogdan Matuszewski
A project report submitted in partial fulfilment of the Degree of BSc (Hons.) Computing 26/04/2012
Double Project CO3808 University of Central Lancashire email@nbickel.com
I.
Abstract
This report describes the project undertaken as part of the final year of the BSc (Hons) Computing degree at the University of Central Lancashire. The Microsoft Kinect holds the Guinness World Record as the "fastest selling consumer electronics device": 8 million units were sold in its first 60 days, and that was just the beginning. By January 2012, 18 million Kinects had been sold worldwide. These devices were mainly used with the Xbox gaming console as an input device in place of the normal game controller, allowing the console and selected games to be controlled with gestures and voice commands. Very shortly after this big commercial success, a few clever engineers found out that it was possible to connect the Kinect not just to the gaming console but also to a standard personal computer. Soon after a connection was established, they were also able to access the data of the Kinect's sensors; the Kinect as a 3D input device for computer vision applications, and much more, was born. Its good accuracy relative to its price makes it a great 3D measurement device. This project analyses whether it is possible to use the Kinect as an input device for body measurements such as height or chest circumference, with the aim of using this data to determine clothing sizes. That first idea changed slightly to producing the 3D body model and measuring body height. The project explored the different capabilities of the Kinect and the ways to access and analyse its data. The next step was to carry out experiments with real people to see how accurate the results are compared to the real world. This report describes the research around the Kinect project, its capabilities, and the results of the body scanning experiments and accuracy analysis.
II.
Acknowledgements
I would like to thank
- my supervisor Dr. Bogdan Matuszewski for his help and guidance over the whole project lifecycle
- my family for supporting my study in Preston
III.
Table of Contents
I. Abstract
II. Acknowledgements
III. Table of Contents
1. Introduction
1.1. Context
1.2. Overview
2. Analysis and Investigation
2.1. Chapter Summary
2.2. 3D human body scanning
2.2.1. What is 3D?
2.2.2. What can be scanned?
2.2.3. Problems
2.2.4. Use-cases for 3D body models
2.3. Project management
2.3.1. Project Proposal
2.3.2. Technical Plan
2.3.3. Literature Review
2.3.4. Supervisor Meetings
2.3.5. Risk Analysis
2.4. Microsoft Kinect
2.4.1. What is a Microsoft Kinect?
2.4.2. Hardware details
2.4.3. Principles of Kinect
3. Design Issues
3.1. Chapter Summary
3.2. Kinect interfaces
3.2.1. OpenNI framework
3.2.2. OpenKinect
3.2.3. Microsoft Kinect SDK
3.3. RGB-Demo/Nestk (data processing)
3.3.1. Current features of the RGBDemo
3.3.2. Installation and Compilation
3.3.3. RGBDemo installation
3.4. Supporting tools
3.4.1. CMake
3.4.2. Git
3.4.3. Meshlab
3.4.4. Microsoft Visual Studio
4. Implementation
4.1. Chapter Summary
4.2. Kinect Calibration
4.3. Collect data
4.3.1. How to save the Kinect data
4.3.2. What data are saved per frame?
4.4. Accuracy measurements
4.4.1. Random error
4.4.2. Systematic error
4.4.3. The setup
4.4.4. Colour Segmentation
4.4.5. Histograms
4.4.6. Standard deviation
4.4.7. Error map
4.4.8. Problems
4.4.9. Results
4.5. 3D reconstruction
4.5.1. PCL Kinect Fusion
4.5.2. RGBDemo Reconstructor
4.5.3. Implementation
4.5.4. Problems/Solutions
4.6. Point cloud processing
4.7. Meshing
4.8. Measurements
5. Critical Evaluation
5.1. Chapter Summary
5.2. Project Management Evaluation
5.3. Design and Implementation
5.4. Possible Improvements
5.5. Learning Outcomes
6. Conclusion
7. References
8. List of Figures
9. Appendix A
10. Appendix B
11. Appendix C
12. Appendix D
1. Introduction

1.1. Context

There are a few problems when buying clothes online, the most common being that the purchased clothes do not fit. This is exacerbated by the fact that many users do not know their own size or the size of those they are purchasing for (such as parents buying garments for their children). Many people deal with this problem by ordering several sizes of the same clothes and sending back the excess, which can be a nuisance. Additionally, if a customer wishes to purchase clothes for a special occasion, they might be reluctant to order online because they are unsure the ordered clothes will fit. The online stores also bear costs associated with this problem: they usually pay the shipping costs for returned items and have to deal with several logistical issues as well as the costs of reselling the items.

The project idea was to obtain the measurements with the help of a Microsoft Kinect; this was a clear target, because the device is a cheap depth scanner. A customer should be able to upload their measurements to a web page, and the online shop could then check whether the clothes the person wants to buy would fit. That part, however, was not within the scope of this project. The main work was the collection, analysis and combination of the data from the Kinect with the help of several tools explained in this report. In addition to the 3D reconstruction, a setup and a program to analyse the accuracy and noise of the Kinect data were created. The first intended use of the project outcome was online shops, but a 3D body model could be useful in other applications, for example to generate a virtual game character (avatar) for computer games or for medical analysis.
1.2. Overview

Chapter 2 (Analysis and Investigation) contains general information about 3D, especially in the context of human bodies. It also contains project-relevant information, for example about project meetings, additional project documents and an evaluation of the risk analysis. Chapter 3 (Design Issues) describes the interfaces used to connect the Kinect to the computer, the program RGBDemo which was used in this project, and a few other supporting programs. Chapter 4 (Implementation) explains how the 3D body model was produced and everything else that is needed to produce it; it also covers the Kinect error and the accuracy experiment carried out during this project. Chapter 5 (Critical Evaluation) contains evaluations and thoughts about the different aspects of the project, including possible improvements and learning outcomes.
2. Analysis and Investigation

2.1. Chapter Summary

This chapter contains general information about 3D, especially in the context of human bodies. It also contains project-relevant information, for example about project meetings, additional project documents and an evaluation of the risk analysis.
2.2. 3D human body scanning

2.2.1. What is 3D?
Figure 1 2D, 2.5D and 3D (D’Apuzzo, 2009)
3D stands for three-dimensional. Objects in our world can be represented by three parameters, commonly called length, width and depth. That is also the reason why the Kinect is sometimes called a "depth sensor": its depth sensor can capture the scene in front of it in these three dimensions.
2.2.2. What can be scanned?

It is possible to scan the whole body or just parts of it, for example the chest, the back, the face or the legs. In this project a full body scan of a person is used. Scanning only a part of the body is advantageous when there is a special interest in it, for example in a medical application.
2.2.3. Problems

When working with a human body as a scanning object there are some problems, which are summarized in the presentation of D’Apuzzo (D’Apuzzo, 2009):
- Practical problems
  - Movements
  - Breathing
  - Hair
- Physical limits
  - Stature
  - Size
  - Weight
- Private data
Especially the movements during the scanning were a big problem when reconstructing the body.
2.2.4. Use-cases for 3D body models

- Animation – When a 3D body model of a person is available, it can be animated with computer graphics techniques. Such animation could be useful in computer games.
- Ergonomics – For example, a company could produce a chair tailored to a specific person’s body.
- Fitness / Sport – A series of scans could document a weight loss process (motivation).
- Medicine – A 3D model of a face could be useful for plastic surgery.
2.3. Project management

2.3.1. Project Proposal

The purpose of the project proposal (see Appendix A) was to describe the problem the project is intended to tackle and, in general terms, how it would be solved. The project proposal includes a search for relevant literature and an initial idea for tackling the problem.
2.3.2. Technical Plan

In the technical plan (see Appendix B) the project is specified in more detail. It also contains the project management material, for example project deadlines. At this stage of the project a risk analysis was made and potential ethical or legal issues were discussed. The technical plan additionally contains a small commercial analysis of the expected costs. An important part was also the project schedule, realized as a Gantt chart (see Appendix C), which illustrates the different project stages in the form of a bar chart.
2.3.3. Literature Review

The literature review contains a discussion of the published work around the project topic. Because the Kinect is relatively new, it was not easy to find literature specific to it. The title of the literature review is "Build a 3D body model with a single Kinect sensor". The work on the literature review gave an overview of what is important in this project.
2.3.4. Supervisor Meetings

There were several meetings and conversations with the supervisor Dr. Bogdan Matuszewski during the course of the project, from September until the end of April. From the start of the academic year in September until the Christmas vacation we met nearly every week. Those meetings were shared with electronic engineering bachelor students who also have Dr. Matuszewski as a supervisor, and were organized so that every week one person presented his project to the other students. The thoughts about this project were presented first, using a PowerPoint presentation to illustrate the problems. The purpose of the presentation was to learn how to present project progress, while the students listening learned to think about and critique other projects. Sometimes the other projects introduced thoughts and ideas which could then be adapted to this project; for example, the colour segmentation (see chapter 4.4.4) was suggested in one of these meetings. In the second semester there were only individual meetings with Dr. Matuszewski, so he could go into more detail when supervising. These meetings happened approximately every two to three weeks, and in them we also received the feedback for the literature review, the pre-Christmas progress review, the post-Christmas progress review and the acceptance check.
2.3.5. Risk Analysis

Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best out of the data obtained from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK, OpenNI) | High | Medium | Configure the tools correctly so that the interfaces give the best possible results
Robustness – the data of two measurements do not match | Medium | High | Try to reduce the error rate as much as possible
The Kinect sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Try the best, otherwise reduce the measurement points
Scheduling failure, not enough time to complete project | High | Medium | Make a Gantt chart

Table 1 Risk analysis
This is the risk analysis made for the project as part of the technical plan (see Appendix B) at the beginning of the project. This chapter discusses these risks and how they affected the project. The first risk was the noisy data from the Kinect; every measurement device has noise, and because the Kinect was designed as a controller device for a gaming console, perfect accuracy was not a design goal. It still works, however, as the results at the end of this report show. The data from the Kinect interfaces on the PC were quite good: there was no problem with the Kinect interfaces (see chapter 3.2) giving inaccurate values. The Kinect sensor did not break, but the hard drive did. The likelihood of this issue was rated "Low", but because a backup of all project-relevant data was available, the impact on the project was not very big. It caused only a small problem: a project meeting with the supervisor had to be rescheduled because the replacement hard drive was not delivered on time.
2.4. Microsoft Kinect

2.4.1. What is a Microsoft Kinect?
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011)
The way the Microsoft Kinect is used in this project is not its usual purpose. Microsoft announced the Kinect as a gaming input device that interacts with the Xbox 360, a video game console manufactured by Microsoft, as an optional peripheral. The Kinect is connected to the Xbox by a cable and enables advanced gesture recognition, facial recognition and voice recognition. The Kinect for Xbox 360 was released in November 2010, and 18 million units of the sensor had been shipped worldwide by January 2012 (Takahashi, 2012). In this project the Kinect for Xbox 360 was used for research and development. The sensor has a practical ranging limit of 1.2–3.5 m when used with the Xbox software. The Kinect is connected to the Xbox through a USB port; because the USB port cannot supply enough power for the Kinect, the device uses a proprietary connector combining USB communication with additional power. Newer versions of the Xbox do not need this special power supply cable. Shortly after the Kinect was released, hackers figured out how to use it as a non-gaming device on PCs, for example to control robots or to play normal computer games with gestures. In this project the Kinect was not used with an Xbox; instead it was connected to a computer through a standard USB 2.0 port. The Kinect has an RGB camera, a depth sensor and a multi-array microphone. More is written about these features in chapter 2.4.2.
There are two different versions of the Microsoft Kinect. As Microsoft discovered that many people wanted to use the Kinect beyond its gaming purpose, they started thinking about supporting these researchers and companies too. That is why they announced an SDK, which is described in chapter 3.2.3. In the middle of this project Microsoft announced a new Kinect hardware version, the "Kinect for Windows". The Microsoft Kinect for Xbox 360 is the original Kinect sold for the Xbox; with this version people are not allowed to make commercial products, but the device drivers for the computer still work with it. The Microsoft Kinect for Windows will be available in May 2012. The hardware of the device did not change, but the software now has a "Near Mode" which enables the depth camera to see objects that are closer to the camera, and with the "Kinect for Windows" developers are also able to use the Kinect commercially. The Kinect for Xbox is only sold together with the game "Kinect Adventures"; at the moment the bundle of Kinect sensor and Kinect Adventures costs £99.00 (02/04/2012). For a device with a depth sensor this is very cheap. The price of the Kinect for Xbox is subsidized by Microsoft because they expect "consumers buying a number of Kinect games, subscribing to Xbox Live, and making other transactions associated with the Xbox 360 ecosystem" (Kinect for Windows Team, 2012). This is also the reason why the Kinect for Windows will cost approximately £100 more than the Kinect for Xbox version. Within its first 60 days the Kinect was sold more than 8 million times, a Guinness World Record for the "fastest selling consumer electronics device". In January 2012 Microsoft announced that it had sold 18 million Kinect motion-sensing systems. Certainly most Kinects are used for gaming and not for developing applications on the computer.
2.4.2. Hardware details

2.4.2.1. RGB camera

The Kinect has a traditional colour video camera, similar to webcams and mobile phone cameras. Microsoft calls this an RGB camera, referring to the red, green and blue colours it detects. The camera has a resolution of 640x480 pixels at 30 frames per second. The RGB camera has a slightly larger angle of view than the depth sensor.

2.4.2.2. 3D Depth Sensor

The 3D depth sensor is the heart of the Kinect's unique capabilities. The sensor provides a 640x480 pixel depth map with 11-bit depth (2048 levels of sensitivity) at 30 frames per second. An advantage of a depth sensor is that it is colour and texture invariant.

2.4.2.3. Multi-array microphone

The Kinect includes an array of four built-in microphones. Each microphone provides 16-bit audio at a sampling rate of 16 kHz. They are used by the Xbox to gather voice commands from the user; having more than one microphone helps to isolate the commands from the noise in the room.

2.4.2.4. Tilt motor for sensor adjustment

The motorized pivot is capable of tilting the sensor up and down. With this motor the Kinect is able to extend its vertical field of view.
2.4.3. Principles of Kinect

There are different methods to measure the depth information of a scene. Those working with light waves, like the Kinect, include laser scanning and time-of-flight techniques. The Kinect uses a structured light technique. Structured light is the process of projecting a known pattern of pixels onto a scene and then analysing how the pattern deforms when it hits a surface. This structured light can be visible to the eye or not; the Kinect uses invisible (or imperceptible) structured light, because the pattern is projected in near-infrared light which a normal human eye cannot see. The structured light projected by the Kinect has a pseudo-random pattern.
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007)
The Light Source Unit (IR light source) projects the light pattern onto the scene. The Light Detector (CMOS IR camera) observes the scene, and the control system calculates the depth information. The calibration between the projector and the detector has to be known; it is carried out at the time of manufacture, when a set of reference images is taken at different distances and stored in the device's memory.
Figure 4 Structured light (Freedman, et al., 2010)
The speckle pattern (structured light) produced by the IR light source varies with the z-axis. Figure 4 shows a human hand (object) with a different speckle pattern at different distances. The Kinect uses three different sizes of speckles for three different regions of distance. Because the speckles have this distance-dependent property, each position has its specific spacing and shape. The control system of the Kinect estimates the depth by correlating each window with the reference data (speckle pattern). The reference pattern is stored at a known depth in the Kinect's memory. "The best match with the stored pattern gives an offset from the known depth, in terms of pixels: this is called disparity. The Kinect device performs a further interpolation of the best match to get sub-pixel accuracy of 1/8 pixel. Given the known depth of the memorized plane, and the disparity, an estimated depth for each pixel can be calculated by triangulation." (ROS.org, 2010)
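To illustrate the triangulation step, the following sketch converts a measured disparity (relative to the memorized reference plane) into a depth estimate. The focal length, baseline and reference-plane depth used here are illustrative assumptions, not the Kinect's actual factory calibration values.

#include <cstdio>

// Depth from disparity against a reference plane (minimal sketch, assumed values):
//   1/z = 1/z_ref + d / (f * b)
// where d is the disparity in pixels, f the focal length in pixels and b the
// baseline between IR projector and IR camera in metres.
double depthFromDisparity(double disparity_px, double z_ref_m,
                          double focal_px, double baseline_m)
{
    return 1.0 / (1.0 / z_ref_m + disparity_px / (focal_px * baseline_m));
}

int main()
{
    const double focal_px   = 580.0;  // assumed IR camera focal length (pixels)
    const double baseline_m = 0.075;  // assumed projector-camera baseline (metres)
    const double z_ref_m    = 1.5;    // assumed depth of the memorized reference plane

    // Disparities are multiples of 1/8 pixel after the sub-pixel interpolation.
    for (double d = -8.0; d <= 8.0; d += 4.0)
        std::printf("disparity %5.2f px -> depth %.3f m\n",
                    d, depthFromDisparity(d, z_ref_m, focal_px, baseline_m));
    return 0;
}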
3. Design Issues

3.1. Chapter Summary

This chapter contains information about the interfaces used to connect the Kinect to the computer. It also describes the program RGBDemo, which was used in this project, and introduces a few other supporting programs.
3.2. Kinect interfaces

At the beginning of the project a decision had to be made about how to get the information from the Kinect. There were three possible interfaces to access the information. Often these interfaces had to be "built from source code"; in the field of computer software this term means the process of converting source code files into software that can be run on a computer.
3.2.1. OpenNI framework

The OpenNI framework is published by the OpenNI organization, one of whose goals is to accelerate the development of natural user interfaces. The founding members of the OpenNI organization are PrimeSense, Willow Garage, Side-Kick, ASUS and AppSide. PrimeSense is an Israeli company that provides the 3D sensing technology for the Kinect, and ASUS is a multinational computer hardware and electronics company. The OpenNI organization provides an API that covers communication with both low-level devices (e.g. vision and audio sensors) and high-level middleware solutions (e.g. visual tracking using computer vision).
Figure 5 OpenNI framework architecture (OpenNI, 2012)
The OpenNI framework is not just for the Microsoft Kinect; it also supports other hardware, for example the ASUS Xtion PRO.
3.2.1.1. Installation

The concept of OpenNI is to be very modular. To install and access the API it is necessary to install three different components to work with the Kinect, and it is important to install these components in the right order. The downloads for the components can be found on the homepage of the OpenNI organization: http://www.openni.org/Downloads/OpenNIModules.aspx. All components are available as executable files in 32 and 64-bit versions for Windows and Ubuntu. There are also stable and unstable releases available; to use the Kinect mod (step 4) the unstable releases are currently required, as explained in the last step of the installation process. The first step is to install the OpenNI binaries. The second step is to install the NITE module (download category: OpenNI Compliant Middleware Binaries). Many old installation instructions on the internet state that a license key is necessary, but in all of the latest NITE installation packages the licence key is added automatically. The third step is to install the PrimeSensor module (download category: OpenNI Compliant Middleware Binaries). The last step is to install the SensorKinect driver, which can be downloaded from https://github.com/avin2/SensorKinect. It is important to read the README file; in the current version it says "You must use this kinect mod version with the unstable OpenNI release". When all these packages are installed, a restart of the system is highly recommended. When the installation is successful, the driver for the Kinect is installed the first time the Kinect is connected to the USB port (shown in Figure 6).
Figure 6 Installed OpenNI driver at the Windows Device Manager
3.2.1.2. Interface

Data from the Kinect can be accessed through the OpenNI framework with Java, C++ and C#. To use the interface in C/C++, add the include directory "$(OPEN_NI_INCLUDE)"; this is an environment variable that points to the location of the OpenNI include directory, by default C:\Program files\OpenNI\Include. Also add the library directory "$(OPEN_NI_LIB)", another environment variable that points by default to C:\Program files\OpenNI\Lib.
The source code should include XnOpenNI.h when using the C interface, or XnCppWrapper.h when using the C++ interface. A minimal usage sketch with the C++ wrapper is shown below.
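The following sketch, based on the OpenNI 1.x C++ wrapper, opens a context, creates a depth generator and reads a few depth frames. It assumes OpenNI and the SensorKinect driver are installed as described above, and error handling is kept to a minimum.

#include <XnCppWrapper.h>
#include <cstdio>

int main()
{
    xn::Context context;
    if (context.Init() != XN_STATUS_OK) return 1;          // initialise an OpenNI context

    xn::DepthGenerator depth;
    if (depth.Create(context) != XN_STATUS_OK) return 1;   // create a depth production node

    context.StartGeneratingAll();                          // start the data streams

    xn::DepthMetaData depthMD;
    for (int i = 0; i < 30; ++i)
    {
        context.WaitOneUpdateAll(depth);                   // block until a new depth frame
        depth.GetMetaData(depthMD);
        std::printf("frame %u: centre depth = %u mm\n",
                    depthMD.FrameID(),
                    depthMD(depthMD.XRes() / 2, depthMD.YRes() / 2));
    }

    context.Release();
    return 0;
}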
3.2.1.3. License
OpenNI is written and distributed under the GNU Lesser General Public License (LGPL), which means that its source code is freely distributed and available to the general public.

3.2.1.4. Documentation

The OpenNI framework is well documented. There is a programmer guide which explains the OpenNI system architecture and programming object model, illustrated with code snippets. The OpenNI framework also provides example applications in C, C++ and Java, with their features thoroughly explained in the documentation. As with any normal framework, OpenNI also supplies reference documentation of the interface, its classes and members.
3.2.2. OpenKinect

OpenKinect is, according to its homepage, "an open community of people interested in making use of the amazing Xbox Kinect hardware with our PCs and other devices. We are working on free, open source libraries that will enable the Kinect to be used with Windows, Linux, and Mac" (OpenKinect, 2012). The OpenKinect project has an interesting history. In November 2010 the website adafruit.com announced a competition for hackers and said it would pay $3,000 to the person who could access the Kinect with a PC and read its image and depth data; the source code had to be open source and/or public domain. A few days later Hector Martín was able to hack the Kinect and won the competition. When Microsoft eventually announced its own driver for the Kinect, a Microsoft employee called Johnny Chung Lee shared his secret in his blog:

"Back in the late Summer of 2010, trying to argue for the most basic level of PC support for Kinect from within Microsoft, to my immense disappointment, turned out to be really grinding against the corporate grain at the time (for many reasons I won't enumerate here). When my frustration peaked, I decided to approach AdaFruit to put on the Open Kinect contest" (Lee, 2011)

Johnny Chung Lee no longer works at Microsoft, but this statement means that the person who started the whole hacking effort around the Kinect was a developer of the Kinect within Microsoft. The heart of OpenKinect is "libfreenect". Libfreenect includes all the code necessary to activate, initialize and exchange data with the Kinect hardware. This includes drivers and a cross-platform API that works on Windows, Linux and OS X. At the moment there is no access to the audio stream of the Kinect.
The roadmap of OpenKinect mentions an OpenKinect analysis library. This library should turn the raw information into more useful abstractions, including hand tracking, skeleton tracking, point cloud generation and 3D reconstruction, but the authors also write that it will take months or years to implement these functionalities.

3.2.2.1. Installation

To explore the OpenKinect project the driver was built from the source code. To build the project for Windows, first download the source code from GitHub (OpenKinectSrc, 2012). The next step is to install the dependencies; for Windows these are libusb-win32, pthreads-win32 and GLUT. Copy the .dll files from the pthreads and GLUT dependencies to /windows/system32. There are two parts to libfreenect: one is the low-level libusb-based device driver and the other is libfreenect itself, the library that talks to the driver. The next step is to install the low-level device driver. This can be done in the Windows Device Manager: right-click on the Kinect devices, select "Update Driver Software..." and update the driver of the devices "Xbox NUI Motor", "Xbox NUI Camera" and "Xbox NUI Audio". The drivers are located in the downloaded source code in the folder "/platform/windows/inf". After this step, use CMake to configure the compiler and create the makefiles (CMake is discussed in chapter 3.4.1). The next step is to compile the source code with the compiler that was selected in CMake. To use the library it should be copied to "/windows/system32" or to the folder of the program that is to be run with the library. More information is in the README files and the OpenKinect wiki.
3.2.2.2. Interface

OpenKinect is a low-level API. At the moment it only supports a few basic functions: it allows access to the camera, the depth map, the LED and the motor for tilting the sensor.

public class KinectDevice {
    public KinectDevice()                  // Signature: ()V
    public setLEDStatus(LEDStatus)         // Signature: (LLEDStatus;)V
    public getLEDStatus()                  // Signature: ()LLEDStatus;
    public setMotorPosition(float)         // Signature: (F)V
    public getMotorPosition()              // Signature: ()F
    public getRGBImage()                   // Signature: ()LI
    public getDepthImage()                 // Signature: ()LI
}
These are the functions OpenKinect offers to the user. As can be seen, it provides access to the RGB image, the depth image and the motor position. The API is written in C, but there are wrappers for Python, C++, C#, Java and several other programming languages; a wrapper in this case is a bridge between the C API and the corresponding programming language. A minimal sketch using the C API directly is shown below.
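The following sketch shows how the libfreenect C API is typically used to receive depth frames through a callback. The exact mode-selection calls and header path differ between libfreenect versions, so treat this as an illustration of the 2011/2012 API rather than a definitive listing.

#include <libfreenect.h>
#include <cstdio>
#include <cstdint>

// Depth callback: called for every incoming 11-bit depth frame (640x480).
static void depth_cb(freenect_device* dev, void* depth, uint32_t timestamp)
{
    const uint16_t* d = static_cast<const uint16_t*>(depth);
    std::printf("depth frame %u: centre value = %u\n", timestamp, d[240 * 640 + 320]);
}

int main()
{
    freenect_context* ctx = 0;
    freenect_device*  dev = 0;

    if (freenect_init(&ctx, 0) < 0) return 1;
    if (freenect_open_device(ctx, &dev, 0) < 0) return 1;   // first Kinect on the bus

    freenect_set_depth_mode(dev, freenect_find_depth_mode(FREENECT_RESOLUTION_MEDIUM,
                                                          FREENECT_DEPTH_11BIT));
    freenect_set_depth_callback(dev, depth_cb);
    freenect_start_depth(dev);

    // Pump USB events; the depth callback fires from inside this loop.
    while (freenect_process_events(ctx) >= 0) { /* run until interrupted */ }

    freenect_stop_depth(dev);
    freenect_close_device(dev);
    freenect_shutdown(ctx);
    return 0;
}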
3.2.2.3. License
This project is dual-licensed: when software is dual-licensed, the recipients can choose under which terms they want to use or distribute the software. The two licenses are the Apache 2.0 license and the GPL v2 license. This means you can copy, modify and distribute the covered software in source and/or binary form under certain conditions, for example that all copies must be accompanied by a copy of the license.

3.2.2.4. Documentation

OpenKinect provides a wiki with relevant information about the interface and the Kinect itself. A wiki is a website whose users can add, modify or delete its content via a web browser, like the well-known internet site Wikipedia.
3.2.3. Microsoft Kinect SDK

The Microsoft Kinect SDK is the official programming interface to the Microsoft Kinect. It was announced by Microsoft in spring 2011, after they saw the impact the OpenKinect project had on the developer community. During this project, in March 2012, Microsoft announced a new version of the SDK called "Kinect for Windows 1.5". At the start of the project Microsoft called its software development kit the "Microsoft Kinect SDK"; at the beginning of 2012 the Kinect project was renamed "Kinect for Windows" and the SDK became the "Kinect for Windows SDK". In addition to publishing the Kinect SDK, Microsoft also started the 'Kinect Effect' marketing campaign. The campaign aims to show that a product designed for entertainment is having a big impact on people's lives, the intention being that consumers will view the product as a source of 'innovation and inspiration'. Microsoft created a video showing different use cases of the Kinect beyond the normal gaming purpose (Microsoft, 2012). Technically it is possible to use the Microsoft Kinect SDK with the "Kinect for Xbox", but Microsoft recommends changing to the new version of the Kinect called "Kinect for Windows", the version of the Kinect made especially for Windows (see chapter 2.4.1). The Kinect for Windows SDK is only available for the Windows operating system; this includes Windows 7 and Windows Embedded Standard 7, and currently it can also be used with the Windows 8 Developer Preview.
3.2.3.1. Installation

The installation of the Kinect driver for Windows is the easiest of all the Kinect programming interfaces: you only have to download the Kinect SDK application, start the setup and follow it, which is pretty straightforward. In addition to the Kinect for Windows SDK, Microsoft also provides a runtime version of the Kinect framework. The runtime version enables a user to install all components needed by a program that wants to connect to the Kinect, without any SDK-specific content; it is intended for customers and is smaller than the whole SDK, but it only works with the "Kinect for Windows" device.

3.2.3.2. Interface

With the SDK you can build applications in C++, C# and Visual Basic, and it is possible to access the image, depth and audio streams of the connected Kinect. To use the framework provided by Microsoft, add a reference to the dynamic-link library Microsoft.Kinect.dll. To use the data from the Kinect in Visual Studio, right-click on "References" and select the option "Add Reference", select the ".NET" tab, search for the "Microsoft.Kinect" library, select it and click OK. The classes and functions can then be used in C# by adding the using directive "using Microsoft.Kinect;".

3.2.3.3. License

The new Kinect for Windows SDK authorizes the development and distribution of commercial applications. The old SDK was a beta and as a result was appropriate only for research, testing and experimentation. The license allows software developers to create and sell their applications to customers using Kinect for Windows hardware; that means applications cannot be sold to people who have the Kinect for Xbox hardware.

3.2.3.4. Documentation

Microsoft provides a lot of documentation and help for working with the Kinect for Windows SDK, including a discussion board, videos and code samples.
3.3. RGB-Demo/Nestk (data processing)

An internet search for tools to access and process the Kinect data led to the RGBDemo. The main creator of this program is Nicolas Burrus from Spain. RGBDemo helps to access the data from the Kinect and has a number of useful functions to process that data. Mr. Burrus divided his Kinect project into two parts: a library called nestk (Burrus, 2012) and the RGBDemo itself (Burrus, 2012). Both projects are open source and available for download via the version control system Git (more in chapter 3.4.2) or over GitHub (Burrus, 2012). Nestk is a C++ library for the Kinect which provides many of the functions and classes used in the RGBDemo; it is built on top of OpenCV and uses Qt for the graphical parts, and parts of it also depend on PCL (chapter 3.3.2 deals with these dependencies). RGBDemo uses much of the nestk library and implements many Kinect-related algorithms; it is written in C++ and uses Qt for the graphical user interface.
3.3.1. Current features of the RGBDemo

The RGBDemo can grab Kinect images, visualize and replay them; this topic is discussed in chapter 4.3. It supports both OpenKinect and OpenNI as a backend framework; OpenKinect's library, libfreenect, is already integrated into nestk. With the OpenNI backend the program can also extract skeleton data and hand point positions. For a few months there has also been a stable release of the RGBDemo which supports multiple Kinects. Results of the different demo programs can be exported to .ply files (more about this file format is written in chapter 4.5.2.1). These demo programs are:
- Demo of 3D scene reconstruction using a freehand Kinect (more in chapter 4.5.2)
- Demo of people detection and localization
- Demo of gesture recognition and skeleton tracking using OpenNI
- Demo of 3D model estimation of objects lying on a table (based on the PCL table top object detector)
- Demo of multiple Kinect calibration
A good point of the RGBDemo is that it supports all common operating systems: Windows, Linux and Mac OS X. RGBDemo is released under the GNU Lesser General Public License (LGPL). "In short, it means you can use it freely in most cases, including commercial software. The main limitation is that you cannot modify it and keep your changes private: you have to share them under the same license." (Burrus, 2012)
3.3.2. Installation and Compilation

There is no installation routine for the RGBDemo. To use it, the user simply downloads the .exe files and starts the program; there are no fancy installers or nice icons. RGBDemo offers Win32 binaries, so to start one of the programs you simply click on "rgbd-viewer.exe". Certainly the dependencies need to be installed and working. It takes some time to compile the RGBDemo yourself the first time, especially if you have never worked with Git, CMake or other open source projects. The environment used to install the RGBDemo was 64-bit Microsoft Windows with Visual Studio 2010, and the Visual Studio 2010 compiler was used to build it. It is nearly impossible to compile the RGBDemo Reconstructor as a 64-bit build because of the error "C2872: 'flann' : ambiguous symbol", caused by a conflict between the flann embedded in OpenCV and another copy of Flann pulled in as an external dependency of PCL. So the ambition to build a 64-bit version was given up and the RGBDemo was built as a 32-bit version; this has no major disadvantage and it also runs on a 64-bit computer. First the dependencies of the RGBDemo were installed: OpenNI, Qt, OpenCV and PCL. PCL is an optional dependency but was still installed and used in this project, because PCL has good point cloud algorithms implemented.

3.3.2.1. Qt

The Qt framework is used to make graphical user interfaces. It is comparable with Windows' WinForms or WPF, but instead of just Windows, Qt works on many more platforms, for example Mac OS X, Linux and Symbian (a mobile phone operating system). Qt is supported and developed by Nokia's development division. Qt includes a GUI designer, which was used in the accuracy program (see chapter 4.4). When the project started last year there was no pre-compiled library for Visual Studio 2010 available, but they are now available on Nokia's web site, which makes the installation of Qt an easy task. The download is available on Nokia's download page (Nokia, 2012).
3.3.2.2. OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions for computer vision. It includes, among other things, 2D and 3D feature toolkits, facial recognition, gesture recognition, structure from motion and motion tracking. The library was originally written in C, but version 2.0 includes, in addition to the traditional C interface, a C++ interface. OpenCV runs on any major operating system, including Android, the most common operating system on smartphones. The pre-compiled OpenCV library can be downloaded from the OpenCV project homepage (OpenCV, 2012). When installing OpenCV it is a good idea to choose an installation path without spaces, for example "C:\OpenCV2.2" instead of "C:\Program Files (x86)\OpenCV2.2"; this can prevent problems with the inclusion of the library.
3.3.2.3. PCL

PCL stands for Point Cloud Library; its main goal is 3D point cloud processing. The PCL framework contains numerous state-of-the-art algorithms including surface reconstruction, feature estimation, segmentation and model fitting. PCL is an open source project. PCL was installed with the normal all-in-one installer from its homepage, which includes all of the libraries and also the dependencies of PCL. Part of PCL is also an experimental implementation of the KinectFusion algorithm; this algorithm is not included in the all-in-one installer, and there is more about this reconstruction algorithm in chapter 4.5.1. A minimal sketch of how PCL is typically used is shown below.
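The following sketch loads a point cloud from a PCD file and downsamples it with a voxel grid filter, one of the standard PCL operations. The file name "body.pcd" and the 5 mm leaf size are placeholder assumptions, not values taken from this project.

#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>
#include <pcl/filters/voxel_grid.h>

int main()
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PointCloud<pcl::PointXYZ>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZ>);

    // "body.pcd" is a hypothetical file name used only for this example.
    if (pcl::io::loadPCDFile<pcl::PointXYZ>("body.pcd", *cloud) < 0) return 1;

    pcl::VoxelGrid<pcl::PointXYZ> grid;         // downsample with a 5 mm voxel grid
    grid.setInputCloud(cloud);
    grid.setLeafSize(0.005f, 0.005f, 0.005f);
    grid.filter(*filtered);
    return 0;
}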
3.3.2.4. Kinect backend

Certainly the RGBDemo needs a middleware to get access to the Kinect. These are the Kinect interfaces described in chapter 3.2; in this project the RGBDemo was used with OpenNI.
3.3.3. RGBDemo installation

After the installation of the dependencies, the next step is to download the source code of the RGBDemo from the internet. This download can be done with Git (there are more details about Git in chapter 3.4.2). The Git command for the download is:

git clone --recursive git://github.com/nburrus/rgbdemo.git
With this command the source code is saved to the hard drive; the next step is to configure it with the CMake GUI (see chapter 3.4.1). The source directory is set to the path where the source code was downloaded, for example "C:\rgbdemo" when the source is on C:\, and the build directory is set to "C:\rgbdemo\build". During the project the experience was made that it is not a good idea to rely on the cache function of CMake; it caused more problems than it was useful, and the cache can be deleted via the File menu. The next step is to start the configuration by clicking the "Configure" button; a compiler can then be selected. In this project the compiler from Visual Studio 2010 was used, which is a 32-bit compiler. A list of grouped names and values should now appear; these are the configuration parameters. Set the parameter OpenCV_DIR to the folder where the OpenCV binaries are (for example C:\OpenCV2.2). To use PCL, open the NESTK group and tick the checkbox named "NESTK_USE_PCL". Hit the Configure button again and you are almost done; the CMake log should not show any errors. The CMake project is configured now, and the Visual Studio project files can be generated by clicking the "Generate" button. CMake then writes the project files (a Visual Studio solution) to the build directory, for example "C:\rgbdemo\build". The same configuration can also be done from the command line, as sketched below.
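As an alternative to the CMake GUI, the same configuration can be scripted on the command line. The generator name and paths below are illustrative and match the setup described here (Visual Studio 2010, OpenCV in C:\OpenCV2.2); adjust them to your own environment.

cd C:\rgbdemo
mkdir build
cd build
cmake -G "Visual Studio 10" -D OpenCV_DIR=C:/OpenCV2.2 -D NESTK_USE_PCL=1 ..
cmake --build . --config Release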
Then the Visual Studio solution with the name RGBDemo.sln can be opened. To finally get an executable file, right-click on the project and click "Build". There are two different build configurations available, "Release" and "Debug". In a debug build the complete symbolic debug information is emitted to help Visual Studio provide the debug tools, and code optimization is not applied. In the release build there is no symbolic debug information and the code is optimized. The result is that the RGBDemo runs approximately three times faster with the release configuration.
3.4. Supporting tools

3.4.1. CMake

A lot of the tools used in this project needed CMake. CMake is a cross-platform, open source program that helps automate the build and compilation process. From the source code and the CMake configuration files, CMake generates project files for native build environments: makefiles on Linux, Visual Studio solutions on Windows and Xcode projects on Apple systems. This solves a problem that many open source, cross-platform applications such as OpenNI, RGBDemo, PCL and OpenCV have: one build environment, for example a makefile, cannot build applications for Windows, and Visual Studio does not build applications for Linux. CMake separates the process of writing source code from compiling it for a particular platform.
Figure 7 CMake process: configuration file (CMakeLists.txt) → CMake → native build system (e.g. Visual Studio solution) → native build tools (Visual Studio) → executables / libraries (.exe, .lib)
Figure 7 shows the process supported by CMake. The configuration files are named "CMakeLists.txt". The native build system is, for example, a Visual Studio solution with projects, and the native build tool in this case is Visual Studio. With the CMake-generated solution the source code can be compiled into an executable file or a library. CMake differentiates between two folder trees: the source tree, which contains the CMake configuration files, the source code and the header files, and the binary tree, which contains the native build system files and the compilation output such as executable files or libraries. Source and binary tree could be in the same directory, but it is better to separate them; this has the advantage that the developer can delete the binary tree without affecting any of the source code. These two folder trees are configured in the CMake GUI in the fields labelled "Where is the source code" and "Where to build the binaries". CMake has a cache where it saves the configured values. This cache is located in the build tree and stores key-value pairs in a text file called "CMakeCache.txt"; each entry is simply a variable name and a value. For the RGBDemo such configuration variables are, for example, OpenCV_DIR with the location of the OpenCV folder, or NESTK_USE_PCL with the information whether PCL should be used or not (where 1 is true and 0 is false). The configuration files have a simple syntax to govern the configuration process:

IF (NOT WIN32)
    SET(NESTK_USE_FREENECT 1)
ENDIF()

This example from the RGBDemo configuration file sets the configuration value NESTK_USE_FREENECT to true if the operating system is not Windows, which makes OpenKinect (freenect) the default Kinect interface on Linux and Apple systems. But just setting the variable is not enough: the source code has to check whether the CMake variable is set and then use OpenKinect or OpenNI.

#ifdef NESTK_USE_FREENECT
# include <ntk/camera/freenect_grabber.h>
#endif

In the source code of the RGBDemo the OpenKinect (freenect) header files are only included when the variable is set to true; #ifdef is a pre-processor command for C and C++. The general pattern is sketched below. CMake is very scalable: KDE, a desktop environment with approximately six million lines of code, uses it for its build process.
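To make the mechanism concrete, the following short sketch shows the standard way of forwarding a CMake variable to the C++ code as a preprocessor define. The names here are hypothetical and not taken from the RGBDemo sources; it illustrates the same idea nestk uses for NESTK_USE_FREENECT.

OPTION(MYAPP_USE_PCL "Build with PCL support" OFF)
IF (MYAPP_USE_PCL)
    ADD_DEFINITIONS(-DMYAPP_USE_PCL)
ENDIF()

With this in place, #ifdef MYAPP_USE_PCL blocks in the source code are only compiled when the option is switched on in the CMake configuration.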
3.4.2. Git

Git is a distributed version control and source code management system, initially designed and developed by Linus Torvalds for Linux kernel development. Git does not need any network access or a central server once the code has been downloaded. Source code management systems keep track of the source code of a program when more than one person is working on it, and a lot of open source projects and companies use Git to manage their code; RGBDemo and OpenNI also use Git for version control. Git has a lot of commands and functionality, and the commands below, which were the only ones used in this project, are just the tip of the iceberg. Git is available for Windows, Mac OS X and Linux; the Windows version is integrated into the command line. To download the source code from a Git address, navigate to the folder where the code should be stored and execute:

git clone --recursive <git url>

The Git address is published by the project whose code you want. The --recursive option also clones the submodules; for example, nestk is a submodule of the RGBDemo.

git fetch

Once the code has been cloned, the latest source code can be downloaded with the fetch command.
3.4.3. Meshlab

MeshLab is an open source program to process meshes and point clouds. It is oriented towards the management and processing of unstructured large meshes and provides a set of tools for editing, cleaning, healing, inspecting, rendering and converting these kinds of meshes. MeshLab was chosen because it is free software and has been used in various academic and research projects. MeshLab was used for the point cloud processing and meshing (see chapters 4.6 and 4.7).
3.4.4. Microsoft Visual Studio

Visual Studio is a well-known integrated development environment from Microsoft. A source code editor and a debugger are just some of the functionality Visual Studio provides. In this project Visual Studio was used to write the accuracy analysis program and to inspect and modify the source code of the RGBDemo.
4. Implementation

4.1. Chapter Summary

This chapter contains more information about how the 3D body model was produced and other things that are important to create a 3D body model. The chapter also contains information about the Kinect error and the accuracy experiment that was made during this project.
4.2. Kinect Calibration
Camera calibration is a way of analysing an image to derive what the camera situation was at the time the image was captured; it is also known as camera resectioning. The camera parameters are represented as a camera matrix, a 3 × 4 matrix. In the pinhole camera model the camera matrix denotes a projective mapping from world coordinates to pixel coordinates using a perspective transformation. "OpenNI comes with a predefined calibration stored in the firmware that can directly output aligned depth and colour images with a virtual constant focal length. Most applications will be happy with this calibration and do not require any additional step. However, some computer vision applications such as robotics might need a more accurate calibration." (MANCTL, 2012) Because of this statement from MANCTL, a calibration was made in this project to determine the specific intrinsic parameters of the Kinect. The RGBDemo contains an algorithm to do this calibration. Nicolas Burrus, the programmer of RGBDemo, writes in the discussion board of the program that he used the calibration routine from OpenCV; it is basically a pinhole model with distortions. For the calibration the application "calibrate-openni-intrinsics.exe" from the RGBDemo was used. After compilation the program can be started from the command line (cmd.exe):

calibrate-openni-intrinsics --pattern-size 0.0325 calibration calibration.yml
These are the parameters of the calibration program:

pattern-size     The square size of the chessboard in metres
calibration      The folder with a set of images of a checkerboard
calibration.yml  The initial calibration file

The folders with the sets of checkerboard images were recorded with rgbd-viewer.exe. A few examples are shown in Figure 8.
Figure 8 Kinect calibration RGB images
The calibration.yml file was exported from rgbd-viewer.exe via the menu File -> Save calibration file. This is the calibration file generated by OpenNI with the default parameters for the Kinect. The result of this process is a new calibration file "openni_calibration.yml", in which the intrinsic camera parameters of the Kinect video camera and the depth camera are stored. The camera matrix, or matrix of intrinsic parameters, has the form

$K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$

where $(c_x, c_y)$ is the principal point (usually at the image centre) and $f_x$, $f_y$ are the focal lengths. The resulting RGB intrinsic values are stored in the generated calibration file.
The depth camera had the same intrinsic matrix. It looked as if a depth camera calibration is not available for the OpenNI backend, but since the calibration files worked with the other applications, for example the Reconstructor program, no further research was carried out in this area. The generated calibration file was used every time offline operations with the RGBDemo were done. The alignment between the depth data and the RGB data is done internally by the OpenNI framework; this means the depth at point [1, 1] corresponds to the colour in the RGB image at point [1, 1].
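With these intrinsic parameters a depth pixel can be turned into a 3D point. This is standard pinhole geometry rather than project-specific code: for a pixel $(u, v)$ with measured depth $Z$, the corresponding point in camera coordinates is

$X = \frac{(u - c_x)\,Z}{f_x}, \qquad Y = \frac{(v - c_y)\,Z}{f_y},$

so that $(X, Y, Z)$ is the 3D position used later when depth pixels are combined with RGB features (see chapter 4.5.3.1).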
4.3. Collect data
When the connection with the Kinect was available, data could be recorded. The RGBDemo was used to collect the depth and image data. In this project the data was not processed in real time: in a first step the data was recorded to the hard drive, and in a second step it was reconstructed from there. This had a few advantages. One advantage is that the RGBDemo can reconstruct only 1 FPS (frame per second) but can save 6 FPS to the hard drive. The barrier to more frames per second is not the computing time of the reconstruction but the write speed of the hard drive. To record the Kinect data in this project SATA hard drives with 7200 and 5400 rpm were used; the program could probably save a lot more frames per second with an SSD (solid-state drive). Another advantage of recording and saving data to disk is that it is not lost after one reconstruction cycle. Once the data is saved it can be used again to run the reconstruction or the accuracy analysis several times with different parameters or different program code. One disadvantage is that there is no fully reconstructed model immediately after a scan; it takes some time to analyse the data after it is saved. In commercial use this might be a critical factor.
4.3.1. How to save the Kinect data
The program used to save the data is rgbd-viewer.exe (see chapter 3.3). All of the necessary recording functions are encapsulated in a class called RGBDFrameRecorder. Its functions and properties are shown in the following UML diagram:
Figure 9 RGBDFrameRecorder UML class
This class stores a series of RGB-D images in "view directories".
The first step is to select the folder where the program should save all the frames. This is implemented through a Qt text field (see chapter 3.3.2.1). The program that handles the storing of the information is, as mentioned above, rgbd-viewer.exe. The viewer has a function called onRGBDDataUpdated(), whose name says pretty much what it does: it is called every time a new frame arrives from the Kinect. Amongst others it contains the command:

m_frame_recorder->saveCurrentFrame(m_last_image);
m_frame_recorder is of the data type RGBDFrameRecorder and m_last_image is of the data type RGBDImage. The function saveCurrentFrame generates the full directory path where the data should be stored and calls the function writeFrame() of the class RGBDFrameRecorder, in which all of the information of the frame is actually written to the hard drive. The folder structure after two saved frames looks like this:
GRAP1              The name of the configured folder in which the data is saved.
a00366901966103a   The serial number of the Kinect. This folder exists so that multiple Kinects can be recorded; for every Kinect there is a new folder.
viewXXXX           For every frame there is a new viewXXXX folder, where XXXX is a consecutive number starting from 0.
raw                In this folder all the raw frame data are stored.
4.3.2. What data are saved per frame?
The term "frame" has a lot of different definitions. In this project a frame is a collection of the following three files.
4.3.2.1. color.png
This file contains the image data of the frame compressed in the PNG format. PNG stands for "Portable Network Graphics" and is a bitmapped image format that enables lossless data compression, which allows the exact original data to be reconstructed from the compressed data. An example of such an image is shown on the right. The image is saved with colour information and has a resolution of 640 × 480 pixels.

Figure 10 color.png example picture
4.3.2.2. depth.raw
This is a file format specific to the RGBDemo. The nestk library has functions to read and write this format; they are located in opencv_utils.cpp and are named imwrite_Mat1f_raw and imread_Mat1f_raw.
// m is the cv::Mat1f depth image, f is an open QFile (or another QIODevice)
qint32 rows = m.rows, cols = m.cols;
f.write((char*)&rows, sizeof(qint32));                 // number of rows
f.write((char*)&cols, sizeof(qint32));                 // number of columns
f.write((char*)m.data, m.rows*m.cols*sizeof(float));   // raw 32-bit float depth values
This code snippet shows how the raw information is saved. First the program writes two 32-bit integers containing the row and column counts, and then it saves rows × cols 32-bit float values. Because of this, every depth.raw file has a size of 1.17 MB (1,228,808 bytes).
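This size can be verified with a short calculation, using the 640 × 480 resolution mentioned below:

$2 \times 4\ \text{bytes} + 640 \times 480 \times 4\ \text{bytes} = 8 + 1\,228\,800 = 1\,228\,808\ \text{bytes} \approx 1.17\ \text{MB}$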
640 and 480 are the usual width and height of the depth image.

4.3.2.3. intensity.raw
This is the IR image normalised to grayscale and saved with the same method as depth.raw. Because of that the intensity.raw file also has a size of 1.17 MB.
4.4. Accuracy measurements
During the project an application was programmed to find out how accurate the Kinect is. Measurement errors can be split into two components: random error and systematic error (Taylor, 1999).
Figure 11 Random and systematic error (Taylor, 1999)
4.4.1. Random error
Random errors are errors in measurements that are inherently unpredictable and have an expected value of zero. Every measurement is susceptible to random error. Random errors show up as different results for supposedly identical repeated measurements.
4.4.2. Systematic error
"Systematic error is caused by any factors that systematically affect measurement of the variable across the sample." (Research methods, 2012) "The correction of systematic errors is a prerequisite for the alignment of the depth and colour data, and relies on the identification of the mathematical model of depth measurement and the calibration parameters involved. The characterization of random errors is important and useful in further processing of the depth data, for example in weighting the point pairs or planes in the registration algorithm." (Kourosh & Elberink, 2012) In their paper, Kourosh & Elberink (2012) point out that it is important to know the random error when processing the data.
Several sources come to the conclusion that "the random error of depth measurements increases quadratically with increasing distance from the sensor and reaches 4 cm at the maximum range" (Kourosh & Elberink, 2012) (ROS, 2011).
4.4.3. The setup
To check whether the results of these previous papers can be reproduced, a setup to test the random error of the Kinect was built.
Figure 12 Kinect accuracy test setup
The Kinect points towards a sheet of paper, and the information from the Kinect is collected with the method described in the previous chapter 4.3. As a result, several frames at different distances from the Kinect were taken. This is one example frame:
Figure 13 Kinect accuracy test example frame
4.4.4. Colour Segmentation
Of course the program should only use depth points of the white area on the sheet and not depth data from somewhere else. The first approach was to use colour segmentation: the algorithm searches for all pixels in the RGB image that are white, and because the depth and the RGB image are aligned, every pixel of the depth image identified as white in the RGB image was used. As you can see in Figure 13, these pixels are never perfectly white, so colours other than clean white (for example grey shades) were accepted as well.
As you can also see in Figure 13 there are white areas in the background of the image caused by the lighting. These white points were not used for the accuracy analysis because the program applied depth thresholds: for example, when the plane was located at a distance of 100 cm from the Kinect, the program only used depth values in the range from 90 cm to 110 cm for the calculation. There were still problems with this approach, because white pixels on the retainer of the sheet, as well as depth artefacts, were also used whenever their values fell inside the depth threshold. Because of those problems the colour segmentation was abandoned and the area of the sheet was defined by hand: the upper-left and lower-right corner points were defined and the program only used the points inside this rectangle. The highlighted rectangle in Figure 14 shows the area used for the analysis; all pixels marked red are included in the accuracy measurement.
4.4.5. Histograms
Figure 14 Accuracy measurement with highlighted area
A histogram is "a representation of a frequency distribution by means of rectangles whose widths represent class intervals and whose areas are proportional to the corresponding frequencies." (Dictionary, 2012) In this project histograms were used to visualise the distribution of the depth values inside the observed rectangle on the sheet. To generate a histogram you first have to define the range you want to observe. For example, for the measurement with the sheet at a distance of 60 cm, the lowest and highest depth values (58 cm and 62 cm) were used as the range. This range was then divided into intervals and every depth value was assigned to one of these intervals. The histograms were calculated with OpenCV functions (cvCreateHist, cvCalcHist). Figure 15 shows the histogram when the sheet is 60 cm away from the Kinect sensor. The gaps between some bins occur because of the quantisation of the depth inside the Kinect.

Figure 15 Histogram
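A minimal sketch of the histogram calculation with the old OpenCV C API is given below. The bin count, the depth range and the variable names are assumptions for illustration; the depth image is expected as a single-channel 32-bit float IplImage as produced by the RGBDemo.

#include <opencv/cv.h>

// depthImage: single-channel 32-bit float image with depth values in metres (assumed)
void computeDepthHistogram(IplImage* depthImage)
{
    int histSize = 64;                       // number of bins (assumed)
    float range[] = { 0.58f, 0.62f };        // observed min/max depth at 60 cm
    float* ranges[] = { range };

    CvHistogram* hist = cvCreateHist(1, &histSize, CV_HIST_ARRAY, ranges, 1);
    cvCalcHist(&depthImage, hist, 0, NULL);  // fill the histogram from the image
    // bin values can then be read with cvQueryHistValue_1D(hist, bin) and plotted
    cvReleaseHist(&hist);
}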
4.4.6. Standard deviation
The standard deviation measures the dispersion of a set of data around its mean: the more the values are spread out from the mean, the higher the deviation. It is a well-known parameter in statistics and is calculated as the square root of the variance:
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$

where in this experiment $\{x_1, \dots, x_N\}$ are the depth values and $\bar{x}$ is the mean of the depth values.
In this project the standard deviation was calculated over a set of frames using the depth values on the sheet. The results of these calculations are shown in section 4.4.9.
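The computation itself does not need a hand-written loop; a sketch using OpenCV's C++ API is shown below. The variable names and data layout are assumptions, and the project's own accuracy program may have computed the statistics differently.

#include <opencv2/core/core.hpp>

// depthValues: single-channel CV_32F cv::Mat holding the depth samples taken
// from the selected rectangle over all recorded frames (assumed layout)
void depthStatistics(const cv::Mat& depthValues)
{
    cv::Scalar mean, stddev;
    cv::meanStdDev(depthValues, mean, stddev);     // mean and standard deviation
    double averageDistance   = mean[0];
    double standardDeviation = stddev[0];
    (void)averageDistance; (void)standardDeviation; // e.g. print or store the results
}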
4.4.7. Error map
To visualise the errors an error map was also created. From every depth value in the frame the mean of all depth values was subtracted; the resulting values were then converted into colour information according to their distance from the mean and shown in the accuracy program.

Figure 16 Error map of the first frame (60 cm)
4.4.8. Problems
There is a problem when the Kinect is not facing the sheet straight on. This could be solved by estimating a plane through the depth values with an SVD and then calculating the distance from all points to this plane. This was not done in this project because at the end of the project the priority was the 3D body model, but a minimal sketch of the idea is shown below.
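The sketch assumes the selected depth points have already been back-projected into 3D and stored row-wise in a cv::Mat; all names are illustrative.

#include <opencv2/core/core.hpp>

// pts: N x 3 matrix of type CV_32F, one 3D point per row (assumed input)
// Returns the mean absolute point-to-plane distance of a least-squares plane fit.
float planeFitError(const cv::Mat& pts)
{
    // centroid of the point set (1 x 3)
    cv::Mat centroid = cv::Mat::zeros(1, 3, CV_32F);
    for (int i = 0; i < pts.rows; ++i)
        centroid += pts.row(i);
    centroid /= (float)pts.rows;

    // centre the points and take the SVD; the right singular vector belonging
    // to the smallest singular value is the plane normal
    cv::Mat centered = pts - cv::repeat(centroid, pts.rows, 1);
    cv::SVD svd(centered, cv::SVD::MODIFY_A);
    cv::Mat normal = svd.vt.row(2);          // 1 x 3 unit normal

    // distance of every point to the plane through the centroid
    cv::Mat distances = cv::abs(centered * normal.t());
    return (float)cv::mean(distances)[0];
}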
4.4.9. Results

Distance [cm] | Average distance [cm] | Average minimal value [cm] | Average maximal value [cm] | Standard deviation | Frames
90 | 89.9007 | 88.7333 | 89.9007 | 0.496103 | 6
80 | 80.1377 | 79.1571 | 81.2036 | 0.369619 | 56
70 | 70.5072 | 69.6477 | 71.6892 | 0.386073 | 111
60 | 60.3991 | 59.5136 | 61.3 | 0.39508 | 22
The standard deviation is quite high; the error is higher than the results reported in the literature (Kourosh & Elberink, 2012) (ROS, 2011). There are many possible error sources, but an important one is probably the one pointed out in section 4.4.8. Another negative effect could have been a faulty calibration.
4.5. 3D reconstruction
In computer vision and computer graphics, 3D reconstruction means capturing the shape of an object, in our case a human body. In this project two different Kinect programs that provide reconstruction functionality were evaluated: the RGBDemo Reconstructor and PCL Kinect Fusion.
4.5.1. PCL Kinect Fusion
The PCL (Point Cloud Library) project was already mentioned in chapter 3.3.2.3. The community has implemented the Kinect Fusion algorithm in their library (PCL, 2011). The Kinect Fusion project "investigates techniques to track the 6DOF position of handheld depth sensing cameras, such as Kinect, as they move through space and perform high quality 3D surface reconstructions for interaction" (Microsoft, 2012). Two research papers have been published about it: "KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera" (Izadi, et al., 2011) and "KinectFusion: Real-Time Dense Surface Mapping and Tracking" (Newcombe, et al., 2011). The PCL open source community is in the process of implementing the algorithm from these papers in the PCL source code. The implementation is not in the official release version, which means it is still in development and can only be used experimentally. During the evaluation of reconstruction programs this implementation was also tried; SVN and CMake (see chapter 3.4.1) were used to build it. It worked, but not perfectly, probably because the algorithm was not yet fully implemented. The algorithm supports real-time reconstruction and the code relies heavily on the NVidia CUDA development libraries for GPU optimisation. Compute Unified Device Architecture (CUDA) is a parallel computing architecture developed by NVidia for graphics processing (Nvidia, 2011). This approach was not used in this project because it was too difficult to predict in which direction the implementation would move: since the program is in an experimental stage, its creators could easily change interfaces, which could have an effect on the project. At the time the program was tested there was also no easy export function for the generated model, although it is possible to extract the point cloud model somehow. A small disadvantage was also that the results have no colour. Because of this the RGBDemo Reconstructor was used; in a few months PCL KinectFusion will definitely be an option to keep an eye on.
4.5.2. RGBDemo Reconstructor
The Reconstructor "rgbd-reconstructor.exe" is one of the RGBDemo demo programs; the tool was already introduced in chapter 3.3. When the Visual Studio solution is generated, the rgbd-reconstructor project has to be built in "Release" mode so that the reconstruction runs fast enough, as the Debug mode adds a lot of overhead from the debugging tools. The official purpose of the RGBD Reconstructor is interactive scene reconstruction: in practice you walk through a room with a Kinect and the program progressively aggregates all captured frames into a single 3D point cloud model.
Because the normal purpose of the Reconstructor is to scan a room and not a person's body, the first step was to check whether the reconstruction also works with a person inside the scene (room). The big difference between a scene (room) and a person is that even when the person tries to stand still there is movement of the body: when the person breathes there is a small movement of the chest, and when the person is wearing clothes they also move a little. The data is represented in a single reference frame using a surfel representation, to avoid duplicates and to even out the result. Surfel is an abbreviation of "surface element"; in 3D computer graphics, surfels are an alternative to polygonal models (meshes) in which an object is represented by a dense set of points. The creator of the RGBDemo calls it a surfel representation, but it is also possible to call it a point cloud.

It is possible to use the RGBD Reconstructor in real time: you can start the program with a connected Kinect, and every incoming frame is analysed and integrated into the point cloud. In this project this was not done, for the following reasons. First, the program cannot use every frame coming from the Kinect in real time, because finding the right position at which to insert the data of a frame requires a lot of computation; the program therefore processes only about 1 frame per second on an i7-2720QM CPU (quad-core processor with 2.20 GHz per core). Of course it is possible to improve this part of the algorithm or to use a faster computer. The second reason concerns the development, or rather the adaptation, of the reconstruction program: when all frames are saved on disk, the reconstruction can be repeated several times with different parameters and other modifications, and the resulting point clouds can be compared for quality. That is why the frames from the Kinect were collected as fast as the hard drive allowed (about 4-10 FPS) and the reconstruction program was run afterwards on this data to obtain a point cloud. How and what data was recorded is explained in chapter 4.3.

The data is recorded while a second person walks around the subject with the Kinect sensor in hand to capture the body from 360 degrees. The first idea was that the person standing in front of the Kinect rotates on their own axis, which has the advantage that the scanning process does not need two people (one who scans and one who is the subject). But there were problems with this approach. First, the RGBD Reconstructor is not built for this task, although this is not the main problem because the algorithm could be changed. The major problem is that when the person rotates on their own axis the body deforms too much.
Figure 17 six example input frames for the Reconstructor
Figure 17 shows six frames from the recorded data of one scan. On the left side the depth image is converted into a colour representation, and on the right side is the corresponding RGB image. For one reconstruction approximately 1000 frames were used. After the images were collected the reconstruction process began. The reconstruction program (rgbd-reconstructor.exe) can be started with the following command on the Windows command line (cmd.exe):

C:\RGBDemo-0.7.0-Source\build\bin\Release>rgbd-reconstructor.exe --calibration openni_calibration.yml --directory C:\usi7 --sync true --icp true
The parameters of the Reconstructor are:

calibration   The calibration file (yml) (see chapter 4.2)
directory     The folder where the recorded data is located (see chapter 4.3)
sync          The synchronization mode that should be used
icp           Use ICP to refine the pose estimation
The synchronization mode tells the program to use every frame when building the point cloud. After the reconstruction process, which takes about 10 minutes, the result is a point cloud in the program's memory. This point cloud can be exported in the .ply format.
4.5.2.1. Export file format - .PLY
PLY is a computer file format known as the Polygon File Format or the Stanford Triangle Format. It is designed to store three-dimensional data. The format has a relatively simple structure and exists in two variations, one in ASCII and one in binary. The RGBDemo exports its 3D models in the ASCII version.

ply
format ascii 1.0
element vertex 3294565
property float x
property float y
property float z
property float nx
property float ny
property float nz
property uchar red
property uchar green
property uchar blue
end_header
0.00290329 0.00341359 0.429719 -0.203918 0.268669 -0.9414 254 254 254
-0.000625859 -0.00391612 0.432549 0.200947 -0.465571 -0.861896 254 254 254
-0.0013024 -0.00193452 0.438721 0.333459 -0.0203877 -0.942544 254 254 254
0.004828 0.000784338 0.443075 0.457977 0.101636 -0.883135 254 254 254
This is the header and a few points of an exported file. The first line, "ply", identifies the file as a PLY file. The second line indicates which variation of the PLY format is used. The third line describes which data element is stored and how many of them there are. The following "property" lines describe how each element is represented: x, y, z are the coordinates, nx, ny, nz are the normals, and red, green, blue are the RGB representation of the colour of a point.
Figure 18 Reconstructed point cloud
Figure 18 shows the result of the reconstruction process in MeshLab. The person is on the left side, and as you can see the algorithm also reconstructed the walls and a bit of the floor. This unimportant information is deleted in the next step (see chapter 4.6).
4.5.3. Implementation
In the official forum the main developer of the RGBDemo describes the algorithm of the RGBD Reconstructor this way: "it basically uses feature point matching, following by RANSAC and optional ICP pose refinement" (Burrus, 2012). The program extracts SURF features from the camera image and localises them in 3D space. It then matches these features against those of the previously acquired images and uses RANSAC to robustly estimate the 3D transformation between them. Optionally it uses ICP to refine the estimated camera position. If the algorithm finds a pose and the error thresholds (of RANSAC and ICP) are not exceeded, the program adds the points of this frame to the reference frame, which is the point cloud that is later exported from the program.
Figure 19 RGBD Reconstructor flowchart
4.5.3.1. SURF
SURF stands for "Speeded Up Robust Features" and was first presented by Herbert Bay et al. in 2006 (Bay, et al., 2008). It is an image detector and descriptor that can be used in computer vision. SURF is based on sums of 2D Haar wavelet responses and makes efficient use of integral images. The standard version of SURF is partly inspired by SIFT, another, better known, feature detection algorithm in computer vision, and is faster than the SIFT implementation. In computer vision and image processing, feature detection is a concept to find information that describes an image in a way a computer can work with. The result of a feature detector such as SURF is typically a subset of points that describe the image appropriately; the features are often extracted by analysing the pixels surrounding a pixel. There are different types of image features, and feature detection algorithms are often specialised in one of them, for example edges, corners/interest points, blobs/regions of interest, or ridges.
After the SURF algorithm has found the interest points in the RGB image, this information is combined with the depth data, because the interest points of the RGB image are only two-dimensional. For example, if an interest point lies at pixel [5, 5], the program reads the depth data at pixel [5, 5] and combines both values into a point in three-dimensional space. These 3D points are then matched with the 3D points of previous frames. The result is a set of point-wise 3D correspondences between two frames, from which the RANSAC algorithm estimates the relative transformation between the frames.
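The RGBDemo uses its own wrappers inside nestk for this step; the snippet below is only a minimal sketch of SURF extraction and matching with the OpenCV 2.x API, and the Hessian threshold and variable names are assumptions.

#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF lives in the non-free module

// gray1, gray2: greyscale versions of two consecutive RGB frames (assumed inputs)
void matchSurfFeatures(const cv::Mat& gray1, const cv::Mat& gray2)
{
    cv::SURF surf(400.0);                   // Hessian threshold (assumed value)
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    surf(gray1, cv::Mat(), kp1, desc1);     // detect keypoints and compute descriptors
    surf(gray2, cv::Mat(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_L2);     // brute-force matching of descriptors
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    // each match links a keypoint in frame 1 to its best candidate in frame 2;
    // together with the depth values these become the 3D correspondences
}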
4.5.3.2. RANSAC
RANSAC is an abbreviation of "RANdom SAmple Consensus". It is an algorithm to estimate the parameters of a mathematical model from a set of observed data that typically contains outliers. In the RGBDemo, the input data for the relative pose estimation with RANSAC are the interest point correspondences.
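The RGBDemo implements this estimation itself. Purely as an illustration of a RANSAC-based fit over 3D correspondences, OpenCV offers a ready-made routine; note that it estimates a general affine transform rather than the rigid transform the Reconstructor needs, and the threshold value is an assumption.

#include <vector>
#include <opencv2/calib3d/calib3d.hpp>

// prevPts, currPts: matched 3D feature points from two frames (assumed inputs)
void estimateTransform(const std::vector<cv::Point3f>& prevPts,
                       const std::vector<cv::Point3f>& currPts)
{
    cv::Mat transform;              // resulting 3x4 transform estimate
    std::vector<uchar> inliers;     // flags the correspondences RANSAC kept
    cv::estimateAffine3D(prevPts, currPts, transform, inliers,
                         0.01 /* inlier threshold in metres, assumed */);
}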
4.5.3.3. ICP
The RGBDemo uses a variation of ICP (Iterative Closest Point) (ZHANG, 1992) to refine the estimated transformation. This step is optional and costs computing time; if a very fast reconstruction is important it is better to turn the feature off, at the price of a lower quality of the reconstructed object. Because this project did not focus on the speed of the reconstruction, the refinement with ICP was used. The ICP algorithm is not implemented in the RGBDemo source code itself; instead the program uses the ICP method from the Point Cloud Library (see chapter 3.3.2.3).
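For orientation, a minimal sketch of the PCL ICP interface is given below; how the RGBDemo wires it into its pipeline is not reproduced here, the point type and parameter values are assumptions, and method names may differ slightly between PCL versions.

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/icp.h>

// source, target: the new frame's points and the reference point cloud (assumed)
Eigen::Matrix4f refinePose(pcl::PointCloud<pcl::PointXYZ>::Ptr source,
                           pcl::PointCloud<pcl::PointXYZ>::Ptr target)
{
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(source);
    icp.setInputTarget(target);
    icp.setMaximumIterations(50);            // assumed iteration limit

    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);                      // runs ICP, starting from identity
    return icp.getFinalTransformation();     // refined rigid transformation
}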
4.5.4. Problems/Solutions
This section describes problems with the reconstruction and their solutions.

One problem was that the RGBDemo did not use all frames successively. It took the first image, and while the Reconstructor was estimating the camera position the other frames kept arriving; after the first frame had been computed, the program did not use the second frame but whichever frame was active at that moment, for example the fifth. With this behaviour the reconstructed model lost information, and the estimated camera position was not very good because there was not enough reference information. The RGBDemo has an option to compute the frames one by one, activated with the "--sync true" parameter when starting the Reconstructor, but in the early versions this option did not work. With version 0.7.0 that bug was fixed and the frames were computed one by one.

Another issue was that the reconstruction stopped at a certain point. Often this happened when the front of the body had been scanned and the scan moved to the side of the person; somewhere in this area the Reconstructor algorithm lost track. An example of this issue is shown in Figure 20: as you can see in the side view, there are only a few points from the back.
Figure 20 Failed reconstruction shown from front and side view
The next attempt, with red markers in the background, also failed. The idea behind the red and white paper stuck on the walls was that the algorithm might be able to use this colour contrast to find more feature points, but as you can see in Figure 21 this did not work. The supervisor advised trying another environment for the scan, because although more markers were added to provide feature points, they may still have been too small. After a change of location the reconstruction worked as expected.
Figure 21 Reconstruction with marker
This is a limitation of the algorithm: the background should not be monotonous, and the more varied the background colours are, the better. Another solution could be to change the threshold for the feature points, which could affect the quality of the estimated position. It is also possible to estimate the camera position from the depth information alone, but this would be a big change to the current algorithm.
4.6. Point cloud processing
A point cloud is a set of points in a three-dimensional coordinate system. These points are called vertices (the plural of vertex); in computer graphics a vertex is a data structure describing a point. The result of the reconstruction is a point cloud whose points are defined by X, Y and Z coordinates and a colour. To process this point cloud the program MeshLab was used. "MeshLab is an open source, portable, and extensible system for the processing and editing of unstructured 3D triangular meshes. The system is aimed to help the processing of the typical not-so-small unstructured models arising in 3D scanning, providing a set of tools for editing, cleaning, healing, inspecting, rendering and converting this kind of meshes." (MeshLab, 2012) Although "mesh" is in the name of the program, it also has a range of utilities to edit point clouds, and it can open the .ply (Polygon File Format) files exported by the reconstruction program. In this project the cleaning of the point cloud was done by hand. Cleaning a point cloud means deleting all points that do not belong to the object. It would be possible to automate this process, for example by recognising the walls and the ground and deleting those points, but this was not done in this project; a possible starting point is sketched at the end of this section. The images below show the cleaning process on the left side and the result on the right side.
Figure 22 After point cloud processing
Figure 23 Point cloud cleaning
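One simple form of automation would be to crop the cloud to a region around the person before any manual work. The following sketch uses PCL's pass-through filter; the point type and the distance limits are assumptions for illustration.

#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/passthrough.h>

// cloud: the reconstructed, coloured point cloud loaded from the .ply file (assumed)
pcl::PointCloud<pcl::PointXYZRGB>::Ptr cropToSubject(
        pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud)
{
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr cropped(new pcl::PointCloud<pcl::PointXYZRGB>);
    pcl::PassThrough<pcl::PointXYZRGB> pass;
    pass.setInputCloud(cloud);
    pass.setFilterFieldName("z");          // depth axis of the reference frame
    pass.setFilterLimits(0.5f, 2.0f);      // keep points between 0.5 m and 2 m (assumed)
    pass.filter(*cropped);
    return cropped;                        // points farther away, e.g. the walls, are removed
}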
4.7. Meshing
The result of the reconstruction and the cleaning is a point cloud: a set of disconnected points floating near each other in three-dimensional space. When we look closely, the image breaks down into distinct points with visible space between them. "If we wanted to convert these points into a smooth continuous surface we'd need to figure out a way to connect them with a large number of polygons to fill in the gaps. This is a process called "constructing a mesh"" (Borenstein, 2011). To build a mesh, the Poisson surface reconstruction (Kazhdan, et al., 2006) implemented in MeshLab is used. The Poisson algorithm is designed to handle noisy point clouds like ours and produces a triangle mesh. An alternative algorithm implemented in MeshLab is Ball Pivoting (Bernardini, et al., 1999), which takes the points of the point cloud and links them together into triangles. Because such algorithms use the points from noisy data directly, the resulting mesh has a lot of holes, and there are double surfaces wherever point clouds are not perfectly aligned. The advantage of the Poisson algorithm is that it minimises the creation of holes even if some parts of the surface are missing in the point cloud: the algorithm wraps around the points and does not use the points of the point cloud directly as vertices, which also produces a smooth surface. For the Poisson algorithm to work correctly, every point in the point cloud needs an assigned normal; these normals can be calculated with MeshLab's filter "Compute normal for point sets". After the Poisson surface reconstruction the colour is lost in the mesh. To colourise the mesh there is a MeshLab filter called "Vertex attribute transfer", which picks the colour from the nearest point of the point cloud and applies it to the mesh.
Figure 24 Meshing process
4.8. Measurements
Not many measurements were taken in this project because of the lack of time. MeshLab has a measuring tool to measure distances.
Figure 25 3D body model with measurement
As you can see in Figure 25, the measured height is 1.71985 m. The real height of the test person is 1.72 m (depending on whether hair counts). This is quite accurate, but the test was made with only one person and is not representative. The measurement tool calculates the distance between two points in 3D space with the following formula:

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$

where one point is $(x_1, y_1, z_1)$ and the second point is $(x_2, y_2, z_2)$.
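As a small worked example, the first two vertices from the PLY excerpt in chapter 4.5.2.1, $(0.00290329, 0.00341359, 0.429719)$ and $(-0.000625859, -0.00391612, 0.432549)$, are separated by

$d = \sqrt{0.00352915^2 + 0.00732971^2 + (-0.00283)^2} \approx 0.0086\ \text{m},$

i.e. roughly 8.6 mm (the coordinates are assumed to be in metres, consistent with the height measurement above).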
5. Critical Evaluation
5.1. Chapter Summary
This chapter contains evaluations of and thoughts about the different aspects of the project. It also includes possible improvements and learning outcomes.
5.2. Project Management Evaluation
All of the requested documents were submitted on time and all progress reviews were held on schedule. The task scheduling with a Gantt chart was done early in the project; as the problems in the project changed, the time schedule changed as well. A problem at the end of the project was that the planning had not taken into account that at the end of the academic year there is a lot of other work to do; the planning of a future project should take this factor into account. The risk management was an important part at the beginning of the project. In this project it was particularly valuable that the risk "Losing information about the project or related information" was considered and backups of relevant data were made, because you cannot trust that a hard drive lasts the whole project: in this project the hard drive broke in the middle of the project. If no backup had been available, it could have had a major impact on the project progress.
5.3. Design and Implementation
In a real-world project other depth scanners for taking body measurements would normally also be evaluated, but due to the limited budget of this project the Kinect was the only affordable device. That is one of the reasons why the Kinect was used in the first place; other advantages are, for example, the big community around the device and the fact that a lot of people already have one at home. It is impossible to use the Kinect on a computer without a Kinect interface, so an interface has to be chosen to get the relevant data. In this project it was a good idea to use the RGBDemo: it already implements a range of useful functions to process the information from the Kinect and provides a reconstruction example to build on. However, you have to trust that the information from this additional middleware is correct; in this project there was never a suspicion that something did not work correctly, except for the calibration of the depth camera. It is recommendable to store the collected data on the hard drive and then use it to reconstruct the body. For this project this was the right decision, but in commercial use it would be more practical to have the reconstruction in real time and without the need for MeshLab to clean the relevant information out of the reconstructed scene; this might be implemented with a depth threshold to exclude the walls in the scene. It was interesting to build a setup to test the accuracy of the Kinect; if this test is repeated, it is advisable to use a bigger sheet.
The reconstruction worked, but not perfectly: it depends a lot on the environment. There are many possible improvements in this area; especially KinectFusion (Izadi, et al., 2011), implemented in an experimental PCL version, should not be ignored. Once it is fully implemented it could be better than the RGBDemo, but for this project it was too early. Because of the lack of time the actual body measurements were really basic. To take more measurements of other body parts it is necessary to improve the reconstruction process and to evaluate and find other programs to measure meshes. It should be noted that the scanned persons should extend their arms in new scans (shown in Figure 26); it is possible that this also affects the reconstruction and meshing quality.
Figure 26 Model with arms extended (The Districts, 2011)
5.4. Possible Improvements
To simplify the reconstruction process it would be a very interesting experiment to use more than one Kinect; three Kinects are still cheaper than most other depth sensors, and it is possible to connect more than one Kinect to a computer. For example, three Kinects could capture the subject from three different angles, take just a few frames (or one), and reconstruct the model from these. This has the advantage that the problem of body movements during the data collection is minimised, and if the three Kinects are calibrated and the positions of the cameras are known, no camera position estimation is necessary. A disadvantage is that no normal consumer has three Kinects at hand. This improvement could be work for a whole new project, but it could still use the knowledge and experience of this project.

Another possible improvement could be to use the Kinect vertically. This allows going closer to the scanned person and gives a more accurate result, because the accuracy depends on the distance to the object; the RGBDemo would need some modification to be used in this orientation. It could also be useful to use the depth information to estimate the camera position instead of the RGB image information, which would make the reconstruction independent of the surrounding environment.

Machine learning, a branch of artificial intelligence, could be an interesting way to work with the data generated in this project. Machine learning generates results by comparing data with samples from a database of known models. Imagine a database of 3D body models in which each measurement is recorded: a new body model is compared with this database, similar models are found, and a result is generated from their known measurements. The difficulty in machine learning is that the possible input models are too numerous to be covered by a set of observed examples (training data), so the training data must be generalised. There are many different algorithms to approach this problem, also known under the term "pattern recognition algorithms".
5.5. Learning Outcomes
After this project you certainly know a lot more about the Kinect and how to handle the information from its sensors. Because the Kinect is a depth sensor, you also learn more about depth sensors and their abilities, how to access the information from the sensor, and which interfaces exist and how to install and use them. In the accuracy test you learn how to set up an experiment and then how to analyse the data; you also know afterwards that there are different possible error sources and whether it is possible to correct them, and you practise mathematical skills when analysing the generated data. The accuracy analysis was programmed in C++, and it is interesting to work with C++ interfaces that are new to you, for example OpenCV. After working with a number of open source projects you begin to recognise a pattern that many of them share: for example, most use some sort of code management tool like SVN or Git, and many have discussion boards for questions. Because the Kinect is not just a depth sensor but also has a built-in RGB camera, you learn how to process and analyse images, what feature points are, and how they can help with the task at hand. When looking at code from somewhere else you learn how helpful comments in the code are, especially when studying the reconstruction algorithm of the RGBDemo. You also learn how difficult it is to reconstruct non-rigid objects. After this project you know for sure what point clouds and meshes are and what the difference between them is.
6. Conclusion
This report gave an overview of the important and interesting parts of this project to turn a human body into a digital 3D representation. The project proves that there are a lot of interesting applications for the Kinect beyond its initial purpose as a game console input device. It shows how the sensors of the Kinect can be accessed and how the information can be processed. At the moment the project contains only a basic measurement, and at the end of the project there is a 3D body model which is still not perfectly accurate. The evaluation presents a few important ideas to build on this project, for example the use of multiple Kinects or machine learning. The whole process is currently not automated and an expert is needed to build the 3D body model; in this state it is not ready for commercial use, because for the customer all of the steps made by hand should be done by the computer program. It is definitely possible to build a program to automate this. The technology built into the Kinect is still at the beginning of a very interesting future in this area.
7. References
Bay, H., Ess, A., Tuytelaars, T. & Gool, L. V., 2008. SURF: Speeded Up Robust Features. Computer Vision and Image Understanding (CVIU), Volume 110, pp. 346-359.
Bernardini, F. et al., 1999. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE Transactions on Visualization and Computer Graphics, 5(4), pp. 349-359.
Borenstein, G., 2011. Making Things See. s.l.: O'Reilly Media / Make.
Burrus, N., 2012. How rgbd-reconstructor.exe works?. [Online] Available at: https://groups.google.com/d/msg/rgbdemo/fY1d950ZRxc/8QUALhLpv4wJ [Accessed 8 April 2012].
Burrus, N., 2012. nestk. [Online] Available at: https://github.com/nburrus/nestk [Accessed 27 March 2012].
Burrus, N., 2012. rgbdemo. [Online] Available at: https://github.com/nburrus/rgbdemo [Accessed 27 March 2012].
Burrus, N., 2012. RGBDemo License. [Online] Available at: http://labs.manctl.com/rgbdemo/index.php/Main/License [Accessed 27 May 2012].
D'Apuzzo, N., 2009. Hometrica. [Online] Available at: http://www.hometrica.ch/pres/2009_essilor_pres.pdf [Accessed 19 November 2011].
D'Apuzzo, N., 2009. Hometrica. [Online] Available at: http://www.hometrica.ch/pres/2009_essilor_pres.pdf [Accessed 12 April 2012].
Dictionary, F. M.-W., 2012. Histogram - Definition. [Online] Available at: http://www.merriam-webster.com/dictionary/histogram [Accessed 22 April 2012].
Freedman, B., Shpunt, A. & Arieli, Y., 2010. Distance-Varying Illumination and Imaging Techniques for Depth Mapping. s.l. Patent No. US2010/0290698.
Izadi, S. et al., 2011. KinectFusion: Real-time 3D Reconstruction and Interaction. Santa Barbara, CA, USA, ACM Symposium on User Interface Software and Technology.
Kazhdan, M., Bolitho, M. & Hoppe, H., 2006. Poisson surface reconstruction. s.l., Proceedings of the fourth Eurographics symposium on Geometry processing, pp. 61-70.
Kinect for Windows Team, 2012. Starting February 1, 2012: Use the Power of Kinect for Windows to Change the World. [Online] Available at: http://blogs.msdn.com/b/kinectforwindows/archive/2012/01/09/kinect-for-windows-commercial-program-announced.aspx [Accessed 02 April 2012].
Kourosh, K. & Elberink, S. O., 2012. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors, 12(2), pp. 1437-1454.
Lee, J. C., 2011. Windows Drivers for Kinect, Finally!. [Online] Available at: http://procrastineering.blogspot.co.uk/2011/02/windows-drivers-for-kinect.html [Accessed 4 April 2012].
libfreenect, 2011. libfreenect. [Online] Available at: https://github.com/OpenKinect/libfreenect [Accessed 20 November 2011].
MANCTL, 2012. Calibrating your Kinect (OpenNI backend). [Online] Available at: http://labs.manctl.com/rgbdemo/index.php/Documentation/Calibration [Accessed 27 March 2012].
MeshLab, 2012. MeshLab. [Online] Available at: http://meshlab.sourceforge.net/ [Accessed 18 April 2012].
Microsoft, 2012. KinectFusion Project Page. [Online] Available at: http://research.microsoft.com/en-us/projects/surfacerecon/ [Accessed 11 April 2012].
Microsoft, 2012. The Kinect Effect. [Online] Available at: http://www.xbox.com/en-GB/Kinect/Kinect-Effect [Accessed 05 April 2012].
MicrosoftInt, 2011. Introduction to Kinect for Windows. [Online].
Newcombe, R. A. et al., 2011. KinectFusion: Real-Time Dense Surface Mapping and Tracking. Basel, IEEE.
Nokia, 2012. Download Qt, the cross-platform application framework. [Online] Available at: http://qt.nokia.com/downloads [Accessed 19 April 2012].
Nvidia, 2011. NVIDIA GPU Computing Documentation. [Online] Available at: http://developer.nvidia.com/nvidia-gpu-computing-documentation [Accessed 14 April 2012].
OpenCV, 2012. OpenCV Download. [Online] Available at: http://sourceforge.net/projects/opencvlibrary/files/opencv-win/ [Accessed 28 March 2012].
OpenKinect, 2011. OpenKinect. [Online] Available at: http://openkinect.org/wiki/Main_Page [Accessed 19 November 2011].
OpenKinect, 2012. OpenKinect. [Online] Available at: http://openkinect.org/wiki/Main_Page [Accessed 03 April 2012].
OpenKinectSrc, 2012. libfreenect. [Online] Available at: https://github.com/OpenKinect/libfreenect [Accessed 4 April 2012].
OpenNI, 2012. Abstract Layered View. [Online] Available at: http://openni.org/Documentation/ProgrammerGuide.html [Accessed 01 April 2012].
Pandya, H., 2011. Microsoft Kinect: Technical Introduction. [Online] Available at: http://entreprene.us/2011/03/09/microsoft-kinect-technical-introduction/kinect_hacks_introduction/ [Accessed 12 April 2012].
PCL, 2011. An open source implementation of KinectFusion. [Online] Available at: http://pointclouds.org/news/kinectfusion-open-source.html [Accessed 14 April 2012].
Research methods, 2012. Measurement Error. [Online] Available at: http://www.socialresearchmethods.net/kb/measerr.php [Accessed 23 April 2012].
ROS.org, 2010. Depth calculation. [Online] Available at: http://www.ros.org/wiki/kinect_calibration/technical#Depth_calculation [Accessed 03 April 2012].
ROS, 2011. openni_kinect/kinect_accuracy - ROS Wiki. [Online] Available at: http://www.ros.org/wiki/openni_kinect/kinect_accuracy [Accessed 11 April 2012].
Takahashi, D., 2012. Gamesbeat. [Online] Available at: http://venturebeat.com/2012/01/09/xbox-360-surpassed-66m-sold-and-kinect-has-sold-18m-units/ [Accessed 27 March 2012].
Taylor, J. R., 1999. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements. s.l.: University Science Books.
The Districts, 2011. The Districts. [Online] Available at: http://thedistricts.wordpress.com/tag/film-terms/ [Accessed 22 April 2012].
Zalevsky, Z., Shpunt, A., Maizles, A. & Garcia, J., 2007. Method and System for Object Reconstruction. Israel, Patent No. WO2007/043036.
Zhang, Z., 1992. Iterative Point Matching for Registration of Free-form Curves. s.l.: s.n.
8. List of Figures
Figure 1 2D, 2.5D and 3D (D'Apuzzo, 2009) ... 6
Figure 2 Microsoft Kinect for Xbox 360 (Pandya, 2011) ... 10
Figure 3 Image from the PrimeSense patent (Zalevsky, et al., 2007) ... 12
Figure 4 Structured light (Freedman, et al., 2010) ... 13
Figure 5 OpenNI framework architecture (OpenNI, 2012) ... 14
Figure 6 Installed OpenNI driver at the Windows Device Manager ... 15
Figure 7 CMake process ... 23
Figure 8 Kinect calibration RGB images ... 27
Figure 9 RGBDFrameRecorder UML class ... 28
Figure 10 color.png example picture ... 30
Figure 11 Random and systematic error (Taylor, 1999) ... 31
Figure 12 Kinect accuracy test setup ... 32
Figure 13 Kinect accuracy test example frame ... 32
Figure 14 Accuracy measurement with highlighted area ... 33
Figure 15 Histogram ... 33
Figure 16 Error map of the first frame (60 cm) ... 34
Figure 17 Six example input frames for the Reconstructor ... 37
Figure 18 Reconstructed point cloud ... 38
Figure 19 RGBD Reconstructor flowchart ... 39
Figure 20 Failed reconstruction shown from front and side view ... 41
Figure 21 Reconstruction with marker ... 42
Figure 22 After point cloud processing ... 43
Figure 23 Point cloud cleaning ... 43
Figure 24 Meshing process ... 44
Figure 25 3D body model with measurement ... 45
Figure 26 Model with arms extended (The Districts, 2011) ... 47
9. Appendix A
Department of Computing Degree Project Proposal
Name: Nikolai Bickel
Course: Computing
Discussed with (lecturer): Dr. Bogdan Matuszewski, Chris Casey
Size: double
Type: development

1 Previous and Current Modules
Object Oriented Methods in Computing (CO3402)
Enterprise Application Development (CO3409)
Database Driven Web Sites (CO3708)
Computer Vision (EL 3105)

2 Problem Context
There are a few problems when buying clothes online, the most common being that the purchased clothes do not fit. This is exacerbated by the fact that many users don't know their own size or the size of those they are purchasing for (such as parents who purchase garments for their children). Many people deal with this problem by ordering several sizes of the same clothes and sending back the excess.
3 The Problem
For those who buy several sizes it can be a nuisance to return the excess clothes. Additionally if a customer wishes to purchase clothes for a special occasion, they might be reluctant to order online as they might be unsure of the fitting of the clothes ordered.
The online stores also bear the costs associated with this problem, as they usually pay the shipping costs for the returned items, and also deal with several logistical issues along with the costs associated with the resale of the items.
4 Potential Ethical or Legal Issues
none
5 Specific Objectives
Access information from the sensor (RGBD sensor -> Microsoft Kinect)
Isolate the important data
Try to get the sizes of body parts
Convert measurements into clothing size (S, M, L, XL)
Compare program results -> real data
6 The Approach
I want to capture the data of a person's body using an RGBD sensor. The RGBD sensor I intend to use is the Microsoft Kinect. The Microsoft Kinect takes an RGB picture of a person and also captures depth using an additional sensor. Depth sensors are used to measure the third dimension, that is the depth of the object from the camera. The depth information will be very important data to work with. There are better sensors than the Kinect, but it is very cheap.
To get the data from the Kinect it must be connected to a computer via USB. There are some interfaces to get the required "Kinect data". The first approach will be to build a desktop application that can handle the "Kinect data" and get the required clothes sizes.
7 Resources
Microsoft Kinect
Kinect Interfaces
  o Kinect for Windows SDK from Microsoft Research (free for research)
  o OpenKinect (free)
A PC to connect the Kinect over USB
8 Potential Commercial Considerations
8.1 Estimated costs and benefits
Whether the final product can be used commercially depends on how accurately the measurements can be made. At this time I cannot say how accurate the measurements will be. Not every person has an RGBD sensor at home, but maybe in the coming years every webcam will have a depth sensor to provide this information.
9 Literature Review
Is the data from the Microsoft Kinect good enough to get exact data of a person's body?
10 References
Jamie, S. et al. 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images. [ONLINE] Available at: http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf. [Accessed 27 September 11].
Christian Plagemann, Varun Ganapathi, Daphne Koller, Sebastian Thrun. 2010. Real-time Identification and Localization of Body Parts from Depth Images. [ONLINE] Available at: http://www.stanford.edu/~plagem/bib/plagemann10icra.pdf. [Accessed 27 September 11].
eurogamer.net. 2010. Kinect visionary talks tech. [ONLINE] Available at: http://www.eurogamer.net/articles/digitalfoundry-kinect-tech-interview. [Accessed 27 September 11].
Similar project for webcams (without depth information) http://www.upcload.com/ http://www.seventeen.com/fashion/virtual-dressing-room Kinect for Windows Software Development Kit (SDK) beta from Microsoft Research, http://research.microsoft.com/en-us/um/redmond/projects/kinectsdk/ Free, open source libraries that will enable the Kinect to be used with Windows, Linux, and Mac http://openkinect.org/wiki/Main_Page
10. Appendix B
Department of Computing Final Year Project Technical Plan
Name: Bickel Nikolai
Size: double
Mode: ft
Time: 1
Course: Computing
Supervisor: Dr. Bogdan Matuszewski
1 Summary
I want to build a computer application which connects to an RGBD sensor (the Microsoft Kinect). The user of the program should be able to stand in front of the computer and see his body measurements and what clothing size would fit him. That means that the user should also be able to see the body measurements which determine the process of getting the clothing sizes.
The challenge of the project is to find the best algorithm to get robust measurements, meaning that if I measure the same person twice I want nearly the same results. Once I have robust and accurate measurements it will not be difficult to find the right clothing size. To find the best algorithm I need to test different approaches. There may be accuracy problems due to the quality of the data provided by the Kinect.
2 Constraints
Because I don't work with an external partner, I just have the deadlines that were given by the university.
Project Deadlines:
- Proposal: 27-09-2011
- Technical Plan: 20-10-2011
- Literature Report: 24-11-2011
- Project Report: 26-04-2012
3 Key Problems
As I mentioned before, one of the key problems is to ensure the measurements are robust. To ensure that the results are accurate there must be a few components working in tandem. When I am able to get the data via USB I will have a byte array, so I need to isolate the important data contained within it. Some external resources will help me to find that important data. I am unsure of the data quality from a Microsoft Kinect, and there may be some use cases which are too difficult to implement. I may need to write algorithms that compensate for the problem of noisy data. Another potential problem is that I may not be able to change some of the hardware restrictions of the Microsoft Kinect. The results of the measurements should be presented in an understandable way and the usage of the application should not be too difficult. Because I don't know the standards of the clothing industry I need to invest some time to gather information about what standards the textile and clothing industry use. Only with this knowledge can I convert my measurements into a representative clothing size (e.g. S, M, L, and XL).
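As a rough illustration of what "isolating the important data" from the raw byte array could look like, the sketch below unpacks a depth frame into per-pixel distances. The frame size, the little-endian 16-bit packing and the millimetre units are assumptions made for the example only and depend on the middleware actually used.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: unpack a raw Kinect depth frame (assumed to be a
// 640x480 buffer of little-endian 16-bit values in millimetres) from the
// byte array delivered by the middleware.
std::vector<uint16_t> unpackDepthFrame(const std::vector<uint8_t>& raw,
                                       int width = 640, int height = 480) {
    std::vector<uint16_t> depthMm(width * height, 0);
    for (int i = 0; i < width * height; ++i) {
        // Each pixel occupies two bytes; combine them into one depth value.
        depthMm[i] = static_cast<uint16_t>(raw[2 * i]) |
                     (static_cast<uint16_t>(raw[2 * i + 1]) << 8);
    }
    return depthMm;
}
```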
4 Risk Analysis
Risk | Severity | Likelihood | Action
Noisy depth data from the Kinect | Medium | High | Make the best out of the data I get from the Kinect
Inaccurate data from external resources (OpenKinect, MS Kinect SDK) | High | Medium | Try to configure the tools the right way so that they give the best results possible
Robustness – the data of two measurements do not match | Medium | High | Try to reduce the error rate as much as possible
The RGBD sensor breaking | High | Low | Buy a new one (will take a week)
Losing information about the project or related information | High | Low | Make backups
Measurement points are too complex to implement | Medium | Medium | Try the best, otherwise reduce the measurement points to those that are possible
Scheduling failure, not enough time to complete the project | High | Medium | Try to work within a timetable with the help of a Gantt chart
5 Options
Middleware
  o Kinect for Windows SDK from Microsoft Research (Microsoft)
  o OpenNI Framework (OpenNI organization)
  o OpenKinect (Open Source)
You can see the architecture of the application in the section “System & Work Outline”.
6 Potential Ethical or Legal Issues
When I want to test my application I cannot just test it with my own body measurements. I need other subjects to test whether my application is robust, as people's body sizes vary, for example whether someone is larger or smaller. I will ask some volunteers to test my application, but I will not publish any personal details (e.g. pictures) in my reports. I may need to publish some anonymized data for example purposes.
7 System & Work Outline
[Architecture outline: Sensor array -> (USB) -> Middleware -> My Application]
Sensor array: Microsoft Kinect
Middleware: The middleware provides the USB driver, and each of the frameworks has its own additional features; for example, they provide image, depth and audio streams or skeleton tracking. As part of my preparation for the technical plan I will try to install each of the different middlewares and experiment with them. Each of them has pros and cons. At the moment I cannot say which of the products I want to use; I need more time to test them thoroughly. As I mention in my Gantt chart, I want to work with all of them first and then choose one of them.
- Kinect for Windows SDK from Microsoft Research (Microsoft)
- OpenNI Framework (OpenNI organization)
- OpenKinect (Open Source)
The middlewares are not compatible with each other.
My Application: Which programming language I will use depends on which middleware I choose; I may need to search for a wrapper. All of the middlewares support the Microsoft .NET programming languages. I think I will program the application in C# or C++. I can handle both of them, and I think the programming language is not one of the big problems in this project.
8 Commercial Analysis
8.1 Estimated costs and benefits
Factor name | Cost or benefit | Description | Estimated amount | Estimate of when paid
Kinect for Xbox 360 | Cost | RGBD sensor: image stream, depth stream, audio stream | £100 | Before the project
Microsoft Visual Studio, Netbeans, Middleware | Benefit | Software | £0 | MSDNAA software and free software
Miscellaneous | Cost | Measuring tape, pocket rule | £15 | Payable during project
Working time | Cost | Develop and research | 300–400 working hours | During project
8.2 Commercial Conclusion
Whether the final product can be used commercially depends on how accurately the measurements can be made. At this time I cannot say if the measurements will be accurate. Not all people have RGBD sensors at home, but maybe in the coming years every normal computer webcam will provide depth information. At the moment the middleware "Kinect for Windows SDK from Microsoft Research" is licensed only for non-commercial use, but a licence for commercial use will be released. The beta SDK has been developed to support wide exploration and experimentation by academic and research communities.
11. Appendix C
12. Appendix D
Build a 3D body model with a single Kinect sensor
Nikolai Bickel, BSc (Hons) Computing
Project: Body measurements with the Microsoft Kinect
Supervisor: Dr. Bogdan Matuszewski
Second Reader:
25 November 2011
Abstract
Depth cameras are not conceptually new, but the Microsoft Kinect has made the sensor popular with researchers and enthusiasts. A 3D body model is beneficial for applications in many different areas. This paper gives an overview of how to build a 3D body model with a single Kinect sensor. It also gives some technical details about the specification and capabilities of the Microsoft Kinect system. Different algorithms to solve the problems of getting a 3D body model with a Kinect will be discussed.
1 Introduction
1.1 Context
The Kinect is becoming an important 3D sensor, and that is not because it is the best sensor; it is because of its reliability and low cost. If you want to build a 3D body model with a Kinect it is important to keep in mind some of the problems which could appear. An important part of working with a technical device is to know its basic behaviour. Because of that, you can also find some interesting information about the device in this paper. The paper will show some possible ways to treat the problem of building a 3D body model with a single Kinect sensor.
1.2 Overview
Section 2 (Kinect sensor) describes the Microsoft Kinect sensor, its capabilities and some additional information about its accuracy; Section 3 covers the calibration process. Section 4 (Collect data from the Kinect) is an overview of the different options to access the data from the Kinect. The following Section 5 (The object "human body") covers some error sources when dealing with a human body. Section 6 (Pre-processing Kinect data) explains how to process the collected data before it can be used in the different approaches pointed out in Section 7 (3D Registration). To get a usable body shape there is a technique called meshing, which is discussed in Section 8.
2 Kinect sensor
Borenstein (2011) explains in his book "Making Things See" what a Kinect does. The difference between a normal camera and a Kinect is that the Kinect additionally collects depth data. That means the Kinect measures the distance to the objects placed in front of the camera. For a person there is no big difference between a normal picture and depth data, but for a computer it is not so easy to "see" what it needs in order to tell objects apart. When a computer analyses a picture it has just the colour of each pixel, and it is difficult to separate different objects and people. In a depth image, on the other hand, the computer has depth information for each pixel, and it is easier to find the data it is looking for because it knows how far away the object is from the sensor. A benefit of the depth data is also that you can build a 3D model of what the camera can see. This is important in building a full 3D model of an object.
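To make the idea of per-pixel depth concrete, the following sketch back-projects a single depth pixel into a 3D point using a pinhole camera model. The focal length and principal point used here are rough, uncalibrated values assumed only for illustration; a calibrated Kinect would provide its own intrinsics.

```cpp
#include <cstdint>

// Hedged sketch: back-project one depth pixel (u, v, depth) into a 3D point
// in the camera frame using a pinhole model. The intrinsic parameters are
// approximate community-reported values, not calibrated ones.
struct Point3 { double x, y, z; };

Point3 depthPixelToPoint(int u, int v, uint16_t depthMm) {
    const double fx = 580.0, fy = 580.0;   // assumed focal lengths (pixels)
    const double cx = 320.0, cy = 240.0;   // assumed principal point
    const double z = depthMm / 1000.0;     // depth in metres
    return { (u - cx) * z / fx, (v - cy) * z / fy, z };
}
```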
Functionality
The Kinect sensor has an RGB camera, an IR camera and an IR projector. The IR projector projects an irregular pattern onto the objects in front of the Kinect. The depth camera creates a depth image by recognising the alteration in this pattern. The inventors of the Kinect describe the measurement of depth as a triangulation process (Freedman, et al., 2010).
Kinect sensor array specifications (MicrosoftInt, 2011):
Sensor item | Specification range
Viewing angle | 43° vertical by 57° horizontal field of view
Mechanized tilt range (vertical) | ±28°
Frame rate (depth and colour stream) | 30 frames per second (FPS)
Resolution, depth stream | QVGA (320 × 240)
Resolution, colour stream | VGA (640 × 480)
Accuracy
Khoshelham (2011) has analysed the accuracy of the Microsoft Kinect in the paper "Accuracy analysis of Kinect depth data" and came to the following statement: "The random error of depth measurements increases quadratic with increasing distance to the sensor and reaches 4 cm at the maximum range". Khoshelham (2011) also comes to the conclusion that for mapping purposes the data should be acquired within 1-3 m distance to the sensor. The ROS homepage (ROS, 2011) states: "Because the Kinect is essentially a stereo camera, the expected error on its depth measurements is proportional to the distance squared."
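A back-of-envelope way to see where this quadratic behaviour comes from (the constants below are illustrative, not measured values from this project): a triangulation-based sensor recovers depth from disparity, and propagating a constant disparity uncertainty gives an error that grows with the square of the depth.

$$
Z = \frac{f\,b}{d}, \qquad
\sigma_Z \approx \left|\frac{\partial Z}{\partial d}\right| \sigma_d = \frac{Z^2}{f\,b}\,\sigma_d .
$$

If the quoted 4 cm is taken to apply at a maximum range of roughly 5 m, this corresponds to about σ_Z(Z) ≈ 0.0016 m⁻¹ · Z², i.e. around 1.4 cm at 3 m, which is consistent with the 1-3 m working range recommended above.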
3 Calibration
Many literature resources point out that it is important to have a calibrated Microsoft Kinect to get accurate data (Weiss, et al., 2011), (ROS, 2011), (Pajdla, et al., 2011). Which methods are available to calibrate a Microsoft Kinect depends on which middleware is in use. There are some explanations and technical descriptions of the calibration process on the ROS homepage (ROS CA, 2011) (ROS CT, 2011). For the OpenKinect project there is a calibration method in the OpenKinect Wiki (Burrus, 2011). The paper "Accurate and Practical Calibration of a Depth and Colour Camera Pair" by Herrera, et al. (2011) explains how to calibrate a depth and colour camera pair.
4 Collect data from the Kinect
The normal purpose of the Microsoft Kinect (MicrosoftKin, 2011) is playing with the Microsoft Xbox console gaming system, but there are some projects that allow us to connect
the Microsoft Kinect sensor to a personal computer. These middleware products provide the USB driver and interfaces to access the data from the Kinect. The most popular are:
OpenKinect
OpenKinect is an open-source community around the topic of Kinect. It focuses on the libfreenect driver (libfreenect, 2011). Most of the driver code is written in C. Libfreenect is open source (Apache license) and available for Windows, Linux, and Mac. The community also provides an encyclopaedia with information around the topic "Kinect" (OpenKinect, 2011).
OpenNI framework
The OpenNI framework is published by the OpenNI organization (OpenNI organization, 2011). Companies like PrimeSense, who provide the 3D sensing technology for the Kinect, are in this organization. All source code of the driver and the sample programs is available, in C#.
Kinect for Windows SDK
The Kinect for Windows SDK is published by Microsoft (MicrosoftSDK, 2011). It provides data from the Kinect to developers to build applications in C++, C# or Visual Basic. The source code is not published. At the moment the SDK is only for non-commercial use.
Optional: Matlab
There are possibilities to combine the OpenNI driver (mexkinect, 2011) and the Microsoft Kinect SDK driver (Dirk-Jan Kroon, 2011) with Matlab. The OpenNI library wrapper functions are almost bug free and have more functionality than the wrapper functions for the Microsoft SDK. But to use the OpenNI library wrapper functions it is necessary to use an older driver from the OpenNI framework and not the latest one.
All of the middlewares provide the raw data which is needed to build a 3D body model. There are differences in the simplicity of installation and in the ability to connect with Matlab.
5 The object "human body"
When working with a body as the scanning object there are some problems, which are summarized in a presentation by D'Apuzzo (D'Apuzzo, 2009): the problems of scanning a body are practical problems (movements, breathing, hair and eyes) and physical limits (stature, size, and weight). There are also some problems in the scanning process, such as scanning time, nudity and the privacy of the collected data. Especially the movements can be a big problem when using the ICP algorithm (discussed in the 3D Registration section). Allen, Curless & Popović (2003) write in their paper: "The human body comes in all shapes and sizes, from ballet dancers to sumo wrestlers." That means it is difficult to make general assumptions about the object "human body".
6 Pre-processing Kinect data
When you collect the data from the Kinect you get all measurement points relative to the Kinect, but you need just the data of the person in front of the Kinect. So you need to get rid of the points that do not belong to the human body. This process is called segmentation (Rabbani, et al., 2011). In the paper "Home 3D Body Scans from Noisy Image and Range Data" Weiss, Hirshberg & Black (2011) explain the segmentation process this way: "We segment the body from the surrounding environment using background subtraction on the depth map. Given a depth map Dbg taken without the subject present and a depth map Df associated with a frame f, we take the foreground to be Dbg − Df > ϵ, where ϵ is a few mm. We then apply a morphological opening operation to remove small isolated false positives."
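A minimal sketch of this background-subtraction step is given below, written here with OpenCV; the 5 mm threshold and the 3×3 structuring element are illustrative choices and not values taken from the cited paper.

```cpp
#include <opencv2/opencv.hpp>

// Hedged sketch of depth-based background subtraction. 'background' and
// 'frame' are assumed to be CV_16U depth maps in millimetres of the same size.
cv::Mat segmentForeground(const cv::Mat& background, const cv::Mat& frame) {
    cv::Mat diff;
    // Foreground where the measured depth is noticeably closer than the
    // empty-scene depth: D_bg - D_f > epsilon.
    cv::subtract(background, frame, diff, cv::noArray(), CV_32F);
    cv::Mat mask = diff > 5.0f;  // epsilon of a few millimetres

    // Morphological opening removes small isolated false positives.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);
    return mask;  // 8-bit mask: 255 = person, 0 = background
}
```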
The floor is also not important for the 3D body model. To find the floor and delete those points from the point cloud, you can use the Kinect's on-board accelerometer. The OpenNI middleware also provides a function to find the floor coordinates.
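Once a floor plane has been estimated, by whichever of the above means, removing its points can be as simple as discarding everything that lies within a small distance of the plane. The sketch below assumes the plane is already given in the form n·p + d = 0; the 2 cm tolerance is an arbitrary illustrative value.

```cpp
#include <cmath>
#include <vector>

// Hedged sketch: remove points lying near a given floor plane n·p + d = 0.
// The struct mirrors the Point3 used in the earlier back-projection sketch.
struct Point3 { double x, y, z; };

std::vector<Point3> removeFloor(const std::vector<Point3>& cloud,
                                double nx, double ny, double nz, double d,
                                double tolerance = 0.02) {
    std::vector<Point3> kept;
    const double norm = std::sqrt(nx * nx + ny * ny + nz * nz);
    for (const Point3& p : cloud) {
        const double dist = std::fabs(nx * p.x + ny * p.y + nz * p.z + d) / norm;
        if (dist > tolerance) kept.push_back(p);  // keep points off the floor
    }
    return kept;
}
```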
7 3D Registration
When the data from the Kinect have been collected and converted to 3D world coordinates, there is still no full 3D body model available. To get a full 3D body model it is necessary to collect data of a person's body from different angles and combine the data into a full 3D body model. This is required because the Kinect can only collect data in front of the sensor. For instance, when a person stands facing the Kinect sensor, the sensor cannot see the information on the back of this person. That means you need the data of an object, in our case a person, from different angles and match all of these data together into one body model. This process is called registration, and an explanation can be found in the article written by Brown (1992). The problem when working with a Kinect as a sensor is pointed out in the paper "Home 3D Body Scans from Noisy Image and Range Data" by Weiss, Hirshberg & Black (2011): "To estimate body shape accurately, we must deal with data that is monocular, low resolution, and noisy". They use a part of the SCAPE model, which was developed by Anguelov et al. (2005). Because the SCAPE model is made for shape completion, they use only the part of the SCAPE model which factors body shape and pose information. The SCAPE algorithm needs a training database of body shapes to work correctly. When we have point clouds of a person from different angles, we need to combine the different point clouds. A possible algorithm is ICP (Iterative Closest Point); an explanation can be found in the paper by Zhang (1992). There are many implementations in different programming languages accessible over the internet, and many derivations of the ICP method are available. Problems can occur with the ICP algorithm when the data from the Kinect are too noisy or not correctly segmented.
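To make the ICP idea concrete, the sketch below shows a single point-to-point ICP iteration (brute-force nearest neighbours, then the SVD-based rigid alignment). It assumes the two clouds already overlap roughly and leaves out the subsampling, outlier rejection and convergence test a practical implementation would need; Eigen is used here only for the linear algebra.

```cpp
#include <Eigen/Dense>
#include <limits>
#include <vector>

using Cloud = std::vector<Eigen::Vector3d>;

// One hedged ICP iteration: updates R and t so that R*source + t moves closer
// to target. Repeat until the alignment stops improving.
void icpIteration(const Cloud& source, const Cloud& target,
                  Eigen::Matrix3d& R, Eigen::Vector3d& t) {
    // 1. For every source point, find its (brute-force) nearest target point.
    Cloud matched(source.size());
    for (size_t i = 0; i < source.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (const auto& q : target) {
            const double d = (source[i] - q).squaredNorm();
            if (d < best) { best = d; matched[i] = q; }
        }
    }

    // 2. Centroids of both correspondence sets.
    Eigen::Vector3d cs = Eigen::Vector3d::Zero(), ct = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < source.size(); ++i) { cs += source[i]; ct += matched[i]; }
    cs /= static_cast<double>(source.size());
    ct /= static_cast<double>(source.size());

    // 3. Cross-covariance and SVD give the optimal rotation (Kabsch/Horn).
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < source.size(); ++i)
        H += (source[i] - cs) * (matched[i] - ct).transpose();
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0) {               // avoid reflections
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }
    t = ct - R * cs;                           // translation aligning the centroids
}
```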
Page 66 of 69
26/04/2012
Izadi, et al. (2011) note that "Depth measurements often fluctuate and depth images contain numerous 'holes' where no readings were obtained". In a Kinect image there are holes where the Kinect IR camera cannot "see" because of lighting conditions, reflection, transparency, occlusion, objects being out of range, or objects not reflecting the infrared light which the Kinect needs to work correctly.
8 Meshing
The outcome of the registration process is a point cloud: a set of disconnected points floating near each other in three-dimensional space. When we look closely, the image breaks down into distinct points with space visible between them. "If we wanted to convert these points into a smooth continuous surface we'd need to figure out a way to connect them with a large number of polygons to fill in the gaps. This is a process called "constructing a mesh"" (Borenstein, 2011).
An explanation of how to generate a mesh in Matlab is available in the article "A Simple Mesh Generator in MATLAB" by Persson & Strang (2004).
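As a very simple illustration of "connecting the points", the sketch below meshes an organised depth image by joining each pixel to its right and lower neighbours, two triangles per 2×2 block, skipping invalid readings. This is only a toy example; the mesh generator cited above and proper surface-reconstruction methods for merged, unorganised point clouds are considerably more involved.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Triangle = std::array<int, 3>;  // indices into the point array

// Hedged sketch: triangulate an organised depth image (row-major, width*height
// values, 0 = no reading) by connecting neighbouring valid pixels.
std::vector<Triangle> meshOrganisedDepth(const std::vector<uint16_t>& depthMm,
                                         int width, int height) {
    std::vector<Triangle> tris;
    auto valid = [&](int u, int v) { return depthMm[v * width + u] > 0; };
    auto idx   = [&](int u, int v) { return v * width + u; };
    for (int v = 0; v + 1 < height; ++v) {
        for (int u = 0; u + 1 < width; ++u) {
            // Upper-left and lower-right triangle of each 2x2 pixel block.
            if (valid(u, v) && valid(u + 1, v) && valid(u, v + 1))
                tris.push_back({idx(u, v), idx(u + 1, v), idx(u, v + 1)});
            if (valid(u + 1, v) && valid(u + 1, v + 1) && valid(u, v + 1))
                tris.push_back({idx(u + 1, v), idx(u + 1, v + 1), idx(u, v + 1)});
        }
    }
    return tris;
}
```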
9 Conclusion
This paper should be a help in building a 3D body model of a person with a single Kinect. The paper is restricted to the human body as the object and a single Kinect, although the process of getting a 3D model is approximately the same when not working with a body as the object. Results could be improved by using multiple Kinect systems or a large set of training data. Overall, the process of getting a 3D body model is not easy, and making it automatic and user-friendly remains a substantial task.
10 References
Allen, B., Curless, B. & Popović, Z. (2003) The space of human body shapes: reconstruction and parameterization from range scans. SIGGRAPH '03
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J. & Davis, J. (2005) SCAPE: Shape Completion and Animation of People. SIGGRAPH Conference
Borenstein, G. (2011) Making Things See. O'Reilly Media
Brown, L.G. (1992) A Survey of Image Registration Techniques. ACM Computing Surveys, vol 24, pp. 325-376.
Burrus, N. (2011) Kinect Calibration, OpenKinect. http://nicolas.burrus.name/index.php/Research/KinectCalibration (visited Nov. 2011)
D'Apuzzo, N. (2009) Hometrica. http://www.hometrica.ch/pres/2009_essilor_pres.pdf (visited Nov. 2011)
Herrera C., D., Kannala, J. & Heikkilä, J. (2011) Accurate and Practical Calibration of a Depth and Colour Camera Pair. LNCS 6855, vol II, pp. 437-445.
Kroon, D.-J. (2011) Kinect Microsoft SDK. http://www.mathworks.com/matlabcentral/fileexchange/33035 (visited Nov. 2011)
Freedman, B., Shpunt, A., Machline, M. & Arieli, Y. (2010) Depth mapping using projected patterns. United States, Patent No. US 2010/0118123
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A. & Fitzgibbon, A. (2011) KinectFusion: Real-time 3D Reconstruction and Interaction. http://research.microsoft.com/pubs/155416/kinectfusion-uist-comp.pdf (visited Nov. 2011)
Khoshelham, K. (2011) Accuracy analysis of Kinect depth data. ISPRS
libfreenect (2011) libfreenect. https://github.com/OpenKinect/libfreenect (visited Nov. 2011)
mexkinect (2011) kinectmex. http://sourceforge.net/projects/kinect-mex/ (visited Nov. 2011)
MicrosoftInt (2011) Introduction to Kinect for Windows, Microsoft. http://www.xbox.com/en-US/kinect (visited Nov. 2011)
MicrosoftSDK (2011) Microsoft Kinect SDK. http://kinectforwindows.org/ (visited Nov. 2011)
OpenKinect (2011) OpenKinect. http://openkinect.org/wiki/Main_Page (visited Nov. 2011)
OpenNI organization (2011) OpenNI. http://openni.org/ (visited Nov. 2011)
Persson, P-O. & Strang, G. (2004) A Simple Mesh Generator in MATLAB. SIAM Review, vol 46, pp. 329-345.
Rabbani, T., van den Heuvel, F.A. & Vosselman, G. (2011) Segmentation of point clouds using smoothness constraint. ISPRS Commission V Symposium
ROS (2011) ROS (Robot Operating System). http://www.ros.org/wiki/openni_kinect/kinect_accuracy (visited Nov. 2011)
ROS CA (2011) ROS. http://www.ros.org/wiki/openni_camera/calibration (visited Nov. 2011)
ROS CT (2011) ROS (Robot Operating System). http://www.ros.org/wiki/kinect_calibration/technical (visited Nov. 2011)
Pajdla, T., Smisek, J. & Jancosek, M. (2011) 3D with Kinect. ICCV
Weiss, A., Hirshberg, D. & Black, M. (2011) Home 3D Body Scans from Noisy Image and Range Data. ICCV 2011
Zhang, Z. (1992) Iterative Point Matching for Registration of Free-form Curves.