CMPE 537 Computer Vision
Fall 2005 Term Project Proposal
Real Time Hand Pose Estimation
Işık Barış Fidaner 2005702532
Başar Uğur 2005702906
Problem
Since computers were invented, interaction between human beings and machines has come a long way. In the old days, people mostly pushed levers and buttons while watching colored lights and pointers. With computers, more sophisticated input/output devices were developed. First the keyboard, then the mouse provided a smoother transition layer between the human mind and the computer processor. For this layer to get even thinner, there are two options: either human beings start interacting like computers, or computers start interacting like human beings. We prefer the second option.
Human beings interact with the world mostly with their hands. It is said that the hands are an extension of the brain, and it is known that about a quarter of the motor cortex is devoted to the muscles of the hands. The hands are among the most important means by which humans interact with objects and their environment, and the first tool of the production processes that make them human.
Figure 1. Homunculus - Areas of the motor cortex that control different parts of the body
Solution
Hand movements such as hitting, grasping, holding and dropping are our basic interactions with the physical world. We believe these can also be the basic mechanisms by which humans interact with virtual physical worlds. This possibility depends on technologies for processing the information extracted from hand movements. Haptic devices are one option; processing visual data is another. We choose the second, because the widespread use of webcams makes it increasingly portable, although it is computationally harder.
Real time recognition of hand movements requires a sequence of processes. First, the current image is captured from the webcam. The second step is processing the image so that noise is removed and the image is enhanced. The third step is edge detection and finding certain areas on the hand. To make this step easier, a specially colored glove is going to be used. The glove is going to be composed of six colors, three on each side of the hand. After this stage, the computer knows the visible areas of the hand and their arrangement relative to each other. The fourth step is using this information to estimate the behavior of the hand.
After the behavior is estimated and the movement of the hand is validated, the movement is reproduced by a graphical 3D model of the hand. That is, the graphics environment will at first act as a mirror. This system must be tested thoroughly before moving on. Once the mirror illusion is obtained, the next step will be putting the hand in a virtual physical world. If nothing goes wrong, the user will finally be able to experience interaction with a physical environment.
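The third step can be illustrated with a minimal sketch, assuming the captured frame is already available as rows of RGB tuples and that noise removal has already been applied. The six reference colors, the distance threshold and the names `classify_pixel` and `find_areas` are illustrative assumptions, not part of the proposed system. Each pixel is assigned to the nearest glove color, and each visible area is summarized by its pixel count and centroid, which is the kind of information the fourth step could work from.

```python
from collections import defaultdict

# Hypothetical reference colors (RGB) for the six glove areas; the actual
# glove colors are not specified in this proposal.
GLOVE_COLORS = {
    "palm_inside": (255, 0, 0),
    "palm_outside": (0, 255, 0),
    "thumb_inside": (0, 0, 255),
    "thumb_outside": (255, 255, 0),
    "fingers_inside": (255, 0, 255),
    "fingers_outside": (0, 255, 255),
}


def classify_pixel(rgb, max_dist_sq=60 ** 2):
    """Return the glove area with the nearest reference color, or None.

    max_dist_sq is an assumed squared-distance threshold in RGB space;
    pixels farther than this from every glove color are treated as background.
    """
    best_name, best_dist = None, max_dist_sq
    for name, ref in GLOVE_COLORS.items():
        dist = sum((a - b) ** 2 for a, b in zip(rgb, ref))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


def find_areas(image):
    """Summarize each visible glove area by pixel count and centroid.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    sums = defaultdict(lambda: [0, 0, 0])  # area -> [count, sum_x, sum_y]
    for y, row in enumerate(image):
        for x, rgb in enumerate(row):
            area = classify_pixel(rgb)
            if area is not None:
                entry = sums[area]
                entry[0] += 1
                entry[1] += x
                entry[2] += y
    return {
        area: {"pixels": n, "centroid": (sx / n, sy / n)}
        for area, (n, sx, sy) in sums.items()
    }


# Tiny synthetic frame: two reddish pixels, one bluish pixel, the rest background.
frame = [
    [(250, 5, 5), (250, 5, 5), (10, 10, 10)],
    [(5, 5, 250), (128, 128, 128), (10, 10, 10)],
]
areas = find_areas(frame)
```

Comparing the pixel counts of the inside and outside areas could, for example, indicate which side of the hand faces the camera.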
Capturing & Processing
We are going to use a single webcam and an interface such as DirectShow or Video for Windows for real time image capture. The image is going to be processed with filters that remove noise and enhance the colors we use to see the hand.
Area recognition
The processed image is going to be filtered to detect edges and to find the areas that have one of the six special colors of the glove. These six colors define the following areas on the hand: the inside and outside of the palm, the inside and outside faces of the thumb, and the inside and outside faces of the remaining fingers. These areas were selected to minimize the number of areas while maximizing the useful information that can be gained from them.
Figure 2. Colored areas of the glove
Shape recognition
After the shape and placement of the areas are determined, the 3D shape of the hand must be estimated. Of course, this cannot be an exact estimate, but for our purpose it is sufficient to distinguish a reduced set of hand shapes. Examples: Is the hand open or closed? How far is the hand closed? Is it holding or dropping the object, or perhaps throwing it?
Realizing the Model
A pre-designed model of the hand will be used. The information we get from shape recognition will be transformed into the position and orientation of this model in the 3D world. This will differ between the mirroring and interaction scenarios. For mirroring, the model will be rendered as the mirror image of the user’s hand with respect to the computer screen. For physical interaction, the model will be rendered parallel to the real-world pose of the user’s hand.
Physical world
The virtual physical world is a graphical 3D construction of a world in which Newton's laws are applied. The friction, connection and attractive forces among all objects are determined beforehand, so that these laws can be applied to the objects according to their time dependent values. By placing the hand in this world as an object with such values, the user will experience an interaction with it. Of course, this will differ from physically feeling the world, but it will certainly give a sense of visual interaction.
References
[1] Keskin C., Erkan A., Akarun L., “Real Time Hand Tracking and 3D Gesture Recognition For Interactive Interfaces Using HMM”, Computer Engineering Dept., Boğaziçi University.
[2] Horimoto S., Arita D., Taniguchi R., “Real-Time Hand Shape Recognition for Human Interface”, Kyushu University.
[3] Wlczek P., Maccato A., de Figueiredo R.J.P., “Pose Estimation Of 3-Dimensional Objects From Single Camera Images”, DSP(5), No. 3, July 1995, pp. 176-183.