Parametric Modeling Using Natural Human Gestures


This master's project, completed by Ziad Ewais and entitled Parametric Modeling Using Natural Human Gestures, has been approved with respect to its intellectual content and contributions.

Prof. Ramesh Krishnamurti (advisor) Chair of the Computational Design Program, Carnegie Mellon University – School of Architecture


Parametric Modeling Using Natural Human Gestures

A Master's Thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Computational Design
Carnegie Mellon University - School of Architecture
2012


In memory of my late grandfather
HAMED EWAIS
The father, artist, and teacher
May Allah forgive his sins


Acknowledgements

I want to thank the following people for all of their guidance, support, and encouragement. Without you, this thesis would never have been accomplished.

Prof. Ramesh Krishnamurti, for his advising and guidance, academically and professionally.
Prof. Ossama Abdo, for his help with my technical writing and presentation.
Waleed Ammar, for all his technical help and encouragement.
Andrew Payne, for all his technical advice and for answering all of my dreadful questions.
Madeline Gannon, for opening my mind to new thoughts and ideas regarding my research.

I would also like to send a very special thanks to my father, Prof. Hazem Ewais, for simply everything that he has given me and everything still to come; a huge hug to my mother, Magda, who was praying for me day and night; and to my brother, Amr, for his emotional support and encouragement. And last but never least, to the love of my life, the sun of my skies, and the moon of my darkest nights, Niveen, and the lovely rose of our lives, Rosabella. Thank you all.


List of Contents

0 Abstract
1 Introduction
1.1 The mechanical and electronic history of computer digital input
1.2 Welcome to the device-less era of digital input
1.3 NUI design guidelines
1.4 Technology examples developed for NUI
1.5 Kinect and what it offers
1.5.1 The Kinect sensor dissected
1.5.2 The data received by the sensor and how the SDK interprets it
1.6 Speech recognition guidelines
1.6.1 How does speech recognition work?
1.6.2 Speech recognition grammar guidelines
2 Case Studies
2.1 Sketchpad
2.2 SixthSense: Integrated information with the real world
2.3 3D Model Virtual Dressing Room Using Kinect
2.4 3D modeling in free-space using Kinect and Arduino gloves
2.5 Lockheed Martin's CHIL Project
2.6 MASTER-PIECE: A Multimodal (Gesture + Speech) Interface for 3D Model Search and Retrieval Integrated in a Virtual Assembly Application
2.7 l'Artisan Électronique - Virtual pottery wheel
3 Contribution
3.1 Research introduction
3.2 Skeleton tracking
3.2.1 Skeleton tracking basics
3.2.2 Kinect skeleton tracking and Grasshopper
3.3 Speech recognition in Grasshopper
3.4 Virtual Pottery Wheel Demo
3.4.1 Demo introduction
3.4.2 Hand gestures for pottery making
3.4.3 Developing the example in Grasshopper
3.4.4 Pseudo code and how to use the example
3.4.5 Scenes and screenshots from the example
4 Conclusion
5 References
6 Appendix


List of Figures

1.01 An example of a punched card
1.02 Early IBM card punching machine
1.03 First computer mouse, invented by Douglas Engelbart in 1963
1.04 An example of a trackball mouse
1.05 3D mouse usage diagram
1.06 Tablet and stylus for PCs, commercially known as Bamboo
1.07 HP Compaq tablet PC with rotating/removable keyboard
1.08 Head-mounted display
1.09 Data glove
1.10 Perceptive Pixel's latest product, the 82" interactive screen
1.11 A user interacting with Microsoft Surface
1.12 Kinect application for surgeons during operations
1.13 The Kinect sensor
1.14 The Kinect sensor from the inside and its components
1.15 The Kinect field of view
1.16 A diagram illustrating the workflow from the sensor to the application
1.17 The Kinect SDK architecture
1.18 The depth stream byte description
1.19 An example of both color and depth streams
1.20 An example of a player tracked in the depth stream
1.21 The SDK skeleton point definition chart
2.01 Sutherland using Sketchpad
2.02 Drawing on the computer screen using a light pen
2.03 Shape recognition with Sketchpad
2.04 Pranav Mistry demonstrating the SixthSense project
2.05 Demonstrating how to draw on a wall using the projector
2.06 A phone dial pad displayed on the palm of the user's hand
2.07 The notion of a live interactive newspaper using SixthSense
2.08 The user trying out a dress using the virtual wardrobe
2.09 The 3D space modeler with Kinect and Arduino gloves
2.10 Lockheed Martin's CHIL Lab in session
2.11 Virtual Assembly application
2.12 Hand and head motion detection
2.13 Shape recognition by determining tracked points on a polar angle scale
2.14 The virtual pottery wheel, which consists of a translucent screen and a laser scanner
2.15 The virtual pottery scanner and 3D clay printer
3.01 The input process from the user until reaching the modeling stage in Rhino
3.02 The depth stream from Kinect before a user has been tracked
3.03 The depth stream after a player has been tracked
3.04 A visual description of the skeleton component
3.05 A skeleton component used with input and output
3.06 The skeleton representation by points and bones in Rhino
3.07 The unique gesture the user has to perform to activate the speech recognition component
3.08 Grasshopper speech recognition component working with multi-threaded processing
3.09 Grasshopper speech recognition component
3.10 The physical press-in gesture called "centering"
3.11 The physical press-in gesture called "overleaf"
3.12 The physical hollowing gesture
3.13 The physical gesture to elongate and slim a mold
3.14 A representation of a directed acyclic graph (DAG)
3.15 Diagram describing how to overcome the DAG that is enforced by Grasshopper in this example
3.16 The virtual pottery wheel while modeling a vase
3.17 Close-up on modeling the pottery
3.18 The user editing the vase profile in edit mode
3.19 Another model produced from the virtual pottery wheel
3.20 The free-form revolved form created after the user defined a profile



Abstract
Our everyday technology is becoming more adaptive and convenient for human use. From touchscreens to voice-controlled commands, technology is becoming easy for anyone to handle. However, architectural design tools have not reached this level of usability for everyone involved. Computer-Aided Architectural Design (CAAD) has produced many breakthroughs that enhance computational power in architecture, but the simplicity of recent technologies is still missing from the field of architectural design. CAAD has helped open minds to different ways of thinking about architectural design, but its tools must first be learned before one can become a professional user. This research investigates the possibility of introducing a fully human-gesture-based parametric modeling tool that can be used by professional and non-professional users alike. The intention is to involve the human's natural gestural behavior more in the modeling process, instead of inputting predefined commands using traditional input peripherals. These natural human gestures include hand gestures, body movements, voice commands, and so on. Applying all of this to a 3D modeling application is tricky but doable. I will describe the process involved, from input through processing to output of the data, and how to draw, model, and edit in a 3D working space using only your hands and voice. The research goal of this project is to study the common human body gestures and language that are used regularly and naturally by any person, and how these actions can be used for 3D modeling in virtual spaces: body gestures like hand waving, hand twisting, pushing, and pulling, and naturally spoken voice commands like draw, zoom, move, and rotate. This project studies the best way of utilizing such gestures and what human factors are involved in the variation of the gestures used. The research also focuses on how to develop such a language to produce a complete framework that depends only on human interaction in space. This research project is targeted to be implemented in an existing and widely used 3D modeling application (Rhinoceros "Rhino" 1). I will be creating a human-gesture-based framework, depending on the notion of the Natural User Interface 2, that will be implemented in Rhino 3D, which will act as the 3D modeling space. For capturing the natural human gestures I will be using the Microsoft Kinect 3, the Microsoft Kinect SDK v1.0, and the Speech Application Programming Interface.

1 Rhinoceros (Rhino) is a stand-alone, commercial NURBS-based 3-D modeling tool developed by Robert McNeel & Associates. The software is commonly used for industrial design, architecture, marine design, jewelry design, automotive design, CAD/CAM, rapid prototyping, reverse engineering, as well as the multimedia and graphic design industries. (Robert McNeel & Associates, 2012)
2 Natural user interface, or NUI, or natural interface, is the common parlance used by designers and developers of human-machine interfaces to refer to a user interface that is effectively invisible, or becomes invisible with successive learned interactions. (Microsoft, 2012)
3 Kinect is a motion sensing input device by Microsoft for the Xbox 360 video game console and Windows PCs. (Microsoft, 2012)

1 Introduction

"All things will be produced in superior quantity and quality, and with greater ease, when each man works at a single occupation, in accordance with his natural gifts, and at the right moment, without meddling with anything else." (Plato)

1.1 The mechanical and electronic history of computer digital input
In the decades since the first digital computers were programmed using mechanical switches and plug boards, computing and the ways in which people interface with computers have evolved significantly. Some aspects of this evolution have both been anticipated and withstood the test of time. (Wigdor et al., 2011)
Computers have evolved very rapidly in the area of human-computer interaction; the way the user can input data into a computer system and obtain results from it has changed dramatically. Digital data input dates back to the early 1950s, when IBM introduced the IBM 700 series and the only medium for inputting data was punched cards 4 (Fig. 1.01). A person used a card punching machine to enter the requested data, producing a card punched with holes that represents the entered data in a binary digital format (Fig. 1.02). This card was slipped into the computer and the results were printed out for the user.

Fig. 1.01 An example of a punched card. (wikipedia.com)

Fig. 1.02 Early IBM card punching machine. (wikipedia.com)

The evolution continued toward easier and more convenient ways of inputting data into a computer system. It continued with the invention of the computer keyboard, but the real breakthrough came with the invention of the computer mouse 5. This device translates the hand movement of the user controlling the mouse to a point on the output screen. This was the first time a user could control a computer without using characters to input data.

4 A punched card, punch card, IBM card, or Hollerith card is a piece of stiff paper that contains digital information represented by the presence or absence of holes in predefined positions.
5 A mouse is a pointing device that functions by detecting two-dimensional motion relative to its supporting surface.


The mouse was first introduced in the early 1960s and has been used ever since, with variations in the mechanism used (Fig. 1.03). The mouse concept has also evolved through the years toward faster and easier ways of controlling the computer and providing digital input. Then the trackball mouse 6 was introduced, which is widely used by artists involved with computer arts. It is simply a traditional mouse turned upside down (Fig. 1.04), so instead of moving the whole device around, the user only uses the palm of his hand to direct the tracker ball to the desired location. Up to this point in time, the mouse input device was constrained to capturing information in only two dimensions.

Fig. 1.03 First computer mouse invented by Douglas Engelbart in 1963. (wikipedia.com)

Fig. 1.04 An example of a trackball mouse. (kingston.com)

With the introduction of new 3D tools and applications that required a more sophisticated way to navigate in space, the 3D mouse 7 was invented: a device that works in a third dimension so that spatial data along a third axis can be captured. This device gave more freedom and realism to 3D modelers, as they are truly moving and modeling in 3D space. It is described as holding your 3D virtual model with your own hands (Fig. 1.05).

Fig. 1.05 3D mouse usage diagram. (3Dconnexion.com)

6 A trackball is a pointing device consisting of a ball held by a socket containing sensors to detect a rotation of the ball about two axes, like an upside-down mouse with an exposed protruding ball.
7 A 3D mouse is also known as a bat, flying mouse, or wand; these devices generally function through ultrasound and provide at least three degrees of freedom.


Other input devices were invented to give the user more freedom to use his hands in a natural way. One of these devices is the tablet and stylus, or, as it is formally known, the digitizer (Fig. 1.06). This device is mainly a digital surface pad and a pen stylus used to perform handwriting- and drawing-like tasks and input them digitally to a computer system. It is widely used by artists and engineers, as it reproduces the traditional pen-and-paper effect while the data is entered directly into the system. Another variation of this technology is the tablet PC, where the user draws with a digital pen directly on the screen of a computer (Fig. 1.07).

Fig. 1.06 Tablet and Stylus for PC’s – Commercially known as Bamboo. (wacom.com)

Fig. 1.07 HP Compaq tablet PC with rotating/removable keyboard. (hp.com)

As the technology evolved, the need for user immersion in the virtual digital world became apparent. Technologies such as virtual cave systems, head-mounted displays (HMD) 8, and computer gloves gave the opportunity to turn the user himself into an input device (Fig. 1.08). So instead of sitting down facing your screen, you are now walking, moving your arms around, twisting and tilting your head to look around, using your hands to pick up and touch items, and using your fingers to point and click in a 3D virtual environment (Fig. 1.09).

Fig. 1.08 Head mount display. (wikipedia.com)

Fig 1.09 Data glove (wikipedia.com)

8 A head-mounted display or helmet-mounted display, both abbreviated HMD, is a display device, worn on the head or as part of a helmet, that has a small display optic in front of one (monocular HMD) or each eye (binocular HMD).


1.2 Welcome to the device-less era of digital input
All of the previous technologies had one thing in common: each of them used either a mechanical or an electronic device for data input. As technologies advance, so does the concept of human-computer interaction, in the way the user can command or input data to a computer system. More studies are being done on tracking parts of the human body, like the hands, head, and eyes, to give a sense of immersion in the computer world without using or wearing any undesirable devices or gadgets. (Lu, 2011) From this idea of creating user input without the assistance of any mechanical or electronic device emerges what is now known as the Natural User Interface (NUI). A Natural User Interface can be created from any technologies that use direct input from natural human gestures, like voice commands, spatial body gestures, body movement, and so on (Wigdor et al., 2011). For this notion of device-less data input to work, there must be a receiving unit or peripheral that receives the input data and transmits it to the computer system. Peripherals such as RGB cameras, infrared (IR) depth streaming cameras, surrounding microphone arrays, and motion sensing systems capture whatever natural commands and gestures the user provides as input. Any user can now interact with any system built upon this concept. In order for this concept to work there has to be a set of rules and commands that builds a natural gestural language, which is the medium that both parties, the user and the computer, understand and interact through. This language has to be built based on the human's natural perception of interacting with another human, not a machine; if it were built upon human-to-machine interaction instead, it would feel artificial due to the nature of the machine receiving the commands. This language should be as easy as interacting with the physical world. Instead of using a keyboard and a mouse, NUI allows us to speak to machines, stroke their surfaces, gesture at them in the air, and so on. The naturalness refers to the way they exploit the everyday skills we have learned, such as talking, writing, gesturing, walking, and picking up objects. (Rogers et al., 2011)
"I believe we will look back on 2010 as the year we expanded beyond the mouse and keyboard and started incorporating more natural forms of interaction such as touch, speech, gestures, handwriting, and vision – what computer scientists call the 'NUI' or Natural user interface." (Ballmer, 2010)
NUI is not only for healthy people; it is also used by people with limited mobility and interaction, as small gestures, sounds, and movements are much easier than typing commands on a keyboard or using a mouse to work with a Graphical User Interface (GUI) 9. (Microsoft, 2011) Difficulties and impairments are factors that reduce computer use among some individuals, and a large audience for accessible technology exists among today's existing computer users.

9 Graphical User Interface (GUI) is a type of user interface that allows users to interact with electronic devices with images rather than text commands.


Direct use of the hand as an input device is an attractive method of providing natural human-computer interaction (HCI), so that computer use among those with impairments and difficulties will rise. (Srinivasan, 2007)

1.3 NUI design guidelines
When it comes to NUI design for users, the best and most effective approach is that less is more and simple is better. The best NUI design is one built upon small performed or spoken gestures and commands. This is the main achievement of using NUI instead of GUI: exploiting the simplicity of natural human interactions. (Wigdor et al., 2011) Here are some guidelines various NUI designers recommend:
• Design for human interaction, not for a cursor on a screen: design your interface by targeting the human gestures and where they might come from, like arms, hands, voice, etc. (Saffer, 2011)
• Design for physiology and kinesiology: design your NUI application targeting the physiology of the user, not how well he can perform the physical gestures. (Saffer, 2011)
• Humans have physical limitations: design the application bearing in mind that a person can't keep performing hard, challenging movements, nor can he stay still in a single gesture position for more than a couple of seconds. (Saffer, 2011)
• Design for variability: not all humans are alike, so the design should be compatible with all kinds of people: small, big, tall, short, etc. (Saffer, 2011)
• Design for variety: design the interface to have a couple of varied gestures. (Saffer, 2011)
• Screen coverage: design your interface with regard to the screen size and aspect ratio you might present on and that the user interacts with. You can't ask the user to go and point at something he can't see. (Saffer, 2011)
• Know the technology: research and learn all you can about the technology you intend to use as the receptor for human interaction. Know its limitations so you don't design interactions or gestures that can't be detected by your reception technology. (Saffer, 2011)
• Less is more and simple is better: the more complex the gesture, the fewer people will be attracted to use the application. (Saffer, 2011)
• Attract your user: start with simple, doable gestures to attract the user to continue working with the application. (Saffer, 2011)
• Match gesture complexity with frequency of usage: frequently used gestures must be as simple as possible. (Saffer, 2011)
• Design for joy, not for performance: unlike GUI experiences that focus on and privilege accomplishment and task completion, NUI experiences focus on the joy of doing. NUI experiences should be like an ocean voyage; the pleasure comes from the interaction, not the accomplishment. (Hinman, 2011)



• Principle of Scaffolding: successful natural user interfaces feel intuitive and joyful to use. Information objects in a NUI behave in a manner that users intuitively expect. Unlike a successful GUI, in which many options and commands are presented all at once and depicted with very subtle hierarchy and visual emphasis, a successful NUI contains fewer options with interaction scaffolding. Scaffolding is a strong cue or guide that sets a user's expectations by giving them an indication of how the interaction will unfold. Good NUIs support users as they engage with the system, unfolding or revealing themselves through actions in a natural way, as if the user were not using a computer system at all. (Hinman, 2011)
• Principle of Contextual Environments: one of the great things about natural user interfaces is that they are dynamic and can locate themselves in space and time. Unlike GUIs, which present a user with the same set of options regardless of the context, NUIs are responsive to the environment and suggest what the next interaction should be. (Hinman, 2011)
• Principle of Social Interaction: unlike GUIs, which are highly visual and often require a great deal of cognitive focus to use, NUIs are simpler and require less cognitive investment. Instead of getting lost in a labyrinth of menu options, menus in NUIs are streamlined, enabling more opportunities for users to engage and interact with other users instead of the system's interface. (Hinman, 2011)
• Principle of Spatial Representation: unlike GUI systems, where an icon serves as a visual representation of information, NUIs represent information as objects. In the world of successful natural user interfaces, a portion of an object often stands for the object itself. NUI objects are intelligent and have auras. (Hinman, 2011)
• Principle of Seamlessness: GUIs require a keyboard and mouse for interaction with a computing system. Touchscreens, sensors embedded in hardware, and the use of gestural UIs allow NUI interactions to feel seamless for users because interactions are direct. There are fewer barriers between the user and information. (Hinman, 2011)

1.4 Technology examples developed for NUI
In the last decade, several software and computer systems companies have become more interested in developing technologies that are NUI compatible. Technologies like touch screens, cameras, and IR sensors have been developed to work as full input peripherals for human-computer interaction. Products of this technology differ in size, use, and interactivity. Some products are developed for single users, such as smartphones and palm-size tablets, and some are developed for social interactivity between users, like Microsoft Surface and Kinect for Xbox 360.


• Perceptive Pixel: One of the early examples is the work done by Jefferson Han on multi-touch interfaces. In a demonstration at TED in 2006, he showed a variety of means of interacting with on-screen content using both direct manipulations and gestures. For example, to shape an on-screen glutinous mass, Jeff literally 'pinches' and prods and pokes it with his fingers (Fig. 1.10). In a GUI for a design application, by contrast, a user would use the metaphor of 'tools' to do this, for example selecting a prod tool, or selecting two parts of the mass to which a 'pinch' action should be applied. Han showed that user interaction could be much more intuitive by doing away with the interaction devices that we are used to and replacing them with a screen capable of detecting a much wider range of human actions and gestures. Of course, this allows only for a very limited set of interactions which map neatly onto physical manipulation (RBI 10). Extending the capabilities of the software beyond physical actions requires significantly more design work. (wikipedia.com, 2012)

Fig. 1.10 Perceptive Pixel's latest product, the 82" interactive screen

10 Reality-based interaction: real-world interaction, allowing users to directly manipulate objects rather than instructing the computer to do so by typing commands.


• Microsoft Surface: The Microsoft Surface platform brings people together to connect, learn, and decide. It enables experiences that change the way people collaborate and connect with a 360-degree interface (Fig. 1.11). And, with PixelSense 11, Microsoft Surface sees and responds to touch and real-world objects, supporting more than 50 simultaneous inputs. (Microsoft.com, 2011)

Fig. 1.11 A user interacting with Microsoft Surface

• Microsoft Kinect for Xbox 360 12 / Windows: The Kinect's beginnings go back to 2007, when Microsoft developers were working on a project called "Project Natal". The project aimed to create a camera that could read the user's gestures to control the Xbox dashboard. This developed into the Kinect sensor for Xbox 360 as we know it. Kinect had sold more than 8 million units by the end of 2011 (cnet.com). Developers began hacking the sensor to open up the possibility of homebrew applications on Windows and Mac OS.

Fig. 1.12 Kinect application for surgeons during operations

11 PixelSense allows a display to recognize fingers, hands, and objects placed on the screen, enabling vision-based interaction without the use of cameras. The individual pixels in the display see what's touching the screen, and that information is immediately processed and interpreted.
12 Xbox 360 is a very popular video game console from Microsoft, introduced in 2005 as the successor to the original Xbox; it was designed to compete with Sony's PlayStation 3 and Nintendo's Wii.


In early 2011, Microsoft released the official Kinect SDK for Windows in its beta testing phase. In February 2012, the first full version of the SDK was released, timed with the release of the new Kinect sensor developed specially for Windows. This research depends mainly on the Kinect sensor as the primary receptor of the human gestures that are transmitted to the NUI developed inside Rhino, enabling Rhino to become a fully developed NUI 3D modeling application (Fig. 1.12).

1.5 Kinect and what it offers
As mentioned earlier, the main sensing peripheral that will be used for NUI input is the Microsoft Kinect sensor. In this part I describe what the Kinect sensor is, how it works, what kind of data the computer system receives, what is offered by the Kinect SDK, and how all of this can be translated into creating a NUI-based 3D modeling application.
1.5.1 The Kinect sensor dissected
The sensor is a black, horizontally elongated device (Fig. 1.13) divided into two parts, the head and the base. The head dimensions are 12 inches in width by 2.5 inches in depth by 1.5 inches in height. The head contains one RGB camera, one IR emitter, and one IR depth receiving camera, as well as a microphone array consisting of 4 microphones and one cooling fan. The base contains a motor that rotates the sensor about its X-axis, adjusting the pitch angle of the sensor (Fig. 1.14). (Webb et al., 2012)

Fig. 1.13 The Kinect Sensor

Fig. 1.14 The Kinect sensor from the inside and its components

The Kinect sensor contains an IR emitter that works at 808 and 880 nm, which are common commercial IR laser diode wavelengths. The IR depth camera has a resolution of 320 * 240 pixels, and the RGB camera is a standard VGA 640 * 480 pixel camera. Both cameras have a maximum speed of 60 frames per second (fps), but they usually work best at 30 fps. The IR depth camera only recognizes received IR data in the range between 0.8m and 4.0m 13. The field of view of both cameras is between -21.5˚ and 21.5˚ vertically, and between -28.5˚ and 28.5˚ horizontally. The pitch motor can tilt the sensor an extra 27˚ up or an extra 27˚ down (Fig. 1.15). (Webb et al., 2012)

Fig. 1.15 The Kinect field of view (Webb et al., 2012)

The minimum hardware required to work with the Kinect sensor on a Windows PC is as follows:
• Computer with a dual-core, 2.66 GHz processor
• Windows 7 and a graphics card that supports Microsoft DirectX 9.0c 14
• 2 GB of RAM
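Given a machine that meets these requirements, starting the sensor with the Kinect for Windows SDK v1.0 takes only a few lines of C#. The sketch below is illustrative only; the chosen stream resolutions and elevation angle are assumptions for the example, not fixed values of this framework.

// A minimal sketch: find a connected sensor, enable the color and depth
// streams, and start capturing. (Kinect for Windows SDK v1, C#)
using System.Linq;
using Microsoft.Kinect;

public static class KinectStartup
{
    public static KinectSensor StartSensor()
    {
        // Pick the first sensor that reports itself as connected.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null) return null;

        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.DepthStream.Enable(DepthImageFormat.Resolution320x240Fps30);
        sensor.Start();

        // The pitch motor accepts roughly -27 to +27 degrees.
        sensor.ElevationAngle = 0;
        return sensor;
    }
}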

1.5.2 The data received by the sensor and how the SDK interprets it
As described above, the sensor has 3 different receptors: 2 cameras and an array of audio microphones. There are 3 main data streams received from the sensor: a video stream, a depth stream, and an audio stream (Fig. 1.16).

Fig. 1.16 A diagram illustrating the workflow from the sensor to the application (Microsoft.com)

13 In the new Kinect for Windows sensor, the "near" mode can receive data from 0.0m to 4.0m. (Microsoft.com)
14 Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia.


Fig. 1.17 The Kinect SDK architecture (Microsoft.com)

Color Stream: All the streams are in fact arrays of bytes with different characteristics. The stream format determines the pixel format and therefore the meaning of the bytes. For the color image the format is ColorImageFormat.RgbResolution640x480Fps30 and the pixel format is Bgr32. This means that there are 32 bits (4 bytes) per pixel: the first byte is the blue channel value, the second is the green channel value, and the third is the red channel value. The fourth byte is only used with the Bgra32 format, where it carries the alpha value of the stream. The image size is 640 * 480 pixels, which means that the byte array will have 1,228,800 bytes (640 * 480 * 4) (Fig. 1.19). (Webb et al., 2012)
Depth Stream: The depth stream byte array differs from the color stream byte array. Each depth stream pixel is 2 bytes (16 bits): bits 3 to 15 carry the depth information, and bits 0 to 2 carry the player index information (Fig. 1.18). The player index is the index assigned to the user currently active with the sensor. The Kinect sensor can detect up to 6 users at the same time. When there is no active user or the tracking function is disabled, the player index is 0 (Fig. 1.20).

Fig. 1.18 The depth stream byte description (Webb et al., 2012)

To obtain the depth information (in mm) from a depth stream pixel, we have to bit-shift its value right to remove the player index bits. The following pseudo code shows how to determine the depth data using the SDK.
// First find the pixel index to determine which element of the array to read
// Then bit shift the value right by the width of the player index bitmask
int pixelIndex = pixel.X + (pixel.Y * frame.Width);
int depth = pixelData[pixelIndex] >> DepthImageFrame.PlayerIndexBitmaskWidth;
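Building on the same idea, the player index described above can be read from the low three bits of the same value. The snippet below is a hedged sketch: pixel and frame are the same hypothetical variables used in the pseudo code above, and pixelData is assumed to be the short[] copied from a DepthImageFrame.

// Read both depth (in mm) and player index from one depth pixel.
int pixelIndex = pixel.X + (pixel.Y * frame.Width);
short rawValue = pixelData[pixelIndex];

int depthInMm   = rawValue >> DepthImageFrame.PlayerIndexBitmaskWidth; // bits 3-15
int playerIndex = rawValue & DepthImageFrame.PlayerIndexBitmask;       // bits 0-2, 0 means no player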


Fig. 1.19 An example of both color and depth streams

User Tracking: Once the depth stream is running, the user can either activate the player tracking function or not. If players are tracked to obtain their positions in space, the SDK is smart enough to track up to 6 players, as mentioned before, and to place their indices in the first three bits of each depth stream pixel. The SDK recognizes a player not from the user's whole body, but from the positions of the user's limbs in space. If an object in space satisfies more than half of the recognizable limbs, the SDK marks that object as a user and assigns it an index in the depth stream pixels. (Webb et al., 2012)

Fig. 1.20 An example of a player tracked in the depth stream

Skeleton Detection: Determining the position of a user is called "skeleton tracking". The user is defined by his limbs, not as a whole body (Fig. 1.21). First the SDK recognizes the outermost limb parts: the head, hands, and feet. Then it calculates the distance between them and the next nearest limb parts, which are the shoulder center, both wrists, and both ankles. It works from the outside in until it reaches the center part, which is the hip center point. Once it confirms that this object is a player, it places the player's index in the first three bits of the depth stream pixel, shifting the depth value left to make room for the index bits. When some parts do not appear in the frame, the SDK estimates their positions from the nearby limbs relative to the invisible part.
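As a concrete illustration of the tracking just described, the sketch below enables the skeleton stream with the Kinect for Windows SDK v1 and reads the right-hand joint of the first tracked player. It is a simplified, assumption-level example (no error handling, a fixed array of six skeletons), not the framework's actual implementation.

using System;
using System.Linq;
using Microsoft.Kinect;

public class SkeletonReader
{
    // The SDK tracks at most six players at a time.
    private readonly Skeleton[] skeletons = new Skeleton[6];

    public void Start(KinectSensor sensor)
    {
        sensor.SkeletonStream.Enable();
        sensor.SkeletonFrameReady += OnSkeletonFrameReady;
        sensor.Start();
    }

    private void OnSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame == null) return;
            frame.CopySkeletonDataTo(skeletons);

            // Take the first fully tracked player, if any.
            Skeleton player = skeletons.FirstOrDefault(
                s => s.TrackingState == SkeletonTrackingState.Tracked);
            if (player == null) return;

            // Joint positions are reported in meters relative to the sensor.
            SkeletonPoint hand = player.Joints[JointType.HandRight].Position;
            Console.WriteLine("Right hand: " + hand.X + ", " + hand.Y + ", " + hand.Z);
        }
    }
}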


Fig. 1.21 The SDK skeleton point definition chart (Microsoft.com)

Audio Capturing and Speech Recognition: The microphone array is the hidden gem of the Kinect sensor. The array is made up of four separate microphones spread out linearly at the bottom of the Kinect. By comparing when each microphone captures the same audio signal, the microphone array can be used to determine the direction from which the signal is coming. The technique can also be used to make the microphone array pay more attention to sound from one particular direction rather than another. Finally, algorithms can be applied to the audio streams captured from the microphone array in order to perform complex sound dampening effects that remove irrelevant background noise. (Webb et al., 2012)
When the Kinect SDK is installed, the components required for speech recognition are installed automatically. The Kinect microphone array works on top of preexisting code libraries that have been around since Windows Vista. These preexisting components include the Voice Capture DirectX Media Object (DMO) and the Speech Recognition API (SAPI). (Webb et al., 2012)
Speech recognition is broken down into two different categories: recognition of commands and recognition of free-form dictation. Free-form dictation requires training the software to recognize a particular voice in order to improve accuracy. This is done by having speakers repeat a series of scripts out loud so that the software comes to recognize the speaker's particular vocal patterns. Command recognition applies another strategy to improve accuracy: rather than attempting to recognize anything a speaker might say, command recognition constrains the vocabulary that it expects any given speaker to vocalize. Based on a limited set of expectations, command recognition is able to formulate hypotheses about what a speaker is trying to say without having to be familiar with the speaker ahead of time. (Webb et al., 2012)

1.6 Speech recognition guidelines
1.6.1 How does speech recognition work?
When a person speaks, vibrations are created. Speech recognition technology converts these vibrations, i.e. analog signals, into a digital form by means of an analog-to-digital converter (ADC). The sound is digitized by measuring it at regular intervals, filtered into different frequency bands, and normalized so that it attains a constant volume level. The result is then checked against the already stored sound templates. The next step in the speech recognition procedure is dividing the signal into segments that range from a few hundredths to a few thousandths of a second. These segments are matched against phonemes that are already stored in the system. Phonemes are the specific sounds that are understood by people speaking a particular language. Statistical modeling systems, which use mathematics and probability, play an important role in today's speech recognition systems. These systems are used to determine or predict the outcome after a particular phoneme, which makes it easier to predict where a particular word begins and ends. The Hidden Markov Model and neural networks are the two main statistical modeling systems, of which the former is the one commonly used. The outcome after a particular word in a sentence depends upon the vocabulary of the speech recognition system. It is difficult even for a computer to determine the possible outcome after a particular phoneme, due to the sheer number of words in a language. Thus, it is necessary to 'train' the speech recognition system by speaking into it. Once the user gets used to the system, it becomes easy for the system to determine the possible outcome after a particular word or phoneme. (AccuConference.com)
1.6.2 Speech recognition grammar guidelines
A speech recognition grammar is a set of word patterns that directs the speech recognition system to respond to a human voice; it responds to the calls made by human beings in a predefined manner. The task of the analysis component in a speech-based interface is to convert the acoustic signal generated by a user utterance into semantic content. Usually, this task is broken down into a number of sequentially ordered subtasks such as the following (a minimal command-grammar sketch follows the list):
• Speech recognition: Convert the (digitized) speech signal to a string of words.
• Syntactic parsing: Convert the string of words to a syntactic structure or parse tree.
• Semantic interpretation: Convert the syntactic structure to a semantic representation (e.g. a logical formula or database query).
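To make the first subtask concrete, the sketch below builds a small command grammar with the Microsoft.Speech recognition engine that ships alongside the Kinect SDK. The command words and confidence threshold are illustrative assumptions, not the grammar used by this framework; in practice the engine's input would also be wired to the Kinect audio stream (e.g. via SetInputToAudioStream) before starting recognition.

using Microsoft.Speech.Recognition;

public static class CommandGrammar
{
    public static SpeechRecognitionEngine Build(RecognizerInfo recognizer)
    {
        var engine = new SpeechRecognitionEngine(recognizer.Id);

        // Constrain the vocabulary to a handful of modeling commands.
        var commands = new Choices("draw", "zoom", "move", "rotate", "stop");
        var builder = new GrammarBuilder(commands) { Culture = recognizer.Culture };
        engine.LoadGrammar(new Grammar(builder));

        // React only to reasonably confident matches of the constrained vocabulary.
        engine.SpeechRecognized += (sender, e) =>
        {
            if (e.Result.Confidence > 0.7)
                System.Console.WriteLine("Command: " + e.Result.Text);
        };
        return engine;
    }
}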



2 Case Studies

"As we explore new human interface devices and incorporate new interactions into our designs, we have the opportunity to create deep connections between users and their technology." (Jonathan Follett)

There are several different projects that were intended to demonstrate the ability to use human gestures to create, manipulate, and interact with 3D models in virtual space.

2.1 Sketchpad
Introduction: Sketchpad is considered the first true attempt to enable the user to interact with a computer system outside a command line interface, using instead the human's ability to draw lines and shapes with a pen-and-paper-like mechanism. Sketchpad (aka Robot Draftsman) was a revolutionary computer program written by Ivan Sutherland in 1963 in the course of his PhD thesis, for which he received the Turing Award in 1988 (Fig. 2.01). It helped change the way people interact with computers. Sketchpad is considered the ancestor of modern computer-aided drafting (CAD) programs as well as a major breakthrough in the development of computer graphics in general. Sketchpad was the first program ever to utilize a complete graphical user interface, using an x-y point plotter display and the recently invented light pen 1 (Fig. 2.02). (Sutherland, 1963)
Analysis: As mentioned before, Sketchpad is considered the ancestor of the GUI and of human-computer interaction beyond the command line interface. This project gave me reason to believe that a next step is ahead of us: human interaction in space, beyond the GUI, where NUI can replace the user's traditional interaction with computer systems and enable more intuitive interaction with a machine. Sketchpad also introduced the notion of shape recognition: since the user's drawing is not accurate, due to human nature, the computer system can compensate and draw the intended shapes by recognizing and analyzing the user's drawing paths.

Fig. 2.01 Sutherland using Sketchpad

Fig. 2.02 Drawing on the computer screen using a light pen

1 A light pen is a computer input device in the form of a light-sensitive wand used in conjunction with a computer's CRT display. It allows the user to point to displayed objects or draw on the screen in a similar way to a touchscreen but with greater positional accuracy. (Wikipedia.com, 2010)


Fig. 2.03 Shape recognition with Sketchpad

2.2 SixthSense: Integrated information with the real world
'SixthSense' is a wearable gestural interface that augments the physical world around us with digital information and lets us use natural hand gestures to interact with that information. (Mistry, 2009)
Introduction: The SixthSense prototype comprises a pocket projector, a mirror, and a camera. The hardware components are coupled in a pendant-like mobile wearable device. Both the projector and the camera are connected to the mobile computing device in the user's pocket. The projector projects visual information, enabling surfaces, walls, and physical objects around us to be used as interfaces, while the camera recognizes and tracks the user's hand gestures and physical objects using computer-vision based techniques. The software program processes the video stream data captured by the camera and tracks the locations of the colored markers (visual tracking fiducials) at the tips of the user's fingers using simple computer-vision techniques (Fig. 2.04). The movements and arrangements of these fiducials are interpreted as gestures that act as interaction instructions for the projected application interfaces. The maximum number of tracked fingers is only constrained by the number of unique fiducials, thus SixthSense also supports multi-touch and multi-user interaction. (Mistry, 2009)

Fig. 2.04 Pranav Mistry demonstrating SixthSense project

Fig. 2.05 Demonstrating how to draw on wall using the projector


Fig. 2.06 A phone dial pad displayed on the palm of the user's hand.


Fig. 2.07 The notion of a live interactive newspaper using SixthSense.

Analysis: This project embodies the true definition of NUI. It introduces natural human gestures used by the everyday person in a physical, practical fashion. People tend to look at a scene as if through a picture frame, using the index fingers and thumbs of both hands to create a rectangular frame and looking through it to imagine a picture taken from that direction and position. Also, displaying data on any surface frees the user to use his tools anywhere, not bound to an electronic gadget or a display screen. Although this project looks to the future of NUI as a human-based interactive interface, it still has many limitations. The gadget itself is somewhat uncomfortable: the user walks around with a camera and a projector hanging from his neck, has to wear markers on his fingers, and can only use four of the available ten.

2.3 3D Model Virtual Dressing Room Using Kinect
Introduction: A Polish company called "Arbuzz" has taken augmented reality and NUI to a different level. They have created a virtual changing room where the user can try on clothes from a virtual wardrobe and fit them on himself virtually (Fig. 2.08). The user can try out the clothes on the screen in front of him, and there is a virtual toolbar to pick and choose the clothes and accessories. The project has a sophisticated physics engine which enables the clothes to interact with the user wearing them, so if the user bends his torso, the dress also bends to take the body's shape.
Analysis: This project demonstrated the abilities of the Kinect and its use in augmented reality. It also showed that menus and toolbars can work perfectly within the NUI guidelines. Many human gestures were taken into consideration while developing this project, especially those of a person trying on clothes: how the person looks at himself in a mirror when trying on a new piece of clothing, how he turns, bends, and twists around to look at the product, and how the product responds to this interaction. This project gave me a sense of what the Kinect sensor and the Kinect SDK are capable of using both the color and depth streams, and of how the user interacts with menus and toolbars that can only be controlled while looking at himself and acting upon this view. The project's limitation is the less accurate depth stream, which does not capture the user's body precisely due to the low resolution of the depth camera.

Fig. 2.08 The user trying out a dress using the virtual wardrobe (arbuss.eu)

2.4 3D modeling in free-space using Kinect and Arduino 2 gloves
Introduction: This project is mainly similar to what I am proposing in my research. It was developed using Kinect to capture the user's skeleton and depth data to enable him to 3D model in space (Fig. 2.09). The project is a free-space 3D modeling tool that uses a Kinect camera to track the user's hands, which are used to create points in space and draft a model. To provide greater control while modeling in space, two Arduino-powered gloves detect finger touches that act as mouse button clicks. A combination of the thumb and any other finger on the same hand triggers a predefined command such as move, rotate, edit, create polygon, etc. (engadget.com, 2011)
Analysis: This particular project is the baseline for my research; it was developed to produce 3D models in space using the Kinect sensor to track the hands in 3D space. I wanted to improve on this notion and substitute voice commands for the Arduino gloves. This gives more flexibility during the modeling process, as the user does not need to move his fingers in particular combinations to trigger predefined functions. Voice commands also give the opportunity to add more predefined commands and let the user define the commands he needs. This project is limited in the 3D modeling process, as it can only create points in space, which the user then chooses and joins to create triangular or rectangular polygons.

2 Arduino is a popular open-source single-board microcontroller, a descendant of the open-source Wiring platform, designed to make the process of using electronics in multidisciplinary projects more accessible. (Wikipedia.com, 2012)


Fig. 2.09 The 3D Space modeler with Kinect and Arduino gloves (engadget.com, 2011)

2.5 Lockheed Martin's CHIL Project
Introduction: This project, developed by Lockheed Martin 3, is a one-of-a-kind 3D design and modeling project. It combines three main elements that complete the immersive experience for the users: a virtual cave environment, motion capture, and collaborative work that allows multiple users to work in the same session. The company developed this project for the Collaborative Human Immersive Laboratory, or CHIL, to work and experiment with designing, developing, and testing Lockheed Martin's products in a virtual reality testing environment (Fig. 2.10). The project allows engineering teams to work together in a collaborative environment with the technology to model and edit projects in a 3D virtual reality environment. (engadget.com, 2011) The project is constructed of a huge area developed into a virtual reality cave. It also contains 24 motion capture 4 cameras, theatrical lighting, and a cluster of huge flat screens covering the walls of the lab for a further immersive experience.
Analysis: This project gave me the idea of developing my research to include a second user. The CHIL project is very successful in applying the notion of collaborative development in a virtual reality environment. The problem with this kind of project is that it is developed for specific uses and is not available to everyone due to the high-tech equipment needed. Some of the good points of this project will be addressed in my research, such as how to overcome such a huge cost and make the notion of collaborative 3D modeling available to anyone.

3 Lockheed Martin is an American global aerospace, defense, security, and advanced technology company with worldwide interests.
4 Motion capture, motion tracking, or mocap are terms used to describe the process of recording the movement of one or more objects or persons.

Fig. 2.10 Lockheed Martin’s CHIL Lab in session (engadget.com, 2011)

2.6 MASTER-PIECE: A Multimodal (Gesture + Speech) Interface for 3D Model Search and Retrieval Integrated in a Virtual Assembly Application
Introduction: MASTER-PIECE integrates gesture and speech modalities into a design and assembly application so as to increase the immersion of the user and to provide a physical interface and easier tools for design than the mouse and the keyboard. Moreover, the user is capable of generating simple 3D objects and searching for similar 3D content in a database, which is nowadays another very challenging research topic. The project was developed so users can model and assemble various mechanical parts virtually and test them inside a virtual reality simulation environment. (Moustakas et al., 2005)
The virtual assembly application is a graphical 3D interface for performing assembly of mechanical objects from their spare parts (Fig. 2.11). It was initially developed to be used with haptic gloves, and it allows the user to:
• Assemble a mechanical object from its spare parts.
• Grasp and manipulate objects using haptic gloves.
The user is also capable of assembling parts of an object and recording the assembly process for post-processing. The assembly procedure can be done using one or two hands (i.e. one or two haptic VR gloves 5). A position tracker (MotionStar Wireless Tracker 6) with one or two position sensors installed is used to detect the position and orientation of the user's hands in space. (Moustakas et al., 2005)

Fig. 2.11 Virtual Assembly application (Moustakas et al., 2005)

Another part of this project is the ability to draw free-form in 3D space using hand motion tracking and voice commands (Fig. 2.12). In this part of the project, the user can draw lines in space using his normal hand motion, without the assistance of VR gloves, to draw basic shapes. The system tracks these lines and then runs a shape recognition algorithm to recognize the shape drawn by the user (Fig. 2.13).

Fig. 2.12 Hand and head motion detection (Moustakas et al., 2005)

5 A VR glove (sometimes called a "dataglove" or "cyberglove") is an input device for human-computer interaction worn like a glove. Various sensor technologies are used to capture physical data such as bending of fingers. Often a motion tracker, such as a magnetic tracking device or inertial tracking device, is attached to capture the global position/rotation data of the glove.
6 MotionStar is the most cost-effective means of tracking full-body motions in the world today.


Fig. 2.13 Shape recognition by determining tracked points on a polar angle scale

Analysis: MASTER-PIECE introduces a new way of user interaction during the design and assembly phase. Although this project is mainly dedicated to mechanical assembly operations, it is still an innovative direction in human-computer interaction and NUI, using virtual reality, virtual reality gloves, and voice commands. It is the first project that combines hand tracking in a virtual reality environment and voice commands in the same process. The project's main goal is to enable the user to interact with a predefined 3D library of mechanical parts; the user has the freedom to bring together the required mechanical assembly and test it. This is a main characteristic of the proposed research, as the user will have great freedom to model and deform an existing 3D model.

2.7 l'Artisan Électronique - Virtual pottery wheel
Introduction: This project is a virtual representation of a pottery wheel, where the user can use his hand to sculpt and shape a basic virtual clay mold into any form, using a rotation motion similar to that of an actual pottery wheel. The wheel is created virtually by a cylindrical mesh rotating about its Z axis, displayed on a translucent screen in front of the user. As the user moves his hand to deform and sculpt this mold of virtual clay, a 3D laser scanner scans the position and posture of the user's hand and applies the same deformation to the virtual pottery on the screen (Fig. 2.14). After the model is sculpted, it is processed and printed out using a custom-made 3D clay printer, which prints the sculpted model by laying down layer upon layer of clay. (unfold.be, 2011)


Fig. 2.14 The virtual pottery wheel, which consists of a translucent screen and a laser scanner


Fig. 2.15 The virtual pottery scanner and 3D clay printer

Analysis: This project closely resembles one of the demos applied in this research. In this research, however, the Kinect sensor takes over the scanning part, locating the position of the hand in space. The pottery wheel is an existing 3D model inside Rhino, and the user has the opportunity to sculpt it using his hands inside Rhino.

3 Contribution

“The art of research is the art of making difficult problems soluble by devising means of getting at them.” (Jonathan Follett)

3.1 Research introduction
This research investigates whether 3D modeling can be successful with a gadget-less input process. In this research I develop an add-on to an existing 3D modeling application, Rhino. This add-on is a framework that enables users to use their natural gestures and voice as input to Rhino. This gives the user the freedom he had in old-fashioned modeling, using his hand postures, movements, and gestures: the natural gestures used when hand-sculpting putty, molding clay, or making pottery on a pottery wheel, or when sculpting a statue out of a solid rock base using hammers and chisels.
I chose to develop the add-on in Rhino for various reasons. Rhino is a NURBS-based 1 3D modeling application. A NURBS system gives huge freedom in creating and developing curves; it gives great smoothness to curves without using a lot of control points, thus avoiding extensive calculations and large memory consumption. Rhino is also a very popular 3D modeling application among college students and professional design firms, so developing this research as an add-on to Rhino will allow it to be tested, modified, and developed by a wide range of users.
As mentioned before, the main input to Rhino will be the user's natural gestures and voice commands. Both gestures and voice commands will be captured by a Microsoft Kinect for Xbox 360. This sensor captures and tracks the user's skeleton by analyzing his position in space. Using this skeleton and representing it in Rhino is done by embedding Kinect SDK commands inside Rhino, using Grasshopper 3D 2 for coding, as shown in Fig. 3.01. All the code is written in C# and is compatible with the Microsoft .NET Framework 4.0 3.

1 Non-uniform rational basis spline (NURBS) is a mathematical model commonly used in computer graphics for generating and representing curves and surfaces, offering great flexibility and precision for handling both analytic shapes (surfaces defined by common mathematical formulae) and modeled shapes. (Wikipedia.com, 2010)
2 Grasshopper™ is a visual programming language developed by David Rutten at Robert McNeel & Associates. (McNeel, 2009)
3 The .NET Framework (pronounced "dot net") is a software framework developed by Microsoft that runs primarily on Microsoft Windows.


Fig. 3.01 The input process from the user until reaching the modeling stage in Rhino: user's gestures + voice commands → Kinect SDK → coding in Grasshopper → modeling in Rhino

This research demonstrates the ability to use natural human gestures to model in an existing 3D modeling application. The aim is not merely to substitute keyboard and mouse with natural gestures and voice commands as input methods, but to involve more of the user's natural gestures in the 3D modeling itself. To demonstrate this I have prepared two examples focusing on the involvement of natural human gestures in the 3D modeling process:

• Virtual pottery wheel in Rhino
• Free-form 3D revolve modeling

Before presenting these two examples I will describe the basic foundation of the NUI add-on developed for Rhino in Grasshopper. The add-on rests on tracking the user in 3D space via Kinect; the user is represented as points and joints in Rhino's 3D space.

3.2 Skeleton tracking
3.2.1 Skeleton tracking basics
Skeleton tracking is the tracking of 20 defined points on the user's body, which the Kinect captures and the Kinect SDK interprets as three-dimensional positions in space. The SDK tracks a user's skeleton by locking on to a player within the depth stream. As mentioned before, the depth stream comes from the IR camera reading the reflection of the IR projector off objects in space (Fig. 3.02). Whenever a body-like figure moves in front of the depth stream within the readable range, the SDK recognizes the object as a player, or user. A body-like figure is defined in the SDK as a head above a torso with two arms extending from each side. Once these elements, the upper half of a user, are recognized, the SDK locks the object in the depth stream as a player. Once the player is locked, the SDK identifies the 20 points and joints of the player's skeleton and defines them as points with X, Y, and Z coordinates (Fig. 3.03). (Webb et al., 2012)
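To make the tracking loop concrete, the following is a minimal C# sketch of how the Kinect SDK hands the 20 joints to an application, following the same SDK calls used by the full component in the appendix; the class name, handler name, and console output are illustrative only.

using System;
using Microsoft.Kinect;

// Minimal sketch of reading joint positions from a skeleton frame. The handler is
// attached with: sensor.SkeletonFrameReady += SkeletonBasics.OnSkeletonFrameReady;
static class SkeletonBasics
{
    public static void OnSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame == null) return;                      // no new skeleton data this frame
            Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
            frame.CopySkeletonDataTo(skeletons);

            foreach (Skeleton s in skeletons)
            {
                if (s.TrackingState != SkeletonTrackingState.Tracked) continue;
                // Each of the 20 joints carries an X, Y, Z position (in meters).
                SkeletonPoint hand = s.Joints[JointType.HandRight].Position;
                Console.WriteLine("Right hand at {0}, {1}, {2}", hand.X, hand.Y, hand.Z);
            }
        }
    }
}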


Fig. 3.02 The depth stream from the Kinect before a user has been tracked. In the monochrome depth stream, light grey indicates objects near the camera and the grey darkens as objects recede; beige and purple indicate undefined depths


Fig. 3.03 The depth stream after a player has been tracked and the SDK has determined the locations of the skeleton joints. The player is shown in a different color from the monochrome depth stream

3.2.2 Kinect skeleton tracking and Grasshopper
Grasshopper, as described before, is a visual coding environment inside Rhino that helps modelers model parametrically through predefined or custom components (Fig. 3.04). Grasshopper also offers the opportunity to develop components through the Grasshopper SDK and the Rhino 4.0 SDK, coded in Visual C#.NET. In order to code a custom component and make it usable by other users, the component must follow a certain structure. A Grasshopper component is essentially a C# class that references the Grasshopper and Grasshopper IO libraries. Besides the class constructor, a C# component must contain four main functions (a minimal sketch follows the list):

• Register Input Parameters function: registers and allocates memory for the input parameters used by the component.
• Register Output Parameters function: registers and allocates memory for the output parameters the component produces.
• Solver Instance function: the main function where the programming of the component occurs.
• A GUID function: to ensure the uniqueness of the component.
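As a reference for how these four members fit together, here is a minimal, hypothetical component sketch that follows the same structure as the skeleton component listed in the appendix; the names, parameters, and GUID are placeholders.

using System;
using Grasshopper.Kernel;

public class ExampleComponent : GH_Component
{
    // Constructor: name, nickname, description, category, subcategory.
    public ExampleComponent()
        : base("Example", "Ex", "Minimal component sketch", "Firefly", "Vision") { }

    // 1. Register and allocate the input parameters.
    protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager)
    {
        pManager.Register_DoubleParam("Value", "V", "An input value", 0.0);
    }

    // 2. Register and allocate the output parameters.
    protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager)
    {
        pManager.Register_DoubleParam("Result", "R", "The computed result");
    }

    // 3. Solver instance: the main function where the component's work happens.
    protected override void SolveInstance(IGH_DataAccess DA)
    {
        double v = 0.0;
        if (!DA.GetData<double>(0, ref v)) return;
        DA.SetData(0, v * 2.0);
    }

    // 4. GUID that guarantees the uniqueness of the component (placeholder value).
    public override Guid ComponentGuid
    {
        get { return new Guid("A1B2C3D4-0000-0000-0000-000000000001"); }
    }
}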


Fig. 3.04 A visual description of the skeleton component as it appears in Grasshopper, with its inputs and outputs and their descriptions

Pseudo code for skeleton recognition in Grasshopper
*Note: the pseudo code is a plain-English representation of the code, written in the same way the classes and functions are defined. For the full C# code please refer to the appendix.

Class for detecting joints in space
Class to create a skeleton from the detected joints
Main component class contains:
• Function for the input parameters of the Grasshopper component.
• Function for the output parameters of the Grasshopper component.
• Function to set up the Kinect sensor.
• Function that captures the depth stream from the Kinect and creates a skeleton IF a user is detected.
• Function for the solver instance, which LOOPS through all available skeletons and, FOREACH skeleton, grabs the list of Point3d positions, the joint IDs, and the bones; it then applies any changes induced by the input parameters, such as scale and height, and sends the results out through the output parameters.


Fig. 3.05 A skeleton component used with input and output

Fig. 3.06 The skeleton representation by points and bones in Rhino


3.3 Speech recognition in Grasshopper
Speech recognition is one of the unused gems of the Kinect sensor: most developers are so taken with the sensor's depth stream and skeleton tracking that they overlook what voice commands and recognition can offer. Developing the speech recognition component for Grasshopper was challenging. Grasshopper is coded to run on a single processor⁴ of whatever computer it is working on, even if that computer has a multi-core⁵ system (Rutten, 2010), which means a single thread⁶ is responsible for all the computation happening inside Grasshopper. The problem this creates for speech recognition is that the audio channel must stay open continuously to receive the voice commands the user gives, process them, and reflect the result on the Grasshopper component's output, while the depth camera and IR projector are also working continuously to track the user's position in space. Running all of this on a single processor is so demanding that the whole Rhino application hangs and crashes from processor overload. To overcome this problem I came up with two solutions:

Calling the speech recognition component on demand
This was the first solution I devised to avoid keeping the audio stream open, waiting for a command at any time. Instead, the component is called on demand: whenever the user needs speech recognition to be active, it runs for a couple of seconds and then sleeps until called upon again. To call the speech recognition component, the user performs a gesture that is completely different from any modeling gesture, so it is unique to invoking speech recognition (Fig. 3.07). When the gesture is performed, the component is activated by a Boolean switch and the audio stream opens, waiting for the user to speak the command. After the user speaks the command and the component recognizes it, the audio stream closes. Opening the audio stream and waiting for the command hangs the application until the stream closes again, which takes anywhere from 2.5 seconds up to 54 seconds depending on how quickly the command is recognized. This was unacceptable: the user does not know whether the command was accepted, and hanging an application destroys the fluid performance expected of any computer software.

4 A processor is an electronic circuit which executes computer programs, containing a processing unit and a control unit.
5 A multi-core processor is a single computing component with two or more independent actual processors (called "cores").
6 A thread of execution is the smallest unit of processing that can be scheduled by an operating system.


Fig. 3.07 The unique gesture the user has to perform to activate the speech recognition component

Using multithreading in the audio stream
A better idea is to dedicate a separate thread solely to the audio stream: as soon as the component is initialized, that thread's only job is to listen and send any recognized command to Grasshopper. To achieve this I converted the speech recognition component to work as a multithreaded⁷ process. This solution is applicable to any computer configured well enough to run Rhino, since such a machine is almost certainly a multi-core system. Because the component runs on a different thread, it does not interfere with Grasshopper's own thread. The approach, then, is to run just this one feature on a separate thread and, whenever that thread recognizes a command, store it in a global variable accessible to any class instance or function that reads it. The LISTEN function runs continuously on its own thread and, whenever it recognizes a command, stores it in the LASTCOMMAND global variable, which the Grasshopper SPEECH component then reads (Fig. 3.08 and Fig. 3.09).
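The following short C# sketch illustrates the idea; the class and field names are illustrative rather than the exact names used in the component, and the full code is in the appendix.

using System;
using System.Threading;

// Sketch of the multithreading approach: the listener loop runs on its own background
// thread and only writes the last recognized command into a shared field, which the
// Grasshopper solver polls on its single thread.
static class SpeechListener
{
    // Shared between the listener thread and the Grasshopper solver thread.
    public static volatile string LastCommand = null;

    public static void Start(Action listenLoop)
    {
        Thread listener = new Thread(() => listenLoop());
        listener.IsBackground = true;   // do not keep Rhino alive on shutdown
        listener.Start();
    }
}

// Inside the solver instance the command is consumed once and then cleared:
//   string cmd = SpeechListener.LastCommand ?? "none";
//   SpeechListener.LastCommand = null;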

Fig. 3.08 Grasshopper speech recognition component working with multithreaded processing

7 Multithreading computers have hardware support to efficiently execute multiple threads.


Fig. 3.09 Grasshopper speech recognition component

Pseudo code for speech recognition in Grasshopper
Class CommandRecognizer is a helper class that opens the audio stream and recognizes the commands heard by cross-referencing them with a user-defined word library:
• Function dedicated to opening the audio stream and returning a recognized command.
• Function to obtain the speech spoken to the sensor and find the closest match in the user-defined word library.

Main component class contains:
• Function for the input parameters of the Grasshopper component, which receives the user's custom command library as a String List.
• Function for the output parameters of the Grasshopper component, which outputs a single recognized command as a String.
• Function for the solver instance, which first builds the command recognition grammar from the user's input String List and then creates a new thread to run the Listen function from the CommandRecognizer class. The output comes from the LastCommand global variable, which holds any command recognized by the helper class.


3.4 Virtual Pottery Wheel Demo
3.4.1 Demo introduction
This example was developed to investigate the capabilities of using natural human gestures and speech commands to model pottery, simulating the traditional physical process of shaping clay on a continuously turning wheel. The user uses the movement and position of both hands to form, shape, and sculpt the virtual pottery mold. These movements and gestures are taken from traditional pottery⁸ making: how to elongate a cylindrical mold, how to hollow a mold, how to flatten a mold, and so on.
3.4.2 Hand gestures for pottery making
The following illustrations demonstrate the natural gestures used in physical pottery making and how these gestures are translated for digital use in this example.

Fig. 3.10 The physical press-in gesture, called "centering", where the user presses in with his hands to form the pottery mold into a cylindrical, hemispherical form. (Rhodes, 1976)

8 Pottery is the material from which the pottery ware is made, of which major types include earthenware, stoneware and porcelain. (Wikipedia.com, 2012)


Fig. 3.11 The physical press-in gesture with one hand pressing with its open palm and the other pressing with the side of the hand, used to create a slimmer top and a rounded bottom part. This gesture is called "overleaf". (Rhodes, 1976)

Fig. 3.12 The physical hollowing gesture, where the user uses the palms of his hands to press from the top and from the inside outward to hollow the mold. (Rhodes, 1976)


Fig. 3.13 The physical gesture for elongating and slimming a mold by pressing in with the hands in a lifting motion. This gesture is called "collaring-in". (Rhodes, 1976)

The user will use these gestures to form and shape the pottery mold into any form he wants, while still using the natural gestures of true physical modeling.

3.4.3 Developing the example in Grasshopper
This example depends on the skeleton tracking and speech recognition components that I developed for Grasshopper. The skeleton tracking component is the main component the user works with in the modeling and forming process; the speech recognition component handles selections, viewport operations, and starting and resetting the example. The example relies mainly on analyzing the movement of the user's hands and their position in space. I faced several problems in developing a virtual pottery wheel in Rhino; the following is an example of these obstacles:

Overcoming DAG
Grasshopper is coded to work as a directed acyclic graph⁹ (DAG) structure (Fig. 3.14). This means that if I start with a certain geometry and want a manipulation made during modeling to reflect back on the original geometry, Grasshopper reports an error: I am attempting to create a cycle of data, which is forbidden.

9 In mathematics and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles.


This restriction prevents the system from falling into an infinite loop¹⁰ and crashing. So, to manipulate the geometry while respecting the DAG, I would have to create a new geometry after each manipulation, which carries a very high computational cost. My design proposal for overcoming the DAG is to create a single custom C# component in Grasshopper responsible for creating geometry from whatever information is given to it. That information is stored in a single text file readable by all the components involved in manipulating the geometry; the components responsible for selection, moving, rotating, scaling, skeleton tracking, and speech commands all take part in manipulating the original geometry (Fig. 3.15).
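A minimal sketch of this shared-file mechanism is shown below; the file path and the one-point-per-line "x,y,z" format are assumptions for illustration, not the exact format used in the demo.

using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Rhino.Geometry;

// Every component that manipulates the geometry writes the current control points
// to a common text file; the single geometry-making component reads them back.
static class VertexStore
{
    const string Path = @"C:\temp\pottery_vertices.txt";   // illustrative location

    public static void Write(IEnumerable<Point3d> pts)
    {
        using (var w = new StreamWriter(Path, false))
            foreach (Point3d p in pts)
                w.WriteLine("{0},{1},{2}", p.X, p.Y, p.Z);
    }

    public static List<Point3d> Read()
    {
        var pts = new List<Point3d>();
        foreach (string line in File.ReadAllLines(Path))
        {
            string[] c = line.Split(',');
            pts.Add(new Point3d(
                double.Parse(c[0], CultureInfo.InvariantCulture),
                double.Parse(c[1], CultureInfo.InvariantCulture),
                double.Parse(c[2], CultureInfo.InvariantCulture)));
        }
        return pts;
    }
}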

Fig. 3.14 A representation of a directed acyclic graph (DAG)

10 An infinite loop (also known as an endless loop or unproductive loop) is a sequence of instructions in a computer program which loops endlessly, either because the loop has no terminating condition, has one that can never be met, or has one that causes the loop to start over.


Fig. 3.15 A diagram describing how the DAG enforced by Grasshopper is overcome in this example

3.4.4 Pseudo code and how to use the example
Step 1: Initialization
• The initial state occurs at the start of the example's Grasshopper solution, or IF the user speaks the command "RESET SCENE".
• Place the initial geometry in the scene.
• Decompose the geometry into vertices and delete the initial geometry.
• Write the initial vertices to the common text file (to overcome the DAG).
• Make the vertices the control points of the geometry.
• Create an array and store the vertices in it.
• Create the working initial geometry from the control-point vertex array.


Step 2: Modeling mode (a sketch of the hand-driven deformation follows this list):
• Once the user stands in front of the Kinect sensor, the sensor detects him, a skeleton representation of the user appears in the scene, and by default the example is in "MODELING MODE".
• The scene camera is placed at the user's head position, so the view is as if the user were standing in front of the wheel.
• The user starts modeling by speaking the command "START MODELING".
• The user moves his hands toward the model to form it.
• The changes in the positions of the control-point vertices are recorded in the common text file.
• The main geometry-making component detects the change in vertex positions.
• It inserts the new vertices at the index positions of the old, unchanged ones.
• The geometry is redrawn with the new vertices.
• The user can stop modeling by speaking the command "STOP".
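As a rough illustration of the modeling step, the sketch below pushes any control point within reach of the tracked hand toward the wheel's central axis, approximating the press-in gesture; the influence radius and push distance are illustrative values, not those used in the demo.

using Rhino.Geometry;

// Hypothetical press-in deformation: control points near the tracked hand are
// pushed toward the wheel's central (Z) axis.
static class PressInGesture
{
    public static Point3d[] Apply(Point3d[] controlPoints, Point3d hand,
                                  double radius, double push)
    {
        Point3d[] result = (Point3d[])controlPoints.Clone();
        for (int i = 0; i < result.Length; i++)
        {
            if (result[i].DistanceTo(hand) > radius) continue;   // outside the hand's influence
            // Push the point horizontally toward the central axis of the wheel.
            Vector3d toAxis = new Vector3d(-result[i].X, -result[i].Y, 0.0);
            if (toAxis.Length > 1e-9)
            {
                toAxis.Unitize();
                result[i] += toAxis * push;
            }
        }
        return result;
    }
}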

Step 3: Edit mode:
• The user can change the form after modeling through the control points.
• To enter edit mode the user speaks the command "EDIT MODE".
• The profile of the revolved form previously made by the user appears, represented as a NURBS curve the user can change via its control points; the form is altered directly.
• Whenever the position of a control-point vertex changes, the component writes and appends the change to the common text file.

Step 4: View mode:
• The user can change his view of the scene by speaking the command "VIEW MODE".
• In view mode the user can zoom by speaking the command "ZOOM": to zoom in, he brings his hands toward each other; to zoom out, he moves them apart. Also in view mode the user can pan by speaking the command "PAN": the user moves his hand in any direction and the view moves in the direction of the hand. A sketch of the hand-distance zoom follows this list.
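A sketch of how the hand distance could drive the zoom is shown below; the mapping from hand distance to magnification factor, and the use of the active viewport's Magnify call, are assumptions for illustration rather than the exact implementation used in the demo.

using Rhino;
using Rhino.Geometry;

// Hypothetical hand-distance zoom: bringing the hands together zooms in,
// moving them apart zooms out, matching the behaviour described above.
static class HandZoom
{
    static double lastDistance = -1.0;

    public static void Update(RhinoDoc doc, Point3d leftHand, Point3d rightHand)
    {
        double d = leftHand.DistanceTo(rightHand);
        if (lastDistance > 0.0 && d > 0.0)
        {
            // Hands closer than last frame -> factor > 1 (zoom in); farther -> zoom out.
            double factor = lastDistance / d;
            doc.Views.ActiveView.ActiveViewport.Magnify(factor, false);
            doc.Views.ActiveView.Redraw();
        }
        lastDistance = d;
    }
}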


3.4.5 Scenes and screenshots from the example

Fig. 3.16 The virtual pottery wheel while modeling a vase

Fig. 3.17 Close-up of modeling the pottery


Fig. 3.18 The user editing the vase profile in edit mode

Fig. 3.19 Another model produced from the virtual pottery wheel


3.5 Free-form 3D revolve modeling
3.5.1 Demo introduction
In this example the user draws a curved profile by waving and stroking his hand in space. This lets the user draw a free form: the example creates a profile from the points captured from the user's motion. The drawing action and the revolved profile are controlled by speech commands the user gives to the Kinect.
3.5.2 Pseudo code and how to use the example
Step 1: Initialization (a sketch of the revolve step follows this list):
• The user's initial state is standing in an empty Rhino scene.
• The user speaks the command "START PROFILE" to start drawing in the air with his right hand.
• The captured points are written to a TEXT file, as described in the previous example, to enable editing after the revolved model is created.
• The user speaks the command "END PROFILE" to create a revolved model from the drawn profile.
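A minimal sketch of how the captured hand points could be turned into a revolved surface with RhinoCommon is shown below; the curve degree and the use of the world Z axis as the axis of revolution are assumptions for illustration.

using System.Collections.Generic;
using Rhino.Geometry;

// Hypothetical "END PROFILE" step: interpolate the captured hand points into a
// NURBS profile curve and revolve it around the vertical axis.
static class ProfileRevolver
{
    public static Brep Revolve(List<Point3d> handPoints)
    {
        // Interpolate the captured points into a smooth degree-3 profile curve.
        Curve profile = Curve.CreateInterpolatedCurve(handPoints, 3);
        if (profile == null) return null;

        // Revolve the profile a full turn around the world Z axis.
        Line axis = new Line(Point3d.Origin, Point3d.Origin + Vector3d.ZAxis);
        RevSurface rev = RevSurface.Create(profile, axis);
        return rev == null ? null : rev.ToBrep();
    }
}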

Step 2: Edit mode:
• Edit mode is active after the revolved model is created.
• The user can approach the profile and choose the vertex he wants to move to alter the revolved model's form.
• The user uses his right hand to snap to a point; when a red shape appears on a vertex, the vertex is snapped.
• The user speaks the command "SELECT" to select the snapped vertex.
• The user moves his right hand in any direction and speaks the command "PLACE" to move the selected point to the new location, altering the profile and the form.

Step 3: View mode:
• The user can change his view of the scene by speaking the command "VIEW MODE".
• In view mode the user can zoom by speaking the command "ZOOM": to zoom in, he brings his hands toward each other; to zoom out, he moves them apart. Also in view mode the user can pan by speaking the command "PAN": the user moves his hand in any direction and the view moves in the direction of the hand.


3.5.3 Scenes and screenshots from the example

Fig. 3.20 The free-form revolved form created after the user defined a profile


Chapter 4: Conclusion

"Begin thus from the first act, and proceed; and, in conclusion, at the ill which thou hast done, be troubled, and rejoice for the good." (Pythagoras)

This research investigated the possibility of letting a digital 3D modeler or 3D artist model with his hands, physically, in a digital world, and it can serve as a basic foundation for physical 3D modeling interaction with a computer system. Although the technology available today is still primitive for accomplishing this fully, technology is advancing rapidly, and this basic idea will soon be a natural way to model digitally in 3D. For future enhancement, the Kinect sensor should have a more precise IR projector and a higher-resolution camera. This would allow more body parts to be detected easily and give the user more flexibility; in particular, it would allow a hand to be detected as a palm and five fingers instead of the single point in space used now. The Kinect SDK 1.5, due for release in May 2012, is expected to offer better detection in the dead zone in front of the sensor and a mode that tracks only the upper half of the body instead of the whole body, so the user could model more comfortably while sitting down, closer to the screen (Microsoft, 2012). A future perspective for NUI is that coming technologies will produce a more compact version of the Kinect sensor embedded directly in laptop computers and workstations, so NUI may soon replace the traditional mouse and keyboard as input peripherals. That would allow this research to be extended to CAD/CAM applications, BIM applications, artistic drawing applications, photography editing applications, and more. A user can now be represented via Kinect as a full skeleton with true measurements. This can introduce a notion of architectural modeling with the sense of actual, physical building, but carried out virtually. Different users with different expertise could each help build the whole building virtually first, and each could run different tests, such as steel stress tests, virtually; other tests, such as wind and earthquake simulation, could be run at a scale of one to one, since the user is now represented one to one in the virtual world. Although this research may seem to require a lot of learning and training to acquire the skills needed to model with NUI, one of its main goals is to develop a system sophisticated enough, yet easy enough, for anyone, professional or not, to use with ease and without prior knowledge or training.


An important feature for near-future research is the ability for more than one user to participate in the same session. This would enable more collaborative modeling, offering a new way of sharing modeling responsibilities and potentially a much faster modeling process that yields unexpected forms and models. I predict that human-computer interaction and digital architectural tools will reach a new level of involving natural human interaction and gestures, breaking down the barrier between humans and intelligent machines.


Chapter 5: References

o 3Dconnexion.com, (2011), 3D Mouse, http://www.3dconnexion.com/products/what-is-a3d-mouse.html
o Arbuzz.eu, (2011), 3D Model Virtual Dressing Room Using Kinect, http://kinect.dashhacks.com/kinect-hacks/2011/07/11/3d-model-virtual-dressing-roomusing-kinect
o AccuConference.com, (2010), Ultimate Guide to Speech Recognition, http://www.accuconference.com/resources/speech-recognition.aspx
o Ballmer, S., (2010), Consumer Electronics Show CES 2010, Microsoft.
o Cnet.com, (2010), Kinect selling figures for 2010, http://news.cnet.com/8301-10797_310253306-235.html
o Engadget.com, (2011), 3D Modeling in Free-space Using Kinect and Arduino Gloves, http://www.engadget.com/2011/03/15/kinect-homemade-power-gloves-3d-modeling-infree-space-vide/
o Engadget.com, (2011), Lockheed Martin's CHIL Project, http://www.engadget.com/2011/01/28/lockheed-martins-chil-blends-motion-capturewith-vr-creates-zo/
o Hinman, R., (2011), What are the basic principles of NUI design?, Quora.com
o Hp.com, (2009), HP Compaq Tablet PC, http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00193290&lang=en&cc=us&taskId=101&prodSeriesId=307008&prodTypeId=321957
o Kensington.com, (2010), Trackball Mouse, http://www.kensington.com/kensington/us/us/s/1444/trackballs.aspx
o Lu, G., (2011), "Real-time Immersive human-computer interaction based on tracking and recognition of dynamic hand gestures", Doctoral thesis, University of Central Lancashire
o McNeel, R., (2012), Rhinoceros 3D, http://www.rhino3d.com
o Microsoft.com, (2010), Kinect for Xbox 360, http://kinectforwindows.com
o Microsoft.com, (2011), Kinect SDK v1.0, http://kinectforwindows.com
o Microsoft.com, (2011), Microsoft Surface, http://www.microsoft.com/surface/en/us/default.aspx
o Microsoft.com, (2011), About Speech Recognition Grammars, http://msdn.microsoft.com/en-us/library/hh361669.aspx
o Mistry, P., Maes, P., (2009), SixthSense: Integrated Information with the Real World, MIT Media Lab, http://www.pranavmistry.com/projects/sixthsense/
o Rhodes, D., (1976), Pottery Form, Dover Publications, Chapter 3, "Centering", 10-14, Chapter 4, "Hollowing", 15-17
o Rogers, Y., Sharp, H., Preece, J., (2011), Interaction Design: Beyond Human-Computer Interaction, John Wiley and Sons, Chapter 6.3, "Natural User Interfaces", 215-217
o Saffer, D., (2011), What are the basic principles of NUI design?, Quora.com
o Wigdor, D., Dennis, D., (2011), Brave NUI World, Morgan Kaufmann Publishers, Chapter 1, "Introduction", 3-8, Chapter 2, "The Natural User Interface", 9-14, Chapter 4, "Less is more", 23-26
o Srinivasan, A., Premnath, C., Ravikumar, J., (2007), Advances in Computer Vision and Information Technology, I.K. International, Part 10, p. 1465, A Novel Approach to Deviceless Computer Access for Disabled Users
o Sutherland, I., (1963), Sketchpad: A Man-machine Graphical Communication System, PhD diss., University of Cambridge
o Mostakas, K., Tzovaras, D., Carbini, S., Bernier, O., Viallet, J. E., Raidt, S., Mancas, M., Dimiccoli, M., Yagci, E., Balci, S., Leon, E. I., (2005), MASTER-PIECE: A Multimodal (Gesture + Speech) Interface for 3D Model Search and Retrieval Integrated in a Virtual Assembly Application, Enterface '05, Mons, Belgium
o Wacom.com, (2012), Bamboo Tablet and Stylus, http://www.wacom.com/en/Products/Bamboo/BambooTablets.aspx
o Webb, J., Ashley, J., (2012), Beginning Kinect Programming with Microsoft Kinect SDK, Apress, Chapter 1, "Getting Started", 7-22, Chapter 2, "Application Fundamentals", 23-42, Chapter 3, "Depth Image Processing", 49-72, Chapter 4, "Skeleton Tracking", 85-120, Chapter 6, "Gestures", 167-220, Chapter 7, "Speech Recognition", 223-245
o Wikipedia.com, (2012), Directed Acyclic Graph, http://en.wikipedia.org/wiki/Directed_Acyclic_Graph
o Wikipedia.com, (2012), Punched card, http://en.wikipedia.org/wiki/Punched_card
o Wikipedia.com, (2012), Computer Mouse, http://en.wikipedia.org/wiki/Computer_mouse
o Wikipedia.com, (2012), Head Mounted Display, http://en.wikipedia.org/wiki/Headmounted_display
o Wikipedia.com, (2012), Data Glove, http://en.wikipedia.org/wiki/Wired_glove
o Wikipedia.com, (2012), Perceptive Pixel, Natural User Interface, en.wikipedia.org/wiki/Natural_User_Interface.htm
o Wikipedia.com, (2010), Light Pen, http://en.wikipedia.org/wiki/Light_Pen
o Unfold.be, (2011), l’Artisan Électronique - Virtual pottery wheel, http://unfold.be/pages/projects/items/l%E2%80%99artisan-electroniqu



Chapter 6: Appendix

6.1 Skeleton tracking component – Grasshopper – C#

public class Rjoint {
    // Constructor of recognized joint class which represents a point in space
    public Rjoint(JointType _id, Point3d _p)
    {
        pos = _p;
        id = _id;
        if (jnames == null) // Initialize with an empty string list to store the ID names for each joint
        {
            jnames = new string[JCOUNT];
            for (int i = 0; i < JCOUNT; ++i) // Loop through all the 20 joints representing the skeleton
            {
                jnames[i] = ((JointType)i).ToString(); // Add the joint ID string to the joint names array
            }
        }
    }

public const int JCOUNT = 20; // Constant of 20 joints per skeleton public Point3d pos = Point3d.Origin; // Initialize the point3d position field by a point in space at the origin (0,0,0) public JointType id = JointType.AnkleLeft; // Initialize the ID JointType field by any ID joint number public static string[] jnames; } public class Rskeleton { // Constructor of the recognized skeleton in space public Rskeleton(int _UserID) { UserID = _UserID; for (int i = 0; i < Rjoint.JCOUNT; ++i) // loop through all 20 points of the skeleton recognoized { J[i] = new Rjoint(((JointType)i), Point3d.Origin); // Initialize each joint by it's ID and position initialized at the origin } } public int UserID = -1; // Initialize the UserID by -1 --- No user found public Rjoint[] J = new Rjoint[Rjoint.JCOUNT]; // Initilize a Rjoint array by the number of joints in the skeleton public List<Point3d> GetPositions() // Function to obtain the positions in space { //Initialize a list of Point3d positions of joints in space then loop through each joint to obtain the position List<Point3d> pl = new List<Point3d>(J.Length); foreach (Rjoint rj in J) { pl.Add(rj.pos); } return pl;


}

}

//Function to return a list of lines representing the bones between joints public List<Line> GetBones() { List<Line> ln = new List<Line>(); //Define Left Leg ln.Add(new Line(J[(int)JointType.FootLeft].pos, J[(int)JointType.AnkleLeft].pos)); ln.Add(new Line(J[(int)JointType.AnkleLeft].pos, J[(int)JointType.KneeLeft].pos)); ln.Add(new Line(J[(int)JointType.KneeLeft].pos, J[(int)JointType.HipLeft].pos)); ln.Add(new Line(J[(int)JointType.HipLeft].pos, J[(int)JointType.HipCenter].pos)); //Define Right Leg ln.Add(new Line(J[(int)JointType.FootRight].pos, J[(int)JointType.AnkleRight].pos)); ln.Add(new Line(J[(int)JointType.AnkleRight].pos, J[(int)JointType.KneeRight].pos)); ln.Add(new Line(J[(int)JointType.KneeRight].pos, J[(int)JointType.HipRight].pos)); ln.Add(new Line(J[(int)JointType.HipRight].pos, J[(int)JointType.HipCenter].pos)); //Define Torso/Head ln.Add(new Line(J[(int)JointType.HipCenter].pos, J[(int)JointType.Spine].pos)); ln.Add(new Line(J[(int)JointType.Spine].pos, J[(int)JointType.ShoulderCenter].pos)); ln.Add(new Line(J[(int)JointType.ShoulderCenter].pos, J[(int)JointType.Head].pos)); //Define Left Arm ln.Add(new Line(J[(int)JointType.ShoulderCenter].pos, J[(int)JointType.ShoulderLeft].pos)); ln.Add(new Line(J[(int)JointType.ShoulderLeft].pos, J[(int)JointType.ElbowLeft].pos)); ln.Add(new Line(J[(int)JointType.ElbowLeft].pos, J[(int)JointType.WristLeft].pos)); ln.Add(new Line(J[(int)JointType.WristLeft].pos, J[(int)JointType.HandLeft].pos)); //Define Right Arm ln.Add(new Line(J[(int)JointType.ShoulderCenter].pos, J[(int)JointType.ShoulderRight].pos)); ln.Add(new Line(J[(int)JointType.ShoulderRight].pos, J[(int)JointType.ElbowRight].pos)); ln.Add(new Line(J[(int)JointType.ElbowRight].pos, J[(int)JointType.WristRight].pos)); ln.Add(new Line(J[(int)JointType.WristRight].pos, J[(int)JointType.HandRight].pos)); return ln; }

public class FF_Kinect_Skeleton : GH_Component // Main Grasshopper component class { public FF_Kinect_Skeleton() // Class constructor : base("Kinect Skeletal Tracker", "Skeleton", "Kinect Skeletal Viewer will allow tracking of up to two skeletons at a time.", "Firefly", "Vision") { } protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager) // Registering input variables for the component { pManager.Register_BooleanParam("Start Skeleton Tracker", "S", "Start Skeleton Tracker", true); pManager.Register_DoubleParam("Skeleton Scale Factor", "Sf", "Scale Factor for Skeleton", 1.0); pManager.Register_DoubleParam("Translate in Z-axis", "Z", "Translate the Skeleton Up or Down in the Z-axis", 0.0); pManager.Register_IntegerParam("Timer interval (in milliseconds) for updating events", "T", "Timer interval (in milliseconds) for updating events", 5); } protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager) // Registering output variables for the component { pManager.Register_PointParam("Points", "P", "Points"); pManager.Register_StringParam("Joint ID", "ID", "Joint ID"); pManager.Register_LineParam("Bones", "B", "Bones"); pManager.Register_StringParam("Out String", "out", "Print Out"); } #region "Methods"


void nuiFrameReady(object sender, SkeletonFrameReadyEventArgs e) // Function for retrieving the captured data frame by frame from the Kinect sensor { bool receivedData = false; using (SkeletonFrame k_skeletonFrame = e.OpenSkeletonFrame()) { if (k_skeletonFrame != null) { if (skeletons == null) { skeletons = new Skeleton[k_skeletonFrame.SkeletonArrayLength]; } k_skeletonFrame.CopySkeletonDataTo(skeletons); receivedData = true;

} } if (receivedData) { int snum = 0;

List<int> Tracked = new List<int>(); foreach (Skeleton k_sk in skeletons) { snum++; debugdata.Add("TID: " + k_sk.TrackingId.ToString() + " " + k_sk.TrackingState.ToString()); if (SkeletonTrackingState.Tracked == k_sk.TrackingState) { Rskeleton r_sk; if (!RSkeletons.TryGetValue(k_sk.TrackingId, out r_sk)) { r_sk = new Rskeleton(k_sk.TrackingId); RSkeletons[k_sk.TrackingId] = r_sk; } Tracked.Add(k_sk.TrackingId); foreach (Rjoint r_j in r_sk.J) // create for each skeleton its own joints and adjust its transformations via user input { Joint k_j = k_sk.Joints[r_j.id]; Point3d pp = new Point3d(k_j.Position.X, k_j.Position.Z, k_j.Position.Y);

}

}

}

pp.Transform(rotatetransform); pp.Transform(scaletransform); pp.Transform(movetransform); r_j.pos = r_j.pos * 0.1 + pp * 0.9;

List<int> ToDelete = new List<int>(); // create a list for deleting the skeletons made in case the Kinect is unplugged or not in use foreach (int k in RSkeletons.Keys) { if (!Tracked.Contains(k)) ToDelete.Add(k); } foreach (int dk in ToDelete) { RSkeletons.Remove(dk); } } } private void UnregisterSkeleton() { if (kinectSensor != null) { try

{
    kinectSensor.SkeletonFrameReady -= nuiFrameReady;
    XKinect.RemoveRef();
}
catch (Exception e) { MessageBox.Show("Kinect Uninitialize Error: " + e.Message); }

} private void UnregisterEvents() { UnregisterSkeleton(); Grasshopper.GH_InstanceServer.DocumentServer.DocumentRemoved -= OnDocRemoved; if (mydoc != null) mydoc.ObjectsDeleted -= this.ObjectsDeleted; mydoc = null; } public void OnDocRemoved(GH_DocumentServer ds, GH_Document doc) { GH_Document mydoc2 = OnPingDocument(); if ((mydoc2 == null) || (object.ReferenceEquals(mydoc2, doc))) { UnregisterEvents(); } } public void ObjectsDeleted(object sender, GH_DocObjectEventArgs e) { if ((e.Objects.Contains(this))) { UnregisterEvents(); } } private void SetupKinect() // initialize the Kinect and setup for receiving data { kinectSensor = XKinect.Sensor; if (kinectSensor == null) return; try { XKinect.AddRef(); kinectSensor.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nuiFrameReady); } catch (Exception e) { AddRuntimeMessage(GH_RuntimeMessageLevel.Warning, e.Message); } } private void ScheduleDelegate(GH_Document doc) { this.ExpireSolution(false); } private static RecognizerInfo GetKinectRecognizer() { Func<RecognizerInfo, bool> matchingFunc = r => { string value; r.AdditionalInfo.TryGetValue("Kinect", out value); return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "enUS".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase); }; return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault(); } #endregion KinectSensor kinectSensor; Dictionary<int, Rskeleton> RSkeletons = new Dictionary<int, Rskeleton>();


Transform rotatetransform = Transform.Identity; Transform scaletransform = Transform.Identity; Transform movetransform = Transform.Identity; List<string> debugdata = new List<string>(); GH_Document mydoc; public bool iflag = false; Skeleton[] skeletons; public String recSpeech = "Empty"; protected override void SolveInstance(IGH_DataAccess DA) // solver function where all the component’s calculation happen { if (mydoc == null) { mydoc = OnPingDocument(); Grasshopper.GH_InstanceServer.DocumentServer.DocumentRemoved += OnDocRemoved; mydoc.ObjectsDeleted += ObjectsDeleted; } double tscale = new double(); double tmove = new double(); int tinterval = new int(); bool m_start = new bool(); int eAngle = new int(); if (

{

)

DA.GetData<bool>(0, ref m_start) && DA.GetData<double>(1, ref tscale) && DA.GetData<double>(2, ref tmove) && DA.GetData<int>(3, ref tinterval)

if (tinterval < 5) tinterval = 5; if (eAngle < -25) eAngle = -25; if (eAngle > 25) eAngle = 25; scaletransform = Transform.Scale(Point3d.Origin, tscale); movetransform = Transform.Translation(Vector3d.ZAxis * tmove);

if (XKinect.Status != KinectStatus.Connected) { AddRuntimeMessage(GH_RuntimeMessageLevel.Warning, "Kinect Status: " + XKinect.Status.ToString()); return; } if (!m_start) return; else { if (kinectSensor == null) { try { SetupKinect(); } catch (Exception e) { AddRuntimeMessage(GH_RuntimeMessageLevel.Warning, e.Message); } if (kinectSensor != null) { rotatetransform = Transform.Rotation(kinectSensor.ElevationAngle * Math.PI / 180.0, Vector3d.XAxis, Point3d.Origin); }

}

DataTree<Point3d> pt = new DataTree<Point3d>(); DataTree<Line> ln = new DataTree<Line>(); int pid = 0;


List<int> uid = new List<int>(); foreach (Rskeleton rs in RSkeletons.Values) { pt.EnsurePath(pid).AddRange(rs.GetPositions()); ln.EnsurePath(pid).AddRange(rs.GetBones()); uid.Add(rs.UserID); pid++; } DA.SetDataTree(0, pt); DA.SetDataList(1, Rjoint.jnames); DA.SetDataTree(2, ln); DA.SetData(3, RSkeletons.Keys.Count);

}

}

}

OnPingDocument().ScheduleSolution(tinterval, ScheduleDelegate);

6.2 Speech recognition component – Grasshopper – C#
6.2.1 CommandRecognizer class

class CommandRecognizer
{
    public int DELAY = 100;
    public float CONFIDENCE = 0.2F;
    public string lastRecognizedCommand; // Global variable that holds the last command spoken
    public CommandRecognizer singleton = null;

    private SpeechRecognitionEngine sre;
    private KinectAudioSource source;
    private KinectSensor sensor;
    private Stream s;

public CommandRecognizer(List<string> commands) // class constructor
{
    if (singleton != null) return;

if (sensor == null) { sensor = XKinect.Sensor; if(sensor != null) { try { XKinect.AddRef(); // Obtain the KinectAudioSource to do audio capture source = sensor.AudioSource; source.EchoCancellationMode = EchoCancellationMode.CancellationOnly; // No AEC for this

sample

source.AutomaticGainControlEnabled = false; // Important to turn this off for speech

recognition

}

}

} catch(Exception e) { throw (e); }

if (XKinect.Status != KinectStatus.Connected) {

    return;
}

RecognizerInfo ri = GetKinectRecognizer(); if (ri == null) { //logger.WriteLine("Could not find Kinect speech recognizer. Please refer to the sample requirements."); return; } //logger.WriteLine("Using: {0}", ri.Name); // NOTE: Need to wait 4 seconds for device to be ready right after initialization /*int wait = 4; while (wait > 0) { logger.Write("Device will be ready for speech recognition in {0} second(s).\r", wait--); Thread.Sleep(1000); }*/ sre = new SpeechRecognitionEngine(ri.Id); var colors = new Choices(); foreach (string command in commands) { colors.Add(command); } var gb = new GrammarBuilder { Culture = ri.Culture }; // Specify the culture to match the recognizer in case we are running in a different culture. gb.Append(colors); // Create the actual Grammar instance, and then load it into the speech recognizer. var g = new Grammar(gb); sre.LoadGrammar(g); sre.SpeechRecognized += SreSpeechRecognized; sre.SpeechHypothesized += SreSpeechHypothesized; sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected; }

singleton = this;

public void Listen() { lastRecognizedCommand = null; // initialize the global variable with null using (s = source.Start()) { sre.SetInputToAudioStream( s, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null)); // open audio stream to receive commands

}

}

sre.RecognizeAsync(RecognizeMode.Multiple); while (true) { Thread.Sleep(DELAY); // wait until the Kinect is ready to receive again }

private RecognizerInfo GetKinectRecognizer() // create a new speech recognizer from Microsoft Speech Recognition Engine { Func<RecognizerInfo, bool> matchingFunc = r => { string value; r.AdditionalInfo.TryGetValue("Kinect", out value); return "True".Equals(value, StringComparison.InvariantCultureIgnoreCase) && "en-


US".Equals(r.Culture.Name, StringComparison.InvariantCultureIgnoreCase); }; return SpeechRecognitionEngine.InstalledRecognizers().Where(matchingFunc).FirstOrDefault(); } private void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e) // function responsible of recognizing the command from the grammar library given by the user { if (e.Result.Confidence >= CONFIDENCE) { lastRecognizedCommand = e.Result.Text; } }

}

6.2.2 Kinect Speech class public class Kinect_Speech : GH_Component { Thread listenerThread; CommandRecognizer recognizer; string logger; List<string> commands = new List<string>(); public Kinect_Speech() : base("Kinect Speech Recognition", "Speech", "Speech with Kinect", "Firefly", "Vision") { } protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager) { pManager.Register_BooleanParam("Start", "S", "To start recognition", true); pManager.Register_StringParam("Commands", "Cmds", "Defining commands"); } protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager) { pManager.Register_StringParam("out", "out", "out"); } private void ThreadRunner() { recognizer.Listen(); } protected override void SolveInstance(IGH_DataAccess DA) { bool start = new bool(); if (DA.GetData<bool>(0, ref start) && DA.GetDataList<String> (1, ref cmds)) { if (start == false) return; else { if (recognizer == null) { // initialize a recognizer commands.Clear(); for (int i = 0; i < commands_input; i++) { commands.Add(commands_input[i]); }

recognizer = new CommandRecognizer(commands);

}
try
{

// create a thread that runs recognizer.Listen() ThreadStart threadStart = new ThreadStart(ThreadRunner); listenerThread = new Thread(threadStart); listenerThread.Start();

string command = recognizer.lastRecognizedCommand; if (command == null) { command = "none"; } else { recognizer.lastRecognizedCommand = null; } DA.SetData(0, command);

}

}

}

} catch (Exception ex) { logger = ex.Message; }

public override Guid ComponentGuid { get { return new Guid("{D036747E-1929-444E-B641-1DD82CA5CE1E}"); } }
