Using Microsoft's Kinect Skeleton Tracking Feature to Explore the Relation Between a User and his Physical Environment Stavros Gargaretas The Why Factory Technical University of Delft Delft, The Netherlands Sgargaretas@student.tudelft.nl
Abstract - Our physical environments are augmented with electronic monitoring technologies that are increasingly learning how to collect data from their inhabitants. The project 'The Evolving Room', part of the greater research on the 'Adaptive City', envisions a scenario in which ubiquitous monitoring of human behavior provides a truly interactive relationship between an inhabitant and his environment. This report demonstrates the use of Microsoft's Kinect skeleton tracking feature in achieving this relationship. It will first introduce the technology and methodology behind monitoring and tracking human movement. It will then describe, in steps, how the project 'The Evolving Room' uses this methodology: firstly, to track and learn from user behavior; and secondly, to offer the user, through the Kinect, an interface with which he can connect to and virtually change his environment.

I. Introduction

With advances in wireless, sensory and embedded technologies, we are fast approaching the pervasive era of technology [1]. This has changed the way that we interact with and understand computers. 'Monitoring systems in particular will benefit from this advancement, as data collection will be a core component of future pervasive systems' [2]. New monitor-type applications and services suddenly become possible through this emerging form of 'Ubiquitous Monitoring (UM)' [3], defined as the 'use of pervasive or wearable devices for collecting data to provide services and application for users' [4]. These ongoing advancements in data collection provide an important step towards understanding and monitoring our environments and how we interact with them; yet the implications for architecture are not yet defined, or at least not yet fully explored. To date, this data has found useful application in domains such as healthcare [5], education [6] and security/surveillance [7], and has primarily focused on providing features designed to support the users of the architecture; i.e. Intelligent Architecture.
This 'intelligence' manifests in Building Management Systems, i.e. networks of intelligent electronic devices that monitor and control different parts of a building [8]. These remain at the level of changing display screens, turning appliances on and off, and regulating physiological conditions such as temperature and humidity. The question becomes: could this pervasive data collection offer a more dynamic form of actuation? Could the physical environment directly respond to its inhabitants? Could there be a more interactive relationship between users and their physical surroundings?

A. Adaptive Architecture

This truly interactive relationship between an inhabitant and his physical environment is called adaptive architecture. Although adaptivity is implied even in the most nomadic architectural forms and the simplest manual operations, the possibilities offered by new technologies imply a new architecture: one that becomes kinetic in an interactive way, particularly through digitally driven actuation devices [9]. Research into this kind of kinetic architecture focuses on using sensor data and software to actuate structures. Where it differs from 'Intelligent Architecture' is that, because of its immediacy, it implies a 'two-way interaction between the user and his technologically changing physical environment' [10]. Such environments become customized to the individual, and function with a monitoring/data collection system that is highly interactive, unlike older intelligent systems which had to store, process and interpret data [11].

B. The Evolving Room Project

Within this 'new' definition of Adaptive Architecture, the 'Evolving Room' project envisions a scenario of immediate user-to-environment interaction. To achieve this, Microsoft's Kinect sensor [Fig.1] is used in two ways: 1. as a monitoring/data collection system, and 2. as an interface system for interaction between the human and his physical environment.
The physical environment, therefore (manifest here as the Kinect space), gathers behavioral data from the user and uses this data for real-time physical actuation; i.e. if the user wants to sit, the room transforms accordingly, based on its memory set. At the same time, the room allows the user to interact with this transformation, creating a feedback loop in the system. This feedback loop forms the basis of the 'evolution' implied in the title, as the space will evolve with the user's evolving behavior.

C. Scope of the Report
This report begins with an introduction to the technical possibilities offered by the Kinect sensor and describes, through examples, the way the sensor is used within the Grasshopper/Rhino interface for this project. It then goes on to describe the two different ways the sensor has been used in the project [described above], which form the logic of the 'Evolving Room'. The report finishes with a reflection on the process and on the use of the Kinect sensor, and indicates the further scope for research.
Fig.1 Microsoft Kinect Sensor for the Xbox 360
II. Microsoft Kinect Sensor

The Kinect sensor is composed of advanced sensing hardware: a depth sensor, a color camera, and a four-microphone array, which together provide full-body 3D motion capture, facial recognition, and voice recognition capabilities [12]. The range-camera technology was developed by the Israeli developer PrimeSense, and is what allows the sensor to track the movement of objects and individuals in three dimensions [13]. The capability of the Kinect sensor to track a human skeleton is of particular interest to this report. In skeletal tracking, the body is represented as a series of 20 joints and 19 limbs. Each joint represents a body part such as the neck or shoulder, and each limb represents the legs, arms, etc. [Fig.2]
Fig.2 Kinect skeleton tracker indicating joints with names
Fig.3 Skeleton Tracker reading position and gestures of user
The 'Evolving Room' project utilizes the skeletal tracking offered by the Kinect sensor in order to build up a relationship between a user and a virtual environment. As briefly introduced in the previous chapter, this report is split into two operations, which also form the logic of the 'Evolving Room': 1. a monitoring/data collection system, and 2. user interaction with (override of) the system. Before this, however, a brief introduction to the setup of the Kinect skeletal tracking will be made. A series of starting experiments will also be described, and through these, two important notions relevant to the project: that of memory [relating to monitoring and data collection] and that of interacting with a virtual environment [which relates to the user's interaction with the system].

A. Kinect/Grasshopper Interface

The Grasshopper plugin Firefly is used to connect the Kinect sensor to the Rhino 3D modelling interface. Firefly provides, through a set of software tools, a bridge between Grasshopper, the Arduino microcontroller and other input/output devices. It allows real-time data flow between the digital and physical worlds, and in this way enables the exploration of virtual and physical prototypes with unprecedented fluidity [14]. This is an important operation, as it allows physical data [in this case, the user's position in 3D space] to be used in the virtual environment; the interaction on which this project is based. Grasshopper reads the physical position and actions of the user [Fig.3]. It does this through a skeletal tracking component, found in the Firefly drop-down menu. The important thing to notice about this component is that each joint and limb of the tracked skeleton maintains a defined item number. This matters for later operations, when we want to switch between different parts of the body: by knowing the item number of each joint and limb, we can split the body parts into separate components, as seen in Fig.4. This makes later operations easier, as we can easily switch between the elements of the body and test their interaction with the virtual environment.
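To make the item-number logic concrete, the following is a minimal Python sketch of the splitting operation. The index mapping is hypothetical, since the report does not list Firefly's actual item numbers, and inside a GhPython component the points would be Rhino.Geometry.Point3d objects rather than plain tuples; here z is taken as the height above the ground plane.

# Hypothetical item-number mapping; the real indices come from
# Firefly's skeleton tracker component.
JOINT_INDEX = {
    "head": 0, "shoulder_center": 1, "spine": 2, "hip_center": 3,
    "shoulder_right": 4, "elbow_right": 5, "wrist_right": 6, "hand_right": 7,
    # ...the remaining left-side and leg joints follow the same pattern,
    # up to the 20 joints reported by the tracker.
}

def split_skeleton(points):
    """Split the flat joint list coming out of the tracker into named
    joints, so later operations can address body parts individually."""
    return {name: points[i] for name, i in JOINT_INDEX.items()}

# One frame of tracked joints as (x, y, z) tuples; z = height above ground.
frame = [(0.00, 0.40, 1.70), (0.00, 0.35, 1.45), (0.00, 0.30, 1.20),
         (0.00, 0.30, 0.90), (0.18, 0.33, 1.45), (0.25, 0.30, 1.20),
         (0.28, 0.30, 1.00), (0.30, 0.30, 0.95)]
joints = split_skeleton(frame)
right_hand = joints["hand_right"]   # -> (0.30, 0.30, 0.95)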
Fig.4 Splitting the Kinect component into separate components
Fig.5 Muybridge exploration
Fig.6 Halo creation
Fig.7 Overriding halo
B. Muybridge and Halo Explorations

The series of explorations illustrated in Fig.5 - Fig.7 work on two levels. On one level, they explore the human and his relation to his environment. On a second level, they introduce the two main themes of this graduation project, namely memory and user-environment interaction.

Fig.5 shows the introduction of memory into the tracking system of the skeleton. Every time the user moves, the system documents his last move: every point and curve that constitutes the skeleton is logged separately with every changing condition. In this way, any condition and position of the user can be accessed even after he has completed his move. In this exploration, only a limited number of positions are logged [20 values], and so every time a new position is introduced the oldest one in the list of 20 disappears; hence the fading Muybridge effect. This exploration also begins to define the idea of inertia, which is associated with human movement and with different activities, i.e. lying down, sitting, jumping and so on. This becomes particularly interesting when two users interact, as one begins to see the overlapping of inertia between the two.
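The memory logic behind this fading effect can be sketched in a few lines of Python; a rolling buffer with a fixed capacity of 20 frames reproduces the behaviour where the oldest logged position disappears as each new one is added. The class and frame format are illustrative, not the project's actual Grasshopper implementation.

from collections import deque

MEMORY_SIZE = 20   # only the last 20 logged positions are kept

class SkeletonMemory:
    """Logs each new skeleton position; once the buffer is full the
    oldest frame drops out, giving the fading Muybridge effect."""
    def __init__(self):
        self.frames = deque(maxlen=MEMORY_SIZE)

    def log(self, joints):
        # Copy the frame so later changes to the live skeleton
        # cannot rewrite the logged history.
        self.frames.append(dict(joints))

    def opacity(self, index):
        # Older frames (lower index) are drawn progressively fainter.
        return (index + 1) / float(len(self.frames))

memory = SkeletonMemory()
for t in range(25):                              # 25 moves...
    memory.log({"head": (0.01 * t, 0.0, 1.7)})   # stand-in for a full frame
assert len(memory.frames) == MEMORY_SIZE         # ...only 20 survive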
Fig.6 shows the introduction of a virtual environment which is directly linked to the user's tracked skeleton. The halo is defined originally as a spherical bubble around the user. A varying attraction is introduced between the points of the skeleton and those of the sphere, and so the sphere is deformed into the halo shape shown in the image. By changing the attraction between the bubble and the defined points of the skeleton, the shape and definition of the halo also change. What is important about this exploration is the introduction of a relationship between the points of the skeleton [the user in physical space] and a virtual array of points.

To explore this relationship further, Fig.7 presents the first explorations of manipulating the virtual points which define the bubble through direct gestures. Here, the attraction between the bubble and the points of the skeleton [described above] is changed, no longer in the Grasshopper interface, but by the actions of the user. This is done by introducing certain conditions into the script. In this case, as the left hand of the user is raised a certain distance [x] above the ground, the attraction between the 'bubble' and the skeleton points is reduced. Hence, as the hand is raised higher, there is less attraction between the 'bubble' and the skeleton, and so the halo grows, tending towards its original spherical form. This exploration is important as it introduces the logic of the system override; i.e. by introducing certain conditions into the script, we can allow the user to manipulate the virtual environment.

III. The Importance of Skeleton Tracking in the Logic of the Evolving Room

It is relevant at this stage to introduce the logic of the 'Evolving Room'. Fig.9 shows a breakdown of the Grasshopper script which defines the logic of the system. The system begins with the tracking and definition of the user. This occurs in part 1 of the diagram, and has been reflected on in chapter II of this report.

A. Monitor/Data Collection (Part 2 of System Diagram)

Once the system has identified and defined the user, it begins to collect and store data on his patterns of usage. Although this derives from the Muybridge exploration described in the previous chapter, more refined techniques for capturing and storing user movements were developed using the Kinect; these can be seen in Fig.8. Two components make up this method: the halo [seen in the first image] and two perpendicular axes which form the [x] and [y] planes of the spinal curve of the skeleton [seen in the second image].
Fig.8 More advanced monitoring of human movement
Fig.9 Diagram of the 'Evolving Room' system broken down into its key components: (1) human body tracking, (2) monitoring of user movement, (3) system override, together with the default transformation and the space usage calculator
When the halo intersects the planes, an intersection curve is formed and stored. The system thus tracks the profiles of user movement in two directions. This is much lighter than using the original halo, while providing very precise tracking at the same time. An example of the kind of data that is stored can be seen in Fig.10, which shows three different instances of the activity of sitting, as they would be stored in the system.
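The profile-storing step can be approximated in plain Python; in the actual Grasshopper definition the intersection curve would come from a surface/plane intersection component, so the sampling-based stand-in below, and all names and tolerances in it, are assumptions.

TOLERANCE = 0.02   # metres; assumed thickness of the sectioning plane

def section_profile(halo_points, plane_x):
    """Keep the halo sample points lying on the vertical plane
    x = plane_x and return them as a 2D (y, z) profile ordered by
    height; a plain-Python stand-in for the surface/plane
    intersection done in Grasshopper."""
    on_plane = [p for p in halo_points if abs(p[0] - plane_x) < TOLERANCE]
    return sorted(((p[1], p[2]) for p in on_plane), key=lambda q: q[1])

# Stored profiles, keyed by activity label ("sitting", "standing", ...).
profiles = {}

def store_profile(activity, halo_points, spine_x):
    profiles.setdefault(activity, []).append(
        section_profile(halo_points, spine_x))

# Toy halo of three sample points around a spine at x = 0.0.
halo = [(0.00, 0.20, 1.10), (0.01, 0.50, 0.80), (0.40, 0.20, 1.00)]
store_profile("sitting", halo, spine_x=0.0)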
Fig.10 Data curves for three different kinds of sitting profiles
B. Default Transformation

As the system continuously collects data, it learns from the projection and inertia of the user's movements. In this way the system develops the ability to predict human movement based on these parameters. As the user enters the space, his projection and intent are therefore inferred through Kinect tracking, and the system produces a default transformation in response to the user.
Fig.11 Passive transformation of the 'Evolving Room', based on the default data set
This is the first instance in which the user begins to interact with his environment, in a passive way, since the system output is based directly on the data curves it has learned from over time. Fig.11 illustrates this passive transformation of the 'room's' data points in accordance with the default data set.
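How the system picks a default transformation from the stored curves is not spelled out in the script breakdown; a simple hedged reading is a nearest-neighbour comparison between the live profile and the logged ones, as sketched below (profiles are assumed to be resampled to equal, non-zero length).

def profile_distance(a, b):
    """Mean point-to-point distance between two (y, z) profiles,
    assumed to be resampled to the same non-zero number of points."""
    return sum(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
               for p, q in zip(a, b)) / len(a)

def predict_activity(live_profile, stored):
    """Return the activity whose stored profiles lie closest to the
    live profile; the default transformation is then driven by that
    activity's data curves."""
    best, best_d = None, float("inf")
    for activity, profs in stored.items():
        for prof in profs:
            d = profile_distance(live_profile, prof)
            if d < best_d:
                best, best_d = activity, d
    return best

# e.g. default = predict_activity(current_profile, profiles)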
C. System Override (Part 3 of System Diagram)

Once a passive transformation has been produced by the system, the user has the opportunity, through gestures and the Kinect, to override the system output. It is at this point that the halo explorations described at the start of this report become important. By introducing a number of conditions, a palette of gestures could be developed with which the user can manipulate the passive transformation; here there is an active relationship between the user and his environment.
Fig.12 - Fig.16 show a series of explorations of system-overriding gestures. While these are in no way complete, they begin to expose the different operations that could be introduced into the logic of the gesture recognition. Fig.12 shows how a user can cancel the transformation by simply raising the right hand above a distance [x] [i.e. the height of the user]. Fig.14 shows a transformation based on the distance of the left hand from the data points: the closer the hand, the larger the opening in the 'skin' of the system. This kind of logic could also be customizable, so that each user inputs his own set of overriding principles.
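Such a palette of conditions could be encoded along the following lines. The gestures follow Fig.12 and Fig.14, but the thresholds, distances and joint names are assumptions rather than the project's actual values.

def cancel_gesture(joints, user_height):
    """Fig.12: raising the right hand above the user's own height
    cancels the default transformation."""
    return joints["hand_right"][2] > user_height

def opening_factor(hand, skin_point, max_reach=1.2):
    """Fig.14: the closer the left hand is to a data point of the
    'skin', the larger the opening (factor in [0, 1]); max_reach is
    an assumed cut-off distance in metres."""
    dist = sum((h - s) ** 2 for h, s in zip(hand, skin_point)) ** 0.5
    return max(0.0, 1.0 - dist / max_reach)

def apply_overrides(joints, user_height, skin_points):
    if cancel_gesture(joints, user_height):
        return []   # transformation cancelled outright
    return [(pt, opening_factor(joints["hand_left"], pt))
            for pt in skin_points]

# Toy frame: only the two hands matter for these two gestures.
joints = {"hand_right": (0.30, 0.30, 0.95), "hand_left": (-0.30, 0.30, 1.00)}
print(apply_overrides(joints, user_height=1.80,
                      skin_points=[(0.00, 0.50, 1.20)]))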
Fig.12 To cancel transformation: raise right hand above head
Fig.13 Manipulating points: attraction with distance
Fig.14 Varying opening: based on distance of hand from points
Fig.15 Many openings: based on hand direction and distance
Fig.16 Changing transformation: scroll-down menu of different activities popping up based on height of hand from ground plane

What is important in this step of the 'Evolving Room' logic is to recognise the feedback loop that occurs the moment the user overrides the system output. This active manipulation of the system generates a new set of curves and conditions, which feed back into the default data set. In this way the system will 'evolve' with the user's evolving behaviour.
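The feedback step itself reduces to appending the override-generated curves back into the stored data set, so that future default transformations reflect them; a minimal sketch, reusing the hypothetical profiles store from the earlier examples, with an assumed cap on the memory size:

def feed_back(activity, override_profile, stored, max_per_activity=50):
    """Append a user-overridden profile to the default data set so
    future default transformations reflect it; the cap bounding the
    memory is an assumed value, not one given in the report."""
    profs = stored.setdefault(activity, [])
    profs.append(override_profile)
    if len(profs) > max_per_activity:
        profs.pop(0)   # forget the oldest profile first

# e.g. after the user reshapes a sitting transformation:
# feed_back("sitting", overridden_profile, profiles)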
IV. Conclusion and Future Work

In summary, this report has presented the system logic of 'The Evolving Room' and the use of the Kinect skeletal tracker in achieving a relationship between a user and his environment. It began by setting out the Grasshopper-Kinect relationship. It then described a series of explorations made with the Kinect sensor, which focused on the user and his relationship to his environment. Two important notions, which form the basis of the project, were raised during these explorations: one is the idea of system memory; the other is the idea of overriding the system. These two notions were then explained within the logic of the Evolving Room, and in each the use of the Kinect sensor was elaborated on. For the first notion, the Kinect skeletal tracker offered a way to track and log data curves derived from user actions. For the second, the Kinect offered, through gesture recognition, an interface through which the user could interact with and actively change the system, thus generating feedback into the system logic of the 'Evolving Room'.

Two important observations have to be made in reference to potential future work on the project. The first is the limitation of a project that remains in the virtual. As the aim of the project is to explore the relationship between the user and a transforming physical environment, it is important to understand how this data-driven virtual transformation will manifest in reality. For example, the addition of materiality and material properties will already alter the way the virtual transformation occurs, and will thus also change the relationship between the user and the environment implied in this virtual exploration. The second observation regards the realisation that such a coupled relationship between user and environment will inevitably have implications for the behaviour of the user. This is something Holger Schnadelbach has highlighted in his laboratory research with the ExoBuilding [Fig.17] at the Mixed Reality Lab at the University of Nottingham. The project, which couples the user's heartbeat to a moving structure, indicates significant behavioural changes in users, who adjust their breathing because of the immediate feedback the system gives them. Similar behavioural changes in 'The Evolving Room' project would be fascinating, and could become a goal for the final product of the project. This feedback does not have to come through a physical transformation [although that would be ideal]; it could also arise from coupling users to a virtual model.
Fig.17 Exobuilding, Mixed Reality Lab, University of Nottingham
References

[1] S. Moran, N. Jaeger, H. Schnadelbach and K. Glover, "Using adaptive architecture to probe attitudes towards ubiquitous monitoring," 2013 IEEE International Symposium on Technology and Society (ISTAS), pp. 42-50, 27-29 June 2013, p. 1.
[2] A. Albrechtslund, "House 2.0: Towards an Ethics for Surveillance in Intelligent Living and Working Environments," Seventh International Conference of Computer Ethics: Philosophical Enquiry, San Diego, USA, 2007.
[3] S. Moran, "User Perceptions of System Attributes in Ubiquitous Monitoring: Toward a Model of Behavioural Intention," University of Reading, Reading, 2011.
[4] Ibid. [1], p. 1.
[5] K. Van Laerhoven et al., "Medical Healthcare Monitoring with Wearable and Implantable Sensors," 2005.
[6] T. Tibúrcio and E. F. Finch, "The impact of an intelligent classroom on pupils' interactive behaviour," Facilities, vol. 23, no. 5/6, pp. 262-278, 2005.
[7] A. Albrechtslund, "The postmodern panopticon: Surveillance and privacy in the age of ubiquitous computing," Proceedings of CEPE 2005, Sixth International Conference of Computer Ethics: Philosophical Enquiry, Enschede, Netherlands, 2005.
[8] Ibid. [1], p. 1.
[9] Ibid. [1], p. 2.
[10] Ibid. [1], p. 3.
[11] Ibid. [1], p. 3.
[12] Z. Zhang, "Microsoft Kinect Sensor and Its Effect," IEEE MultiMedia, vol. 19, no. 2, pp. 4-10, April-June 2012, doi:10.1109/MMUL.2012.24.
[13] http://www2.technologyreview.com/tr50/primesense/
[14] http://www2.technologyreview.com/tr50/primesense/