Sound/Social

Contents

System Requirements
What is Sound/Social? Where is it based?
How does it work?
The Sound/Social Patch
User Manual
Development of the concept
Future Applications
Submission Information, Online Content and Acknowledgements


System Requirements

The Patch
The PC and Mac versions have the same software and hardware requirements, but a different patch is needed to run on each, due to different video input formats.

Common Requirements for both Mac and PC
- Max MSP 5.0 or higher
- 2.4GHz dual-core processor or higher
- 2GB RAM or more (preferably 4GB on a 64-bit OS)
- Sound card with 4 output channels recognised by Max MSP
- cv.jit objects, available for both Mac and PC from http://www.iamas.ac.jp/~jovan02/cv/download.html
- Internal webcams must be disabled, or the external video feed prioritised

PC Requirements
- KWorld DVD Maker device or similar, for taking in the live video feed

Mac Requirements
- Elgato EyeTV Video Tuner

The patch has been tested on both Mac and PC running these configurations, and it handled the sound output reasonably well on both. The Mac has not been tested with a live video feed, but I understand the Elgato EyeTV to work with Max MSP in the same way the KWorld device does for PC.

Website
The website and blog are viewable in any modern web browser (Firefox, Chrome, Safari) with the Flash plug-in for video content.
The website can be found at: http://www.orrinward.co.uk/soundsocial
The development blog can be found at: http://orrinward.wordpress.com/category/final-year/prid302-final-project/


What is Sound/Social?
Sound/Social is an interfaceless collaborative music creation environment. Using motion tracking and software that delocalises and redistributes sound within a space, users are encouraged to explore the environment in new ways.

When someone walks into the space they activate a simple sound loop that follows them around as they walk. As people move closer to each other they start to hear their sound loops interact and play with each other.

As a user walks through the environment, their sound loop stays relatively consistent to them, but what they hear of other people's sounds changes, creating a unique musical composition for each user in the space, all from a set of simple repeated sound loops.

Where is it based?
Sound/Social is currently only available as a prototype, tailored entirely for the ground-floor level of any of the three atria in the Portland Square Building in Plymouth. My choice of Portland Square was made for a combination of technical, architectural and social reasons:

Technical
Portland Square is where i-DAT's Arch-OS system is installed. The Arch-OS system is equipped with an Intelix Mixer, which controls all the speakers in the open spaces of the building. I have counted 36 myself, but I believe there are more in the lecture theatres. Arch-OS also has its own motion tracking system that can use any one of four cameras connected to it. The motion tracking currently in place is not Max MSP friendly, but there are technicians who can explain the video feed system to me, and I have found it very easy to incorporate my own motion tracking system with their existing video feeds.

Architecture
I personally find Portland Square to be the second most interesting large building in the University of Plymouth, next to the Roland Levinsky Building. The three large atria carry sound very well. Each of them is very open for movement and they are all key entrance/exit points of the building, so during testing phases it was easy to find volunteers. Because there is no interface or actively chosen participation, I could calibrate Sound/Social's motion tracking and sound distribution simply by recording people moving through the space, rather than laboriously trying to recruit guinea pigs.

The Roland Levinsky Building would be a more ideal location, as its ground floor is even more open-plan, but it does not have the required sound and vision hardware to be viable without funding.

Social
I find the atria very interesting for the social interactions that occur in the space. I didn't realise this before experimenting, but on observing motion in the environments it is quite peculiar how these vast open spaces are very much disused. With the exception of the café in Atrium A, most people just walk directly through them to their destination. The spaces are large enough that people actively take wide detours to avoid any crossing of personal space with others. One of the aims of Sound/Social is to break down these invisible barriers and actively encourage participants, both willing and unexpected, to explore and deconstruct these barriers through sound interaction. As each sound follows a person, it becomes a temporary extension of their personality. Because it is temporary, it is a personality trait the users do not feel shy about, so they openly explore it with strangers, without the usual boundaries between people.


How does it work?
There are various pieces of hardware that communicate with each other via a piece of software I created in Max MSP.

Video Analysis
Aerial-view cameras are set up inside the Portland Square Building, and my software takes a live feed of one of the atria. Using modified versions of the open-source cv.jit patches, the software processes the feed to eliminate static objects and flickers. Though it isn't shown in the patch, it converts the remaining moving objects into a binary black/white matrix. The cv.jit.blobs.centroids object then reprocesses this feed and assigns 'blobs' to each individual moving object. I have calibrated the feed so that it takes in a specific-sized feed with certain thresholds applied, so that the majority of the time it assigns one blob to each person; originally it would occasionally assign a blob to each limb.

Each blob has an X value, a Y value and a size. The values that matter are the X and Y; the size is checked against a threshold to eliminate any small flickers that get picked up as blobs. The blobs are then given unique IDs and separated into their own single-object matrices using the cv.jit.label object, and the X and Y values are scaled to the size I have allocated to the speaker array.

My patch outputs a maximum of 8 individual sounds in the environment, so I have created 8 instances of the cv.jit.label function. In each of these instances any blob can be selected. Usually IDs 1-8 are the people, but occasionally there may be a consistent flicker, in which case the user simply ignores that blob and uses the next in line instead. Due to varying lighting conditions and a temperamental video feed, the application needs to be calibrated each time it is run. The X and Y values of each blob are then sent to the sound distribution side of the patch, where individual sounds are assigned to their locations.
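Though the real implementation is a Max MSP patch built on cv.jit, the same chain can be sketched in ordinary code. Below is a minimal Python/OpenCV analogue of the pipeline just described (background elimination, binary matrix, labelled blobs, size threshold); the library, function names and threshold values are my assumptions for illustration, not part of the patch.

    import cv2

    MIN_BLOB_AREA = 400        # assumed size threshold for rejecting flickers
    MAX_TRACKED = 8            # the patch assigns at most 8 sounds/blobs

    def track_blobs(video_source=0):
        """Yield up to MAX_TRACKED (x, y) centroids per frame."""
        cap = cv2.VideoCapture(video_source)          # live camera or video file
        bg = cv2.createBackgroundSubtractorMOG2()     # eliminates static objects
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Foreground mask: the binary black/white matrix of moving objects.
            mask = bg.apply(frame)
            _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
            # Connected components stand in for cv.jit.label: each moving
            # object becomes a labelled blob with an area and a centroid.
            n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
            blobs = [tuple(centroids[i])
                     for i in range(1, n)             # label 0 is the background
                     if stats[i, cv2.CC_STAT_AREA] >= MIN_BLOB_AREA]
            yield blobs[:MAX_TRACKED]                 # X/Y go to the sound side
        cap.release()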


Sound Calculation
To give the feel of sounds following each user through the space without physically assigning a moving speaker to follow them, the distance of each speaker from the user is used to adjust the volume of their sound on each of those speakers. If a user is close to the rear right speaker, then the sound level coming from that speaker will be high. Depending on the digital scale of the space (which I will explain later), the volume at the other speakers will be lower, or on larger digital scales, muted.

Upon opening the patch, each of the speakers is manually assigned an X and Y location that matches its relative position in the video feed. If a user is standing directly underneath a speaker, the speaker's co-ordinates should match the user's. This means that when a user walks to a speaker their position is calibrated perfectly, giving a stronger impression that the sound is following them.

The impression of sounds following the user is created with very simple trigonometry. Using the co-ordinates of both the speaker and a user, I find the distance between them using Pythagoras' method for calculating the hypotenuse of a right-angled triangle, A² + B² = C². In this case: (User X co-ordinate - Speaker X co-ordinate)² + (User Y co-ordinate - Speaker Y co-ordinate)² = Distance². The subpatch 'Hypoteneuse' calculates this constantly for each object, giving a variable volume output for each sound on each speaker, making the sound assigned to each person pan in two dimensions with the user and giving the impression of a ghost sound.

Sound Distribution
The volumes are calculated independently of the sounds themselves, so there is a separate section that plays back the sound loops. The person who starts the application can load in any WAV files they like and assign them to one of eight objects (relating to the motion tracking). The sound loaded into '1' is played most frequently, as only one person needs to be in the space to activate it; the least commonly played is '8'. The sounds are packed up per speaker using a dac~ object and sent out through the Intelix Mixer to the speakers in the space, roughly half a second later than the tracking data is received. This is due to the vast lengths of cable and the processing limitations of my laptop.
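As a worked example of the maths above (separate from the actual 'Hypoteneuse' subpatch, and with an assumed four-speaker layout and a simple linear falloff), a Python sketch of the per-speaker volume calculation might look like this:

    import math

    # Speaker X/Y positions in the same co-ordinate space as the blobs,
    # e.g. a 4-speaker array at the corners of a scaled video frame.
    SPEAKERS = [(0, 0), (320, 0), (0, 240), (320, 240)]
    SCALE = 200.0   # the "digital scale" of the space: larger = faster falloff

    def speaker_gains(user_x, user_y):
        """Volume of one user's loop on each speaker, from 1.0 (directly
        underneath the speaker) falling towards 0.0 (muted) with distance."""
        gains = []
        for sx, sy in SPEAKERS:
            # Pythagoras: (user_x - sx)^2 + (user_y - sy)^2 = distance^2
            distance = math.hypot(user_x - sx, user_y - sy)
            gains.append(max(0.0, 1.0 - distance / SCALE))
        return gains

    # A user standing near the first speaker is loudest there:
    print(speaker_gains(10, 10))   # [0.93, 0.0, 0.0, 0.0] (rounded)

Recomputing these gains every time a blob's X/Y values arrive is what makes the loop appear to pan through the space with its owner.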


The Sound/Social Patch

The messy non-presentation view
This is a spliced screenshot of the top level of my Max MSP patch. The left side is predominantly to do with the video input, and the right-hand side is all about the sound output. The vast majority of things that need user control are on the left, with the exception of setting the speaker locations (top far right) and the sound play settings (centre, above the blocks). I strongly advise using Presentation view.



Presentation View (default)
This is what the user sees when they open the patch. I have opted to set it to open in Presentation view, as the layout is a lot easier to understand and far more logical in use. The full view of the patch is logical for the way the data flows, but not for the way it is interacted with. The version shown is the final version, limited to 4 sounds rather than 8. It is divided into 5 logical sections: Sound Controller, Video Feed settings, Speaker Locations, Motion Tracking and, finally, Blob Selections.

Video Feed (Teal)
The video feed controls allow the user to select either a live feed or a prerecorded video (sample prerecorded videos of people walking through Portland Square are included in the submission). The box in the middle plays back the video, with an overlay of all the blobs being calculated (denoted by a red cross and a green circle).

Speaker Locations (Green)
The Speaker Locations box allows the user to calibrate the environment to where the speakers show up on the camera feed. The default values are set for Atrium C; other atria need to be calibrated manually.

Sound Controller (Lilac)
The Sound Controller section allows a user to select which tracks are played for each blob in the space. It also has global Pause, Resume and On/Off settings. The 'dac~' object, when double-clicked, opens the DSP Status window, which allows the user to configure the sound card and driver for multichannel output.

Motion Tracking (Yellow)
The Motion Tracking controller handles background separation and the sorting of the calculated blobs.

Blob Selections (Red)
The blob selection boxes allow a user to select which blobs send data.


User Manual
This section outlines how to set up both the hardware and software to use Sound/Social unaided. Recommended settings for calibration are set as defaults on loading the patch, but if manual calibration is needed, it has to be practised and experimented with, as conditions such as lighting, outer activity (people on video who are on higher floors) and starting conditions vary. Where set values can be incorrect, I will outline the desired result so a user can adjust the settings until they find something similar to what is required. I would highly recommend that the System Requirements outlined in this document be taken as a bare minimum. If the machine struggles with the 8-track version of the patch, the 4-track version is a lot less intensive.

Preparing the computer
Before doing anything else, make sure you have:
- a multichannel sound card installed correctly.
- an S-Video to USB device installed and working (Elgato EyeTV or KWorld DVD Maker are recommended for Mac and PC respectively). This is not necessary if using prerecorded video.

Sorting the Video Feed
If using a live feed, plug the S-Video input from the correct atrium in Portland Square into the S-Video to USB device. Open the 'Video test' Max executable located in the 'Sound Social Max patches' folder supplied with this documentation. Toggle the metro on and press 'open'. The camera feed should show in that video; if it doesn't, you need to configure your device properly. If it does show, close the app (CTRL+Q).

Setting up the audio
To set the audio up to be played through the speakers in Portland Square, you will need 4 AV/Cat5 baluns (found in the i-DAT office) and the map of the speaker layouts. Speakers in Atrium B need to be fixed, but Atria A and C work fine as they are. You will also need audio cables that match your sound card at one end and convert to coaxial audio at the other.


The Patch
Once you have the video feed and the audio connected to the computer you can open the Max patch, found in the 'Sound Social Max patches' folder. It should open in Presentation mode.

Sound
1. If you are using Atrium C, move to step 2. If you need to calibrate the speakers, use the 'Speaker Locations' box in 'unlocked' mode to change the X and Y values to where the speakers appear in the video.
2. Click on the first 'open' button in the Sound Controller section. Select one of the WAV files included. Repeat this for each of the 4/8 buttons available.
3. Double-click on the dac~ object and adjust the settings in the DSP Status window to match your sound settings, ensuring that the drivers are correct and that it is outputting 4 separate channels.
4. Turn the sound on using the toggle button if it isn't on already.
5. Press play, and don't worry if you don't hear sound yet.

Video
6. Go to the teal video settings box and either 'open' the camera or 'read' a video file.
7. Click on the relevant toggle. Video should now play in the centre, with the live feed and an overlay of blobs.

Motion Tracking
8. Toggle the gate on, and then off once the random blobs shown on the video to the left have disappeared. If there are blobs on people, this is fine. It usually needs about 5 seconds with the gate on.
9. The threshold should be fine as it is, but if there are lots of stray blobs (attached to things other than people), increase the threshold and see if they go away.

Blob Selections
10. Use the dropdown menus on the blobs to select which blob each one picks up. Try to pick blobs that are attached to people.

Sound/Social should now be working! Adjust the master volume on your sound card/computer to get the volume to the desired level. A user in one corner should find it hard to hear a user standing in the opposite corner.


Development of the concept
This section outlines how I arrived, from an initial decision to create a vague multichannel system, at the final realisation of an interfaceless social musical collaborative environment.

The multitouch concept
For the first half of the academic year I was planning to create a multitouch interface that allowed a 'composer of the space' to work alongside DJs by taking control of the location of sounds in an environment in a more organic, fluid way. This idea is where I came up with my concept of creating delocalised sound: using trigonometry to simulate sounds from anywhere in an environment by altering the volumes of the speakers as if the sound were coming from a single point. I actually came up with that idea in my second year, for a Human Computer Interaction project. I have also had a lot of support from people who think this would be revolutionary if developed further, with real commercial possibilities; I have had interest in this concept from inventors, nightclub managers, boutique festival events organisers and a high-end tasting events company. The reason I opted against carrying it through is that I wanted to create something explorative for the user, rather than a system that allowed a single person to control the environment.

Social Environment Interaction
Through my Production of Space module I gained an interest in the way people move and interact with their environment when it is populated with strangers and has no native point of interest, e.g. a narrow street, or a big and open, albeit dull, atrium in Portland Square. On my Sound Practice module (where I was shown Max MSP) some of my peers were working with the reacTIVision software, which is, I gather, essentially a multitouch platform without the touch part. It reads multiple gestural behaviours from various encoded signs and a camera; the coded signs themselves are interpreted as blobs with values attached, which a user controls.


I combined the idea of multiple controlling objects detected by webcam with my newly found interest in spatial exploration and interaction, and came up with the idea that became Sound/Social, where multiple objects in a space control the sound. Those objects, in my case, are the people actually in the space. My multitouch concept may be more financially exploitable, but as an interaction model it is very linear.

I am still excited by the prospect of developing it, but I found the repeating feedback interaction model in this idea more exciting for a final year project.

In this system, the users collectively control the sound around them, which in turn informs and influences the users. A multitouch interface would be fantastic in a nightclub or a theatre, where people go to be entertained with audio. In a space such as Portland Square, however, there isn't an expectant audience, and Sound/Social interrupts the environment. Someone is far more likely to engage with a project that they have an element of control over. It would also be impractical, and frowned upon, to have a group of DJs playing throughout the day in Portland Square. My finalised Sound/Social concept is a lot more subtle, and I believe it is beautiful through its simplicity.


Future Applications
I personally see Sound/Social in its current state as something I won't develop an awful lot further. I think that as a location-specific installation it works well. It certainly engages the people inside the environment, and I was very pleased with the response I received when simply calibrating the sounds: people came down from their offices to see what was going on, and I ended up having two PhD students and the head of Biological Sciences wandering around Atrium A seeing what they could do with it.

If I had the money to invest in better hardware I would love to refine it. There are still plenty of bugs that could be ironed out with simple hardware upgrades:
- My laptop burnt out 2GB of RAM when I had the application running 16 sounds. With a more powerful, and better cooled, standalone PC dedicated to it, I could easily adapt it to run many more sounds in a larger environment.
- The video feeds all have flickers and poor image quality. If I had permission within the space, and funding, I would install fisheye cameras in the centre of each atrium, which would give more consistent and better-calibrated results for the motion tracking.
- A better sound card. I had to choose between creating the patch so that it ran well on my own laptop, buying a cheap USB sound card compatible with it and using the S-Video to USB devices that Guido Bugmann had installed, or designing it for Mac, using a borrowed, far superior M-Audio FireWire 1814 and paying £75 for a video device I could never use again. Though the sound was adequate enough to demonstrate function, the quality was not brilliant. I would hope to expand the speakers from a system of 4 to 16, or perhaps 32 in a grid, in a larger space.

As a whole finished product, I believe Sound/Social is quite limited in terms of expanding it into something suitable for commercial use. I have had interest from two events companies saying that if I could invest the money and time to get it to a transportable and more refined state, they would be interested. I simply don't have the ability to do this by myself.


If I break down what I have created and reapproach the systems involved in Sound/Social, however, I believe there are many different, intriguing possibilities.

Non-conventional theatrical performance
My girlfriend is a performance artist, and she recently performed at thetastingsessions.com's Deliver Us From Innocence food tasting event, where the audience/participants were led through several scenarios in Shoreditch Town Hall, with a continuing narrative throughout. This would be an ideal situation in which to have the speakers set up with my sound-distribution system and an interface that allows a controller to move each segment of the narrative through the environment, following or leading the relevant groups of people. Rather than hearing the narrative come from all around the environment, positioning it as a moving object throughout the space would give a great sense of immersion, and believability that the narrator was with them, rather than being belted out over a tannoy.

Multitouch control model with a DJ
I strayed from my initial idea with the intention of opening the project up to a more intriguing relationship between the participants and their environment. My initial idea of using the sound distribution with a multitouch interface remains the most exciting prospect with a real-world leisure outcome. If a nightclub were kitted out with a network of many speakers, it would open up new possibilities in musical creation. An example: currently, when one song ends it typically fades into the next. With my system in place this could be done not just with the audio track but with the space. One track could bleed in from the edges of a crowd into the centre for a new track to begin. Individual instruments or beats could be played in specific locations, adding multidimensionality to existing sound environments, all at the hands of a composer of the space.
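As a rough illustration of that spatial crossfade idea, the sketch below (plain Python, with hypothetical speaker positions and timing, not anything built in this project) computes per-speaker gains so that an incoming track arrives at the edge speakers first and reaches the centre of the crowd last:

    import math

    SPEAKERS = [(0, 0), (320, 0), (0, 240), (320, 240), (160, 120)]
    CENTRE = (160, 120)      # assumed centre of the crowd
    MAX_RADIUS = 200.0       # distance of the outermost speaker from the centre

    def crossfade_gains(progress):
        """progress runs 0.0 -> 1.0 over the crossfade. Returns a list of
        (old_track_gain, new_track_gain) pairs, one per speaker."""
        pairs = []
        for sx, sy in SPEAKERS:
            # 0.0 at the outermost speaker, 1.0 at the centre speaker.
            nearness = 1.0 - math.hypot(sx - CENTRE[0], sy - CENTRE[1]) / MAX_RADIUS
            # Edge speakers pick up the incoming track first; the centre last.
            new_gain = min(1.0, max(0.0, progress * 2.0 - nearness))
            pairs.append((1.0 - new_gain, new_gain))
        return pairs

At progress 0.0 only the old track plays everywhere; by 0.5 the edge speakers have fully switched while the centre is untouched; at 1.0 the new track has swept the whole space.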


Submission Information

Project Title: Sound/Social
Full Name: Orrin Ward
Student ID Number: 10062098
Course of Study: BSc(Hons) Digital Art and Technology
Date of Submission: 28/05/2010

Declaration of ownership
All work in this project is entirely devised and created by me, with the exception of:
- cv.jit Max MSP objects
- native Max MSP objects

Other Content
Also inside the submission DVD:
- A collection of audio files created using André Michelle's ToneMatrix. I created these specifically so that Sound/Social participants could create music that mixes well together. You are more than welcome to use your own WAV files.
- A showreel outlining the aims of the project, showing people moving in Portland Square and the music generated from that movement. Note, however, that the sound heard is simply what a user would hear standing still between 2 speakers throughout the performance, not the full spatial experience.

Online Content
In addition to the content submitted with this, there is a dedicated Sound/Social website, blog and YouTube playlist where you can find out more about my development process, with early experiments in motion tracking, sound distribution and the failures and successes I have had along the way to the finish line.
Sound/Social Website: http://www.orrinward.co.uk/soundsocial
Final Year Project Blog (+general blog): http://orrinward.wordpress.com/category/final-year/prid302-final-project/
YouTube playlist: http://www.youtube.com/user/ozzajw#g/c/F67326F1D5D0318C

Acknowledgements
There is no way I would have been able to create Sound/Social without the help of these people:
Dan Livingstone
David Strang
Guido Bugmann
Chris Saunders
The maintenance team in the Isaac Foot Building
Anyone who participated in or helped me with the project

