
Beyond Music: Sound Recording for VR

Have you ever thought about working in audio for games or virtual reality? Did you know that movie trailers are not made by the same people who make the movies? Sound Recordist and Designer Stephan Schütze provides some background on how things are done in these fascinating areas of the audio industry.


Tutorial: Stephan Schütze


In over 20 years of working as a location recordist and sound designer, I have recorded and produced audio for interactive media, sound libraries, movie trailers, film, virtual reality and augmented reality. Yes, that is as fun as it sounds! In doing so, I’ve discovered that each of these formats has different needs and requires a different approach to creating the content. In this article, I’m going to share what I’ve learnt.

CAPTURE The most important thing when recording is to capture a great sound; we have to hunt down those rare and interesting sound opportunities like a photographer chasing the perfect sunset. Location recording is much more than just pointing a microphone at something noisy – it is about discovering a world so rich with incredible sound sources that we will never run out of things to record.

The source sound is everything, and whatever can capture that sound becomes the best tool at the time. I’ve had people laugh when they learn that I sometimes use a Zoom H1, and I chuckle back knowing that my H1 recordings are in the official Captain Marvel movie trailer. Remember, Academy Award-winning Hollywood sound designer Ben Burtt (Star Wars, Indiana Jones, etc.) did some of his best work with analogue tape and razor blades...

The core consideration is to start with clean usable sounds, and I’ve always strived to achieve this while recording. Our recording skills are critical to what we do, but the software that supports us frees our creativity. Advances in technology, such as iZotope’s RX, mean that I can record kookaburras on my porch and not worry too much about a bit of wind noise in the trees because I can remove it later.
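To show the flavour of that kind of cleanup in the crudest possible terms, here is a one-pole high-pass filter in C++ that simply rolls off low-frequency rumble. This is my own toy illustration, not how RX works; RX repairs audio spectrally, and far more surgically:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One-pole high-pass filter: a blunt stand-in for low-frequency
// wind-rumble cleanup. The signal below is synthetic: a 30Hz "rumble"
// plus a 1kHz "bird", so you can see the low component shrink.
int main() {
    const float kPi = 3.14159265f;
    const float sampleRate = 48000.0f, cutoffHz = 80.0f;
    const float rc = 1.0f / (2.0f * kPi * cutoffHz);
    const float dt = 1.0f / sampleRate;
    const float alpha = rc / (rc + dt);

    std::vector<float> in(480);                      // 10ms of dummy input
    for (size_t n = 0; n < in.size(); ++n)
        in[n] = 0.5f * std::sin(2.0f * kPi * 30.0f * n / sampleRate)
              + 0.2f * std::sin(2.0f * kPi * 1000.0f * n / sampleRate);

    float y = 0.0f, xPrev = 0.0f;
    for (size_t n = 0; n < in.size(); ++n) {
        y = alpha * (y + in[n] - xPrev);             // y[n] = a*(y[n-1] + x[n] - x[n-1])
        xPrev = in[n];
        if (n % 96 == 0)
            std::printf("n=%3zu  in=% .3f  out=% .3f\n", n, in[n], y);
    }
}
```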

Once we have a selection of amazing raw material for a particular project, we need to prepare it for the platform we’re working on. Each platform has its own requirements, as I’ll describe below.

GAMES Games are interactive, making them quite different to other forms of media. Interactive formats are constructed to behave much like the real world does, so there can never be a single ‘final’ mix for a game; the mix is created automatically by the game’s ‘sound engine’, in real time, in response to the player’s actions. That means you can’t just drop in a convenient stereo ambience of a forest, for example. The player is likely to be walking through that forest and can choose their own path, and that will determine the sounds that are heard and how they are mixed together.

In the forest example, we need to have individual bird sounds placed in the trees that the player moves between. The stereo or surround impression the player hears comes from multiple mono sounds positioned in a 3D world. So when the player walks between two trees the bird sounds will be heard on either side, and these sounds will grow louder as the player approaches them and softer as the player moves away from them. All of that real-time manipulation is done by the game’s sound engine.

The audio team can establish guidelines for how loud each individual sound or group of sounds is, how they blend together and so on, but it is the actions of the player and their specific position at any one time that defines the mix they will hear. This makes balancing game audio hugely challenging because you cannot blend individual sound elements to cover gaps or weak transitions. Most of the individual sound elements are exposed to the player, and the player’s journey through the game world determines how they are blended together.
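To make that concrete, here is a minimal sketch of the per-frame calculation a sound engine might perform for positioned mono sources. The structure names, attenuation curve and panning law are my own inventions for the example; this is not code from any actual engine:

```cpp
#include <cmath>
#include <cstdio>

// A mono sound source placed in the game world (names invented for
// this example; real engines have far richer emitter descriptions).
struct Emitter { float x, y; float baseGain; const char* name; };

// Inverse-distance attenuation: full volume inside refDist,
// falling away as the listener moves off.
float attenuate(float dist, float refDist) {
    return refDist / std::fmax(dist, refDist);
}

int main() {
    const float kPi = 3.14159265f;
    // Two birds in neighbouring trees, as in the forest example.
    Emitter birds[] = { { -3.0f, 10.0f, 1.0f, "bird_left"  },
                        {  3.0f, 10.0f, 1.0f, "bird_right" } };
    // The engine re-evaluates this every frame as the player walks.
    float listenerX = 0.0f, listenerY = 8.0f;

    for (const Emitter& b : birds) {
        float dx = b.x - listenerX, dy = b.y - listenerY;
        float dist = std::sqrt(dx * dx + dy * dy);
        float gain = b.baseGain * attenuate(dist, 1.0f);
        // Constant-power pan derived from the bearing to the source:
        // -1 is hard left, +1 is hard right.
        float pan = std::atan2(dx, dy) / kPi;
        float angle = (pan + 1.0f) * 0.25f * kPi;
        std::printf("%s: gain %.2f  L %.2f  R %.2f\n",
                    b.name, gain, gain * std::cos(angle), gain * std::sin(angle));
    }
}
```

In a real engine these gain and pan values are recomputed continuously, which is exactly why the player's path, and not the audio team, ends up deciding the mix.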

Obviously, all sounds used within a game need to be clean and very isolated. Going back to my forest example, I’ll need separate bird sounds, a separate wind sound, a separate stream sound, and separate versions of any other sound elements; they all need to be highly isolated, and placed individually. When I prepare bird sounds to use in a game, for example, I need to cut them up into individual calls and place them at the desired spots within the game’s 3D world. A common tool for this purpose is Wwise, produced by Canadian company Audiokinetic. Wwise allows me to take a selection of individual bird sounds and define their behaviour so that I can create a volumetric area in which birds will twitter randomly in real time. I essentially simulate real-world bird behaviour inside the game world.
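In code terms, the behaviour I author in Wwise boils down to something like the sketch below: play a randomly chosen call, at a random position inside the area, after a random gap. This is only an illustration of the idea, with made-up names and ranges; it is not the Wwise API:

```cpp
#include <cstdio>
#include <random>

// Sketch of randomised bird "twitter" inside a volumetric area, in the
// spirit of what middleware automates (all names and ranges invented).
int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> posX(-20.0f, 20.0f);  // area bounds, metres
    std::uniform_real_distribution<float> posZ(-20.0f, 20.0f);
    std::uniform_real_distribution<float> gap(1.0f, 6.0f);      // seconds between calls
    std::uniform_int_distribution<int>    callIdx(0, 7);        // one of 8 isolated calls

    float t = 0.0f;
    while (t < 30.0f) {                 // simulate 30 seconds of ambience
        t += gap(rng);                  // random silence between calls
        // Each trigger picks a random isolated call at a random position,
        // so no two walks through the forest sound the same.
        std::printf("t=%5.1fs  play bird_call_%02d at (%.1f, %.1f)\n",
                    t, callIdx(rng), posX(rng), posZ(rng));
    }
}
```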

VIRTUAL & AUGMENTED REALITY The New Reality formats are hugely different to every other platform, even games, with which they share some similarities. The key reason for this is the desire to create a realistic spherical sound field for the audience. Remember, we are placing our audience inside the virtual world; unlike a game, they are no longer just looking into that world through a screen, they are immersed within it.

The various technologies we use to achieve this attempt to simulate how we localise sound in the real world. They simulate the interaural amplitude differences, the interaural time and phase differences, and the filtering of sounds reaching us from different directions, all to create a realistic immersive sound experience. The technology is still evolving, but that’s what it aims to do. There are many companies trying to create the perfect technology solution for this functionality. Companies such as Two Big Ears were doing an excellent job of this, and that is why they were bought by Facebook – their technology now forms the basis of Oculus’s audio system. At this point in time we do not fully understand the science of how humans triangulate sound sources, so it is no wonder that the different technology solutions vary in effectiveness.
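Two of those localisation cues are easy to put rough numbers on. The sketch below uses Woodworth's classic spherical-head approximation for the interaural time difference; the level-difference figure is a deliberately crude stand-in, because real spatialisers rely on measured HRTF filters rather than a simple curve:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const float kPi = 3.14159265f;
    const float a = 0.0875f;   // average head radius, metres
    const float c = 343.0f;    // speed of sound, m/s

    for (int deg = 0; deg <= 90; deg += 15) {
        float theta = deg * kPi / 180.0f;
        // Woodworth's spherical-head approximation of the interaural
        // time difference for a distant source at azimuth theta.
        float itdUs = (a / c) * (theta + std::sin(theta)) * 1e6f;  // microseconds
        // Crude level-difference stand-in: head shadowing is strongly
        // frequency dependent, so this single number is illustrative only.
        float ildDb = 6.0f * std::sin(theta);
        std::printf("azimuth %2d deg: ITD ~%4.0f us, ILD ~%.1f dB\n",
                    deg, itdUs, ildDb);
    }
}
```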

The important thing about all of the above is that the amplitude and frequency makeup of a sound are critical aspects of how our brains calculate where a sound is coming from. This means we cannot just boost the midrange to give a sound more presence or add a sub channel for added impact, because any enhancement like this could interfere with the directional nature of spatial positioning. Essentially, if we want a sound to be located 45 degrees to the audience’s right and 15 degrees above the horizon, any alterations we make to that sound may influence the perception of where it originates from. Elements that might add excitement to a project, such as enhanced low frequency content, need to be carefully designed to highlight the audio without working against the illusion of realism.

As with game audio, I start with clean and well-isolated mono files and place them into the simulated ‘world’, allowing the technology to handle distance attenuation and occlusion filtering. Then I layer in combinations of stereo, 3D, ambisonic and even binaural sound material to achieve the end result. For example, VR Regatta, the VR sailing game for Oculus Rift, uses an array of over a dozen layers that play simultaneously in real time – just to create the spherical wind sounds! The layers are all interacting in real time within the spherical sound field; in many ways, it is more like a live performance than a pre-produced product.

Much of what makes this content sound great is achieved during implementation. Virtual Reality (VR) and Augmented Reality (AR) are both going to be incredible formats in the future, and I honestly think no one has gotten close to realising their potential yet. Audio production for these formats is difficult and challenging, but it is also incredibly satisfying as we are literally creating a whole new form of media and discovering what can be effective. Good quality isolated audio is critical to the immersive effectiveness of these formats. If you plan on working in the VR, AR or 360 video space, I highly recommend you check out some of the discussion groups on Facebook that talk about the processes. The technology is changing so quickly that it can be a huge benefit to keep in touch with others working in this field, and seek their advice when necessary. We are all fairly new to these formats, and sharing the knowledge helps everyone.

TRAILERS The approach to creating great audio for movie trailers is very different to games, VR, AR, and, surprisingly, film. It is odd and a little counterintuitive to discover that the trailer for a film will have a very different sound production process than the film it is promoting. For a start, it’s a completely different team that works on the trailers. Movie trailers, especially in Hollywood, are big business – there are studios dedicated entirely to producing high-impact trailers! And that is the core of it: it’s a specialised process because film trailers have 30 to 60 seconds to grab the audience’s attention and get them excited and engaged. While the visuals and dialogue for a trailer are cuts from the film, the sound and music are created specifically for the trailer.

Sound Effects (SFX) for trailers are super hyped. If the dial goes up to 10, then trailer SFX sit somewhere around 12 or 13! There is certainly an element of the loudness wars here, but it is more nuanced than just turning up the volume and compressing the hell out of everything. The sounds themselves need to simultaneously achieve two things: they must be punchy and really cut through, but they also need to stay well out of the way of the trailer’s voiceover and music. It takes many, many layers of sounds to achieve the end result, and the editors do an incredible job of blending all this together for the audience. As someone who has been recording and using my own raw material for 20 years, I know my content really well. Despite this, there are a handful of my sounds in the trailer for the latest Fast & Furious movie (‘Hobbs & Shaw’) that I cannot even recognise because of how densely the sound has been layered and mixed to get that high impact end result.

When creating trailer SFX there is a significant emphasis on mid-range content, which is often boosted to what would normally be considered silly levels. The trailer music usually occupies much of the frequency range we want to use; it dominates the very low and very high end of the spectrum, forcing us to tailor the SFX to have maximum impact inside of those two extremes. When preparing the sounds, I have a template where I layer my own sounds and do a lot of work to boost that mid-range frequency content. I find this tricky because the easiest way to make a sound cut through a mix is to add or boost high-frequency content so the crispness of the sound carries it through. Without being able to rely on the lovely high frequencies, I need to paint with a broad brush across the middle of the frequency spectrum. Compression is essential, but I do a lot of that work manually. If I am layering eight to 10 sound files I will build a custom volume curve for each sound, and tune each one so that I hear the exact elements I want to hear at the exact time I want to hear them.
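Those hand-built volume curves can be thought of as breakpoint envelopes, one per layer. The sketch below is a toy version with invented layer names and values: a whoosh hands over to a mid-heavy impact so that each element is heard exactly when it is wanted:

```cpp
#include <cstdio>
#include <utility>
#include <vector>

// A hand-drawn volume curve as (time, gain) breakpoints, linearly
// interpolated -- the manual, per-layer "compression" described above.
struct Envelope {
    std::vector<std::pair<float, float>> pts;   // time (s), gain (0..1)
    float at(float t) const {
        if (t <= pts.front().first) return pts.front().second;
        for (size_t i = 1; i < pts.size(); ++i)
            if (t <= pts[i].first) {
                float u = (t - pts[i - 1].first) /
                          (pts[i].first - pts[i - 1].first);
                return pts[i - 1].second + u * (pts[i].second - pts[i - 1].second);
            }
        return pts.back().second;
    }
};

int main() {
    // Hypothetical trailer hit: the whoosh ducks out just as the
    // mid-heavy impact layer takes over, so each is heard in turn.
    Envelope whoosh { {{0.0f, 0.0f}, {0.4f, 1.0f}, {0.5f, 0.1f}, {1.0f, 0.0f}} };
    Envelope impact { {{0.4f, 0.0f}, {0.5f, 1.0f}, {1.0f, 0.3f}} };

    for (int i = 0; i <= 10; ++i) {
        float t = i * 0.1f;
        std::printf("t=%.1fs  whoosh %.2f  impact %.2f\n",
                    t, whoosh.at(t), impact.at(t));
    }
}
```

Multiply that by eight to 10 layers per hit and you get a sense of how much manual tuning sits behind a single trailer moment.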

I mentioned earlier that game sound consists of many different layers that are all automatically mixed together by the game engine to provide a real-time experience. Film trailers are the polar opposite: the layers are not just hammered together, they are surgically grafted with each other to allow each element to do its best work and then move out of the way for the next element.

The challenges are very different for trailers, but the intensity of the audio content you are working with really creates excitement. Working on trailers is fun but very challenging, and, of course, it’s always cool to hear your content on the big screen! Just be careful of your ears because that high intensity can be really fatiguing.

FILM I am not going to include television as I have not worked for TV beyond commercials, and my film experience is not as significant as my game experience. Nonetheless, there are some fundamental differences between working with sound for linear media such as film, and for nonlinear media such as games, and these differences influence my approach to each platform.

The main thing for me with film is to achieve and maintain sonic consistency over time. One of the key aspects of film production I have encountered is the room tone/atmos and ambience setup. It’s a little like the audio equivalent of colour grading, where you have to take all of these individual shots, from different angles and with different lighting, and try to create a consistent lighting look across all the edits. It’s the same with the room tones and atmos. You need to remove all the background content from the dialogue tracks, as it is often really different and distracting to the audience, and replace it with a nice consistent tone across each scene. It’s like applying a smooth undercoat before you start painting.

With film, I tend to build up sounds from the rear to the front. I start with long, smooth, clean edits and transitions, and use spot sound effects to provide support for actions and events. Again, all the content I am editing and preparing needs to be super clean because, unlike nonlinear media that is mixed in real time, film has no limits to the number of sounds you can combine simultaneously. Layers and layers can be combined, but every single layer has the potential to contribute to the overall level of unwanted noise. Combining many layers risks becoming an utter mess of unwanted noise if the sounds are not super clean to begin with.
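The arithmetic behind that warning is simple: uncorrelated noise floors add in power, so stacking N equally noisy layers raises the combined floor by 10 x log10(N) decibels. Thirty-two layers that each carry a noise floor at -60dBFS will combine to roughly -45dBFS, a very audible hiss, which is why every layer needs to be spotless before it enters the session.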

Unlike nonlinear media, you have much more control in film because you can set a scene exactly how you want it to sound and it will play back the same every time. After years of working in games, this is very different; you can push things to their limits and know they will never exceed those limits. This is why films can be dynamically more intense than games. The interactive and immersive nature of games is where they get their impact, not from a super tight mix.

SEEN & UNSEEN The process of laying out tracks and building up sonic worlds allows for lots of creativity, but the underlying consideration is communicating with the audience about things seen and unseen. Audio is our principal emotional sense, and the soundtrack for any media needs to enhance and support the narrative of the dialogue and music while also transporting the audience into the worlds we create.

Each delivery format has its own challenges, strengths and weaknesses. Understanding the best way to approach each format allows us to enjoy the process of creating an audio world, rather than fighting with the content to make it fit. Never be afraid to try new things or crazy ideas. So much of the best creative work comes from folk who think and act outside the box.

We all work in sound because we could not imagine ourselves doing anything else! As someone who records raw material for sound libraries and also uses that material in sound production, I’m experienced with the entire workflow – from capture of raw sounds to delivery of finished product. This provides me with a useful perspective, and I honestly think it makes me better at both jobs. When recording raw sounds on location, I consider all the ways I might want to manipulate those sounds as part of production. I will often record sounds that are very ordinary in their raw format, but I know are going to be a great basis for sound design. Likewise, when doing sound design I am aware of the limitations of recording on location and just how hard it can be to capture really clean content. I am going to explore these concepts in more detail in future issues of AudioTechnology. Stay tuned!

Mark Woods goes backstage to see how Sennheiser came to the rescue for a show that involves dialogue, singing, dancing, gymnastics and a live band.

Story: Mark Woods

Theatrical productions are tough audio gigs.

Mixing bands has its own challenges, but at least the vocalists sing into the mics. I had cause to reflect on this while visiting Sydney’s State Theatre to see the non-stop whirl of songs, dance and gymnastics known as Bring It On: The Musical. Based on the loyalties and rivalries of competing cheerleaders, the show is focused on the performers and the choreography. The audio production includes 22 wireless headset channels and a live band. Sound designer Greg Ginger from Outlook Communications gave me a look around.

HIDING THE BAND The show needs all the floor space it can get so the stage at the State Theatre was extended forward 4m, covering the orchestra pit. The six-piece band was set up downstairs and backstage in the Green Room, leaving it looking like a cross between a rehearsal studio and a live recording room. It took a bit of soundchecking to get levels comfortable in the room, with everyone so up close and personal. The guitar amp ended up in a road case, and I think Greg Ginger wanted to put the drum kit in one too but couldn’t find a case big enough, so he dampened the skins with some cotton wool as a compromise. The benefits of the band being hidden away and removed from the stage were studio-like separation from the stage sound, and complete control at FOH. The band had a conductor, and followed the show via headphone monitoring and a screen.

The mix required 24 channels to cover the band, with lots of Sennheiser e614 small-diaphragm condensers to be seen. The 24 channels were sent to a Digico SD5 FOH console, mixed in with the on-stage performers’ headset mics and some manually-triggered FX from QLab, then processed through a BSS Soundweb BLU-100 and fed to the house speaker system. The performers on stage heard themselves and the band through four little Funktion-One F55 dual five-inch speakers attached to the front lighting truss, with another four on the middle truss. Positioned about 6m above the stage — just out of reach of flying cheerleaders — they weren’t close or loud, but supplied an even spread of sound that was just enough for the performers to perform happily.

ON-STAGE PERFORMERS The person who invents invisible wireless microphones is going to do very well, but until then it’s omni-directional headset mics and bodypack transmitters for theatrical performers. The average rock opera might get away with headset mics somewhere down the sides of the performers’ faces; it’s not an ideal place to pick up vocals in itself, but it’s a luxury compared to Bring It On where the mics are hidden away on the performers’ hairlines or in wigs.

The mics aren’t the only things that need to be hidden. With most of the cast in cheerleader outfits, hiding the wireless bodypack transmitters and keeping them in place as the performers jumped around could be an even bigger problem. Regular bodypacks (typically powered by a couple of AA batteries) would have been awkward under the tight costumes, and, with 22 wireless channels and a couple of spares, would have required a carton of fresh AAs for every show.

The Green Room converted into performance space.

Sennheiser’s Digital 6000 system came to the rescue. Their new SK6212 bodypack transmitter, released earlier this year, is so small that it’ll change expectations for these devices. It feels strong, the replaceable antenna is thin and flexible, and I like the look and feel of the rounded corners and sides — it shouldn’t break or dig in if a performer falls over and lands on it. An additional inner seal helps to repel moisture, and Greg noted its effectiveness in this sweatier than average application.

Bring It On: The Musical is a fairly simple but boisterous and dynamic show. The audio production had to work within the constraints imposed by the physicality of the show, and relied heavily on the quality, stability and manageability of the large number of wireless channels. The singers on stage were the stars of the show and, apart from my concerns about mics in wigs and the lack of opportunity for any sort of mic technique, it worked remarkably well at showtime. The cast were experienced musical theatre performers with strong voices and good projection. The 2000-person capacity State Theatre has excellent acoustics and the PA was well-controlled in the highs and high-mids where it might feed back, so despite the inherently risky combination of a loud show, 22 omnis and a PA, there was no feedback or instability. The band sounded great, they were mixed to suit the vocals and I didn’t miss seeing them.

Sennheiser e614 small-diaphragm condenser microphones were used on nearly everything for the band. They had no problem handling the peaky cymbals/percussion, and their tight supercardioid pickup pattern helped with separation in the crowded green room.

Funktion-One F55s mounted on the overhead trusses provided on-stage monitoring while remaining clear of flying cheerleaders.

Sennheiser’s SK6212 digital bodypack transmitter

With 22 wireless mics on stage, plus spares, it pays to be organised.

Sennheiser’s EM6000 two-channel receiver and LM6000 charging module.

SENNHEISER’S 6000 SERIES The 6000 series as used in Bring It On consisted of three components: the SK6212 digital bodypack transmitters, the EM6000 digital receivers, and the LM6000 charging module.

Matchbox-sized at 63 x 47 x 20mm, the SK6212 bodypack is about half the size we’re used to. Weight is also about half, at 112g total with the BA62 lithium-polymer rechargeable battery on board. LiPo batteries are expensive but light, and the BA62s allow an astonishing 12 hours of operation between charges. They can be removed and recharged in the Digital 6000 rackmount charger via the dedicated LM6062 charging module. The long battery life made everything easier for Bring It On, especially on days with two shows. Greg showed me one unit that had been on for four hours and still indicated a full battery. LiPo batteries do wear out eventually, but Sennheiser says users can expect well over 500 duty cycles before replacement.

The OLED screen on the front of the transmitter is small but clear. Above the screen sits a single sunken Power/Mute button that requires determination and fingernails to activate; a short press gets Mute, while pressing for a few seconds turns the unit off. It turns back on almost instantly. Below the screen are three slightly recessed buttons to access the menu options. The screen turns off automatically after a few seconds of inactivity, but pressing any key brings it back with a display of the operating frequency (or a user name), the battery level and the time of day; a nice touch. Keep pressing and you get an audio level meter. The operating frequency can be changed from within the bodypack, and the power/mute LED light can be turned off if it’s distracting.

The EM6000 is a two-channel 1U digital receiver that presents itself as a premium product with solid construction and a quality feel to the controls. Operating over a wide 470MHz to 714MHz switching range, the Digital 6000 system is free from intermodulation between channels so it can use a simple equidistant frequency grid with a minimum 400kHz spacing for maximum channel density and stability. The sharp OLED screen’s default display for each channel contains everything you need at a glance, including a Link Quality Indicator (LQI) meter, with more details and editing available by turning the control knob to access the menus. As soon as you stray from the default display the Esc button lights up, offering a quick way back.

Around the back are the expected XLR and 6.35mm analogue audio outputs, joined by digital AES-3 and RJ-45 Dante outputs (optional card). External sync is possible via I/O BNC connectors and the RF antennas have BNC output splits for daisy-chaining up to eight EM6000s. Greg made good use of these options by sending the Dante output to FOH, with the analogue output used for backup, and the headphone output used for switching monitoring. The Digital 6000 series uses the same Sennheiser Digital Audio Codec (SeDAC) found in the top-spec Digital 9000 Series, and Greg found the audio quality to be outstanding (particularly the D/A conversion) with a wide frequency response and dynamic range. Latency is quoted at 3ms for all output formats and was not noticeable in use, nor was any of the error correction or audio error masking that could be happening in the background. The sound design for Bring It On required a wireless microphone system that was sonically accurate, flat-battery-proof and discreet. Just the brief for the Sennheiser Digital 6000.
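As a back-of-envelope illustration of why that equidistant grid matters (this is toy arithmetic, not Sennheiser's actual frequency-coordination tooling), the 22 channels plus spares used on Bring It On fit comfortably into a sliver of that switching range:

```cpp
#include <cstdio>

// Toy equidistant frequency grid. With no intermodulation products to
// dodge, 24 channels (22 mics plus spares) at the minimum 400kHz spacing
// span just 9.2MHz of the 244MHz-wide 470-714MHz switching range.
// The starting frequency below is hypothetical.
int main() {
    const double startMHz = 470.2;
    const double stepMHz  = 0.4;    // minimum 400kHz spacing
    for (int ch = 0; ch < 24; ++ch)
        std::printf("channel %2d: %5.1f MHz\n", ch + 1, startMHz + ch * stepMHz);
}
```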
Priced for professional applications, it’s been embraced by production and hire companies with good reason. Sennheiser is the long-standing market leader in the wireless segment and the Digital 6000 Series is as good as you can get.
