Metaverse and broadcast
Technology is constantly evolving and the metaverse is a concept that is already very present in our lives.
What is the metaverse?
It is still a concept and not yet fully developed, so beware of people who are too definitive about what it is.
The Metaverse – along with Web 3.0 – is one of the names often given to the next iteration of the Internet, which promises to further break down geographical barriers by providing new virtual degrees of freedom for work, play, travel, and social interaction. In practice, it’s an enhanced, virtual-reality-based internet user interface – mostly 3D, mostly interactive and mostly social – that will replicate the inherent depth and intuitiveness of the real world, as opposed to the flat interfaces (think Instagram, Amazon.com, Netflix, CNN.com, Salesforce, Microsoft Office, etc.) that we use today.
Often mentioned in close connection with the term “eXtended Reality” (XR, a combination of Virtual Reality and Augmented Reality) and with decentralized-infrastructure technologies such as blockchains and Non-Fungible Tokens (NFTs), the Metaverse promises to be the next step-change in the evolution of networked computing after the introduction of the World Wide Web back in the 1990s, when we went from text-based interfaces to “browsing” hypertexts with multimedia components, and the introduction of Web 2.0 back in 2004, when websites and apps started featuring real-time and persistent user-generated content.
The metaverse is often imagined as a collective virtual shared space, created by the convergence of virtually enhanced physical reality and physically persistent virtual reality. Some examples of expected areas of application include gaming, social media, entertainment events, work collaboration, training, and commerce.
What are the main challenges associated with consuming entertainment content in the metaverse through XR technology?
Aside from the big claims, so far most metaverse experiences have only been satisfactory to visionary Gen Z and Gen Y early adopters, eliciting skepticism in many Gen X and Baby Boomer observers (especially financial analysts).
The overall user experience and the visual quality for XR must be exceptional, given that you are watching a screen a few centimeters away from your eyes. Users expect highly detailed and realistic 3D models, at least at the level of blockbuster video games, but there are many more pixels to render, for each of the two eyes, at a higher frame rate: this requires a lot of computing power and advanced graphics technology. And therein lies the rub. To provide Fortnite-like graphics, XR actually requires more than twice the processing power that would be necessary for a typical TV display. Since we don’t want to wear a PlayStation 5 (or two) on our face, either we accept much more basic graphics quality, or we must tap into processing power that goes beyond that of the XR headset.
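To make the scale of that gap concrete, here is a back-of-envelope sketch of raw pixel throughput, comparing a 4K TV with a stereo headset; the per-eye resolution and refresh rate are illustrative assumptions, not the specifications of any particular device:

```python
# Back-of-envelope comparison of raw pixel throughput. The headset figures
# (per-eye resolution, refresh rate) are illustrative assumptions only.

def pixels_per_second(width: int, height: int, fps: int, views: int = 1) -> int:
    """Raw pixels that must be rendered every second."""
    return width * height * fps * views

tv = pixels_per_second(3840, 2160, 60)            # 4K TV at 60 fps, single view
xr = pixels_per_second(2064, 2208, 120, views=2)  # ~2K per eye, 120 Hz, two eyes

print(f"4K TV:      {tv / 1e6:.0f} Mpixels/s")
print(f"XR headset: {xr / 1e6:.0f} Mpixels/s (~{xr / tv:.1f}x the TV)")
```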
The only alternative to the much-criticized reminiscences of Second Life thus lies in “split computing”, i.e., decoupling the consumer client device (which must be light, power-efficient and cost-effective) from the rendering engine (which must be computationally powerful and possibly leased by the minute of use). The latter makes sense in the cloud, but this comes with the added challenge of having to maintain ultra-low-latency interactivity within the constraints of the delivery networks, often including a wireless component. Mainframe 3.0, with a twist.
Cloud split computing is also essential to interoperability. Currently, there is no standard for how developers can create virtual worlds (“metaverse sites”) that can be experienced across multiple platforms, from smart glasses to XR headsets to mobile phones. The simplest way to achieve this interoperability – as well as to provide a lag-free experience for multiple users interacting in the same virtual world – is to perform the rendering computation in the cloud, so that each end-user device receives a point of view of the same virtual scene and the device itself only needs to be compatible with ultra-low-latency video streaming, enabling very different devices to access the site regardless of their computing power.
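As a rough sketch of what that split looks like from the device’s side, the loop below streams the headset’s pose to a cloud renderer and simply displays the video frames it receives in return. The endpoint, message framing and helper callables (read_head_pose, decode_frame, display) are hypothetical placeholders, not any specific product’s API:

```python
# Minimal sketch of the client side of a "split computing" session: the headset
# streams its pose to a cloud rendering engine and displays the video frames it
# receives back. Transport, framing and the helper callables are hypothetical.

import json
import socket
import struct
import time

SERVER = ("render.example.com", 9000)   # hypothetical cloud rendering endpoint

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed")
        buf += chunk
    return buf

def run_client(read_head_pose, decode_frame, display, fps: int = 90) -> None:
    frame_budget = 1.0 / fps
    with socket.create_connection(SERVER) as sock:
        while True:
            start = time.monotonic()

            # 1. Upload the latest 6-DoF pose so the server renders our point of view.
            pose = json.dumps(read_head_pose()).encode()
            sock.sendall(struct.pack("!I", len(pose)) + pose)

            # 2. Receive one compressed video frame rendered for that pose.
            (size,) = struct.unpack("!I", recv_exact(sock, 4))
            frame = decode_frame(recv_exact(sock, size))

            # 3. Display it; the headset never runs the 3D engine itself.
            display(frame)

            # Keep to the frame budget; any slack absorbs network jitter.
            time.sleep(max(0.0, frame_budget - (time.monotonic() - start)))
```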
How is broadcasting of multimedia content linked to the metaverse? What can it bring now and what can it bring for the user?
So far broadcasters – and media companies in general – have been mostly watching the Metaverse bandwagon from afar, with a mix of curiosity and detachment. This is quite different from the fashion world, which rapidly embraced gamification and NFTs as a way to stay relevant to new generations and to develop new (and often surprising) sources of revenue, such as selling limited-release garments for gamers to wear when playing their favorite games.
As mentioned above, the Metaverse is a social interactive space, between humans and a wide variety of content. Traditional video (for instance shown on large virtual panels), volumetric immersive video (wherein viewers can change their point of view in real time with 6 Degrees of Freedom, as if they were there) and broadcast multimedia content in general will still form a key component of the Metaverse, whether it relates to entertainment, sports or news. Similarly to how we consume this content today, we will come together in the Metaverse to enjoy multimedia content together; indeed, we are seeing the first instances of these services – see for instance Meta Xstadium and Horizon Worlds. Of course, this increases the challenge of having to provide a realistic common environment where people can interact in real time. Initial attempts to provide similar experiences entailed putting multiple users in front of a virtual flat video (whether rectilinear or 360-degree), while thanks to cloud rendering with low-latency XR streaming we are now seeing examples of both photorealistic 6-Degrees-of-Freedom movies and photorealistic 6-Degrees-of-Freedom metaverse events with the participation of multiple people, each controlling a game-like avatar. Between Meta Quest headsets, ByteDance Pico headsets, PlayStation VR2 headsets and the like, more than 10 million high-spending families can already be reached with these types of services, which makes for an interesting early-adopter audience.
Broadcasters should play a role, but they still bear some scars from their unsuccessful attempts with stereoscopic 3D and their 2018 stints with VR 360 video, so they fear that this may follow the same path. As trite as this sounds, I believe this time is different. If they don’t act, large social networks and tech companies are already working to fill the gap in their place.
If a user already consumes content conveniently on regular devices, why would they want to access virtual worlds to consume it?
For the same reason why users wanted UHD HDR on 70-inch OLED panels even though they already had black-and-white analog TV conveniently working on 24-inch CRT screens decades before.
Users have always embraced new technologies that allow them to consume content in a more realistic way. We went from still to moving pictures, from black and white to color, from 4:3 analog to 16:9 digital, from SD SDR to Ultra HD HDR, and now we are headed towards increasingly immersive video and sound. Immersing ourselves further in the content we enjoy is a natural next step, and as soon as viewers get used to higher quality, it’s difficult for them to accept anything less.
However, for a new level of quality to be successful, the technology needs to provide a seamless way for the user to access the content. This means light devices, competitive price points, interactivity and – of course – engaging content.
Is the technology ready to deliver realistic worlds? Why is a realistic sense of total immersion necessary? How can technology help to achieve these capabilities?
This is one of the most important topics of debate, since creating realistic metaverse experiences is a significant technical challenge and some of the example Metaverse sites made available so far have been judged underwhelming. In fact, some of the experiences recently rolled out have been more detrimental than helpful in making the case for the Metaverse, since they gave the false impression that the technology is still a long way off, when it is not.
From my privileged point of view, I’m happy to answer yes, the technology is ready to deliver realistic worlds. But (there is a “but”) only with the proper combination of best-of-breed available technologies, which neither consumers nor most industry observers have yet been able to experience in action. As I mentioned before, the only way to achieve photorealistic immersive quality with lightweight and relatively inexpensive client devices is to decouple and untether client devices from rendering engines, separating computation from display.
This largely entails moving rendering engines to the cloud and transmitting to each end-user device a personalized video of its Metaverse view. In this way, latest-generation GPUs such as NVIDIA’s RTX 4090 – which boasts almost 10x more TFLOPS than the PlayStation 5’s GPU – can be used to produce photorealistic immersive experiences, maxing out the display capabilities of client devices.
Remote rendering requirements in terms of bandwidth and latency are significant, but they are being overcome through the deployment of the latest compression technologies, such as the new MPEG-5 LCEVC compression enhancement standard, which makes it possible to stay within realistic wireless bandwidth constraints, reduces compute requirements and guarantees more consistent ultra-low latency. The overall infrastructure cost for such experiences is in the ballpark of less than 1 USD per hour, which is acceptable for a range of high-value experiences.
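To give a sense of why compression carries so much weight here, the sketch below compares the raw bitrate of an uncompressed stereo headset feed with an assumed target bitrate for a compressed ultra-low-latency stream; resolution, refresh rate, bit depth and the target figure are illustrative assumptions rather than measured values:

```python
# Rough illustration of why compression is non-negotiable for remote XR rendering:
# compare the raw bitrate of an uncompressed stereo feed with an assumed target
# bitrate for a compressed stream. All figures are illustrative assumptions.

width, height, views, fps, bits_per_pixel = 2064, 2208, 2, 90, 24

raw_bps = width * height * views * fps * bits_per_pixel
target_bps = 50e6   # assumed budget for a compressed stream that fits on good Wi-Fi

print(f"uncompressed: {raw_bps / 1e9:.1f} Gbit/s")
print(f"target:       {target_bps / 1e6:.0f} Mbit/s")
print(f"required compression ratio: ~{raw_bps / target_bps:.0f}:1")
```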
In short, current infrastructure and devices are already capable of serving realistic Metaverse experiences to a subset of well-connected consumers, mostly for use cases capable of justifying the per-hour cost of remote compute: some of these experiences will start being showcased and rolled out over the next two years. Availability of content will then feed the usual virtuous cycle, driving more users to equip themselves to access that content.
Technology that makes the metaverse work today comes from the world of video games. What can the technology associated with broadcasting contribute to improving the performance of this digital context?
We could say that multiplayer video games like Fortnite are already a form of Metaverse, but we may also say that the Metaverse is precisely about using those user interface archetypes for applications other than gaming.
Traditional media companies bring to the table their ability to mobilize large concurrent audiences by producing engaging content – typically requiring a lower degree of interactivity.
In a word, broadcasters will contribute scale, progressively bringing to the Metaverse all the necessary creativity, technology, and infrastructure to craft metaverse events with millions of concurrent users.
The Metaverse will provide a great place for people to interact and socialize around broadcast events.
How can cloud technology support the growth and expansion of the metaverse? What about mobile networks such as 5G and 6G?
5G has been a painful financial investment for telco operators, since they are investing tens of billions of dollars without concrete opportunities to grow their revenues, often with negative Return on Invested Capital (ROIC). At least metaverse companies – similarly to social networks and streaming companies – will continue benefiting from those investments.
Cloud technology is fundamental to the growth and expansion of the metaverse, since, as I mentioned, a sensible way to provide realistic experiences on lightweight display devices entails time-sharing cloud compute resources. We must eliminate the need for each user to invest in an expensive gaming machine (and/or to have it next to them) to access the metaverse. For the Metaverse to scale as a new type of Internet user interface, it must be available to everyone when needed, with the compute capacity that one needs at any one time. The cloud is perfect for this.
Decoupling rendering engines from receiving devices has the above-mentioned advantages, but it puts an enormous amount of strain on the delivery network to ensure a high-quality and ultra-low-latency experience. A convergence of new technologies will help in this regard, from more efficient networks such as 5G, 6G or Wi-Fi 7 to novel multi-layer compression approaches such as MPEG-5 LCEVC, which are ideal for ultra-low-latency transmissions thanks to their ability to discard data packets during sudden drops in wireless bandwidth without compromising the overall quality of the signal.
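A minimal sketch of that graceful-degradation idea follows: when the estimated bandwidth dips, the sender ships only the self-contained base layer for that frame and drops the enhancement packets, so quality softens momentarily instead of the stream stalling. The packet classes and bandwidth estimate are simplified assumptions, not the actual LCEVC bitstream syntax:

```python
# Conceptual sketch of graceful degradation with a multi-layer stream: if the
# current frame does not fit the bandwidth budget, drop its enhancement layer
# and send the base layer alone, which still decodes on its own.

from dataclasses import dataclass

@dataclass
class EncodedFrame:
    base: bytes          # low-resolution base layer (decodable on its own)
    enhancement: bytes   # residual layers that restore full resolution

def select_packets(frame: EncodedFrame, est_bandwidth_bps: float, frame_interval_s: float) -> bytes:
    """Return the bytes to send for this frame within the current bandwidth budget."""
    budget = est_bandwidth_bps * frame_interval_s / 8   # bytes affordable this frame
    payload = frame.base
    if len(payload) + len(frame.enhancement) <= budget:
        payload += frame.enhancement                    # bandwidth is fine: full quality
    # else: enhancement is discarded for this frame only; the base layer still decodes
    return payload
```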
How does V-Nova approach the evolution associated with the broadcasting of entertainment in the metaverse? What technological solutions does it propose to meet all these challenges?
V-Nova’s compression technologies are key enablers of high-quality volumetric worlds, as demonstrated by our 2022 Steam VR release of the first-ever photorealistic volumetric movie, Construct (Construct VR - The Volumetric Movie on Steam, steampowered.com), which was awarded a Lumiere Award by Hollywood’s Advanced Imaging Society.
At the same time, V-Nova’s LCEVC is a key enabler of high-quality, ultra-low-latency XR streaming, multiplying the number of households and offices that can be served with cloud rendering.
MPEG-5 LCEVC (Low Complexity Enhancement Video Coding) is ISO MPEG’s new hybrid multi-layer coding enhancement standard. It is codec-agnostic in that it combines a lower-resolution base layer encoded with any traditional codec (e.g., H.264, HEVC, VP9, AV1 or VVC) with multiple layers of residual data that reconstruct the full resolution. The LCEVC coding tools are particularly well suited to compressing details efficiently, from both a processing and a compression standpoint, while leveraging a traditional codec at a lower resolution makes effective use of the hardware acceleration available for that codec and makes it more efficient. LCEVC has been demonstrated to be a key enabler of ultra-low-latency XR streaming, producing benefits such as the following:
(1) Bandwidth that fits within typical Wi-Fi constraints, even for the highest XR qualities
(2) Reduction of latency jitter, possibly the biggest impediment to quality of experience with cloud rendering
(3) Low processing complexity, which translates into compatibility with existing XR devices
(4) Ability to send 14-bit depth maps along with the video, which is critical to AR applications
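To make the layered structure described above more tangible, here is a toy sketch of the base-plus-residuals idea: a downscaled base is coded with a stand-in for a traditional codec, and the enhancement layer carries the residuals needed to rebuild the full-resolution picture. It illustrates the principle only and does not reproduce the actual LCEVC coding tools:

```python
# Toy illustration of base-plus-enhancement layered coding: encode a downscaled
# base with a stand-in "traditional codec", then code the residual between the
# upscaled base and the full-resolution source as an enhancement layer.

import numpy as np

def downscale(img: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 averaging."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale(img: np.ndarray) -> np.ndarray:
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def base_codec(img: np.ndarray) -> np.ndarray:
    """Stand-in for H.264/HEVC/AV1: coarse quantisation to mimic lossy coding."""
    return np.round(img / 8) * 8

def encode(frame: np.ndarray):
    base = base_codec(downscale(frame))        # low-resolution base layer
    residual = frame - upscale(base)           # detail the base layer lost
    return base, np.round(residual)            # enhancement layer (quantised residuals)

def decode(base: np.ndarray, residual: np.ndarray) -> np.ndarray:
    return upscale(base) + residual            # full resolution = upscaled base + details

frame = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
base, enh = encode(frame)
print("max reconstruction error:", np.abs(decode(base, enh) - frame).max())
```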
V-Nova has developed the first, and so far the only, LCEVC SDK available commercially. We stand behind the adoption, implementation, and deployment of LCEVC as it is one of the smartest video compression technologies available, able to overcome some of the key challenges now being faced by the Metaverse.