5G and eye-tracking enabling mobile VR

How do we cope with the huge demands on bandwidth and device computational power anticipated from high-quality virtual reality as video resolutions keep increasing? Researchers from Ericsson Research, the Royal Institute of Technology (KTH) in Stockholm, Sweden, and Tobii have demonstrated how bandwidth and device computational requirements can be dramatically reduced by combining smart eye-tracking with low-latency 5G networks and distributed cloud. This allows network service providers to deliver excellent experiences at a fraction of the network load.

What are the barriers to overcome if VR is to really take off and reach mass market adoption?

Here are a few things from our wish list:

    1. For a truly immersive experience, much higher resolution will be needed than what is currently available in even high-end VR devices. At the same time, the cost of devices must go down significantly.

    2. We need to cut the cord from the tethered computer required today for high-quality VR graphics processing and instead use energy-efficient battery-powered lightweight wearables.

    3. It is absolutely necessary to find cost-efficient ways for network service providers to stream high-quality VR content to massive numbers of mobile users.

But how can all this be realized? One obvious way to meet all three challenges is to utilize the fact that high-resolution video quality is only needed in the user’s field of view, known as the viewport. Why waste network bandwidth and device compute resources on the parts that the users are less likely to see? Viewport-aware streaming and rendering based on users’ head movements is a popular and promising track currently being explored.  

In a research collaboration between the Royal Institute of Technology (KTH), Tobii and Ericsson, we have taken this optimization approach to the next level by demonstrating “foveated streaming”, a novel solution that limits high video quality to the fraction of the viewport in the immediate proximity of the user’s fixation point.

The solution is enabled by next-generation eye-trackers embedded in the head-mounted display (HMD), whose gaze information is used to identify the portion of the content to be selectively streamed at high quality from a cloud server to the end device.

This makes it possible to achieve significant savings in both network bandwidth and device computational power. A big challenge is that the loop from the eye-tracker to the content server and back to the HMD needs to be really fast. Thankfully, 5G provides exactly this property: low-latency wide-area access. Add to that a foveated server hosted at the distributed cloud edge of the network service provider’s network and a fast video coding mechanism, and there you go.
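To make the loop concrete, below is a minimal sketch of a single round trip from eye-tracker to edge server and back. All function names and return values are illustrative stand-ins rather than code from the actual prototype; the point is simply that one full iteration has to fit within a small latency budget.

```python
import time

def read_gaze():
    """Stand-in for the HMD-embedded eye tracker: returns a (yaw, pitch) fixation in degrees."""
    return (12.0, -3.5)

def request_tiles(gaze):
    """Stand-in for the uplink request to the foveated server at the cloud edge."""
    return {"high_quality_tiles": [14, 15, 22, 23], "base_layer": b"low-res picture"}

def render(response):
    """Stand-in for decoding the base layer plus the high-quality tiles on the device."""
    pass

t0 = time.monotonic()
gaze = read_gaze()              # 1. sample the user's fixation point in the HMD
response = request_tiles(gaze)  # 2. the gaze travels over 5G to the edge server
render(response)                # 3. the selected tiles come back and are displayed
loop_ms = (time.monotonic() - t0) * 1000
print(f"Round trip took {loop_ms:.1f} ms")  # the whole loop must stay within a small budget
```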

Eye tracking and foveated rendering in user devices

Foveated rendering is an enabler for the PC and VR industry, as it can reduce the required rendering capacity or enable higher-resolution displays without adding more GPU resources. By knowing where the user is focusing their gaze, only a limited area of the display is rendered at high quality, while the rest of the display is rendered in such a way that the user will not notice the reduction in resolution. Across the computer graphics industry, considerable effort is being put into supporting foveated rendering in rendering technologies and game engines.
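As a rough illustration of the principle, the sketch below picks a quality level for a screen region based on its angular distance from the gaze point. The two thresholds and the function name are our own illustrative choices, not values from any particular engine or headset.

```python
import math

def quality_for_region(region_center, gaze_point, inner_deg=10.0, outer_deg=25.0):
    """Pick a rendering quality level for a screen region from its angular
    distance to the gaze point (all angles in degrees, thresholds illustrative)."""
    eccentricity = math.hypot(region_center[0] - gaze_point[0],
                              region_center[1] - gaze_point[1])
    if eccentricity <= inner_deg:
        return "full"    # foveal region: native resolution
    if eccentricity <= outer_deg:
        return "medium"  # parafoveal region: reduced shading rate
    return "low"         # periphery: the user is unlikely to notice the reduction

# Example: with the gaze at the centre, a region 30 degrees away is rendered at low quality.
print(quality_for_region((30.0, 5.0), (0.0, 0.0)))  # -> "low"
```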

Central to the functionality is eye tracking technology.

In late 2018 and into 2019, a number of commercially available headsets will come out with Tobii Eye Tracking technology. This will enable devices with better foveated rendering and also further improve the user experience through interaction enhancements, social interaction and eye contact in virtual environments.

Recent announcements by SoC provider Qualcomm about a reference implementation of Tobii Eye Tracking in the Snapdragon 845 chipset serve as a testament to the future adoption rate of eye tracking tech in the standalone VR and AR markets.

So far, the benefit of using eye tracking technology for foveated rendering lies on the VR device side only. But as eye tracking becomes more widely available in standalone VR devices, it will become viable to stretch the concept of foveated rendering from the device to the network, to cope with the massive network bandwidth demand that will come from popular use of mobile VR streaming.

Foveated streaming over 5G

We have built a prototype with Tobii eye tracking integrated in an HTC Vive HMD. The HMD transmits the eye-gaze signal over an Ericsson 5G end-to-end network prototype (radio + core network) to a local “foveated cloud server”, where we select in real time which parts of the video to deliver over the network to the user device in high quality.

So, in principle we have moved the role of foveated rendering from the device to this foveated cloud server by developing a novel streaming solution using HEVC video coding. In the prototype we demonstrate bandwidth reductions of up to 85 percent compared with transmitting the entire video in full resolution. Feedback from network service providers trying out our demo at Mobile World Congress in February was enormously encouraging. Most visitors could spot little or no difference in perceived quality despite the huge difference in bandwidth consumption.
 


Figure 1. Content adapted at a distributed cloud server, over a low-latency network, and based on real-time eye-gaze measurements
 


Figure 2. High-quality foreground tiles selected exactly at the spot where the user is looking. Low-resolution background in the rest of the viewport. Screenshot from demo.
 


Figure 3. Demonstration at Mobile World Congress

Codec configuration and fast adaptation

The prototype uses the HEVC video coding format, and the content is encoded into three different streams. For the first stream, the VR content is partitioned into spatial regions by using the Tiles tool supported in HEVC. Each region, or tile, is made available in high quality on the server. The second stream is a low-resolution, low-bitrate representation of the video. This second stream does not use the Tiles tool. Finally, the third stream is a so-called all-Intra stream in which all tiles are encoded in high quality without prediction from previously coded pictures.
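The sketch below shows one way the three representations could be organized on the server, assuming a small tile grid; the field names and layout are illustrative and not taken from the prototype code.

```python
from dataclasses import dataclass

@dataclass
class EncodedStreams:
    """Illustrative bookkeeping for the three representations described above.
    The field names are ours, not identifiers from the actual prototype."""
    high_quality_tiles: dict  # (tile_index, picture_number) -> high-quality HEVC tile data
    low_quality_full: dict    # picture_number -> low-resolution, low-bitrate picture (no tiles)
    all_intra_tiles: dict     # (tile_index, picture_number) -> intra-only high-quality tile data

# A tiny example layout: a 4x3 tile grid and two pictures (payloads are placeholders).
streams = EncodedStreams(
    high_quality_tiles={(t, p): b"hq" for t in range(12) for p in range(2)},
    low_quality_full={p: b"lq" for p in range(2)},
    all_intra_tiles={(t, p): b"intra" for t in range(12) for p in range(2)},
)
```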

During media consumption, the gaze direction is continuously captured from the user and sent to the server. An area around the gaze direction is calculated and the tiles that fall within this area are forwarded from the first stream and sent to the user in high quality. The second stream is sent in parallel and the end device decodes both the second stream as well as the high-quality tiles. The decoded high-quality tiles replace the corresponding area in the decoded second stream before the video picture is rendered to the user.

When the user changes the gaze direction by looking in a new direction, the server updates the high-quality area. Transmission for new tile positions that were previously not sent in high quality is started by sending the corresponding tiles of the next picture from the all-Intra stream. This means that there is at most one picture delay at the server before the high-quality area is updated. Moreover, this novel approach can be implemented on standard streaming protocols, like DASH.
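A simplified sketch of this selection logic is shown below, assuming a rectangular tile grid and a fixed one-tile margin around the gaze position; the grid size, margin and function names are illustrative rather than the prototype's actual parameters.

```python
def select_tiles(gaze_tile, grid_w=4, grid_h=3, radius=1):
    """Return the tile indices within `radius` tiles of the gaze position
    (a simple rectangular neighbourhood; the real grid and margin differ)."""
    gx, gy = gaze_tile
    selected = set()
    for y in range(max(0, gy - radius), min(grid_h, gy + radius + 1)):
        for x in range(max(0, gx - radius), min(grid_w, gx + radius + 1)):
            selected.add(y * grid_w + x)
    return selected

def tiles_to_send(current, previous):
    """Tiles that just entered the high-quality area start from the all-Intra stream;
    tiles that were already high quality continue from the predicted high-quality stream."""
    return {"from_all_intra": current - previous,
            "from_high_quality": current & previous}

# Example: the gaze moves one tile to the right between two consecutive pictures.
previous = select_tiles((1, 1))
current = select_tiles((2, 1))
print(tiles_to_send(current, previous))
```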

Trade-off between latency and bandwidth savings

How low does the latency then need to be? Well, one beauty of the concept is that there is no strict limit. Higher latency can to some degree be compensated for by increasing the size of the high-quality area of the video, at the expense of more consumed bandwidth. Simply speaking, the lower the transmission latency, the more bandwidth you can save for the same user-perceived quality. Alternatively, you can choose to deliver even higher quality where the user is looking, at a moderate increase in bandwidth. Again, the lower the latency, the higher the gain in quality and bandwidth savings.

Figure 4. Radii of high-quality foreground adapted to network conditions.
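As a rough sketch of this trade-off, the snippet below grows the radius of the high-quality area with the round-trip latency, so that the gaze is unlikely to leave the area before the next update arrives. The base radius and gaze-speed constant are purely illustrative, not measured values from the prototype.

```python
def high_quality_radius_deg(latency_ms, base_radius_deg=10.0, gaze_speed_deg_per_ms=0.5):
    """Grow the high-quality area so the gaze cannot leave it before the next update
    arrives. All constants here are illustrative, not measured prototype values."""
    return base_radius_deg + gaze_speed_deg_per_ms * latency_ms

for latency in (5, 20, 50):
    radius = high_quality_radius_deg(latency)
    print(f"{latency} ms round trip -> roughly {radius:.1f} degree high-quality radius")
```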

Other potential applications

In our prototype we focused on streamed pre-recorded 360 VR video as a use case. But there are, of course, numerous other possible applications like live-video feeds, video conferencing, video for remotely controlled vehicles and AR-assisted operators in factories, just to mention a few.

Lastly, we are also looking into how aggregated eye-gaze data can be used for further gains and savings in VR/AR streaming as well as content provisioning.

Alfredo Fanghella

Alfredo Fanghella recently became a Researcher in the Digital Representation and Interaction department at Ericsson Research. He works in the area of media content analysis, specifically analyzing video with Computer Vision and Machine Learning techniques. Alfredo studied Computer Science and Engineering at Simón Bolívar University in Caracas, Venezuela.

Calin Curescu

Dr. Calin Curescu is a Master Researcher with Ericsson Research. He is working on various orchestration topics, such as model-based orchestration, or algorithms for joint service composition and allocation in the Distributed Cloud. He is an Orchestration Champion in the Cloud Services and Platforms area in Ericsson Research, and an Ericsson delegate in the TOSCA standardization. Calin holds a Ph.D. in Computer Science from Linköping University, Sweden, and has published over 25 research articles and patents.

Joakim Karlén

Joakim works with product management for VR at Tobii Tech. He is responsible for the VR eye tracking platform products that Tobii develops to drive the business of integrating eye tracking into VR devices. He has a long background in the telecommunications and software industry.

Johan Lundsjö

Johan is a Principal Researcher in the area of network architectures and protocols at Ericsson Research. He joined Ericsson in 1996 and has been involved in research, standardization and early product concepts work for 3G, 4G and now 5G networks. Starting with a focus on radio interface and radio access networks, in recent years his scope has expanded to include cloud and more holistic system architecture aspects. Johan holds an MSc in Electrical Engineering from KTH – Royal Institute of Technology.

Konrad Tollmar

Konrad Tollmar is an Associate Professor at KTH. His main research interest is in mobile computing and various aspects of user-centered communication technologies, such as Quality of Experience and how interactive technologies become a part of our everyday practice and life. Prof. Tollmar co-leads the Mobile Service Lab at KTH and is the program director for the KTH and EIT Digital master program in ICT Innovation. Prior to this, he worked at the Ingvar Kamprad Design Centre - Lund University, CSAIL - MIT and The Smart Studio at the Interactive Institute.

Mårten Skogö

Mårten Skogö is the Chief Science Officer and co-founder of Tobii. He has been responsible for long-term development of Tobii’s eye tracking technology since its inception in 2001. He holds several patents related to eye tracking and other innovations. Before Tobii he co-founded Jenser Technology, a spin-off from the research he was doing at the Royal Institute of Technology (KTH). Skogö studied Engineering Physics at KTH, Sweden.

Pietro Lungaro

Pietro Lungaro, Ph.D., is a Research Scientist at the Royal Institute of Technology (KTH) in Stockholm, Sweden. His research interests focus on exploiting user and service context information to optimize the provision of future media in mobile networks. He is the author of several journal and conference papers in the areas of Quality of Experience (QoE), user-centric solutions and algorithms for enabling the provision of VR/MR/AR services in future 5G networks and novel interfaces between humans and AI-controlled machines.

Rickard Sjöberg

Rickard is an Expert in video compression in the Digital Representation and Interaction department at Ericsson Research, Kista, Sweden. He has been with Ericsson since 1997 and has worked in various areas related to video coding, both in research and product development. In parallel, he has been an active contributor in the video coding standardization community, with hundreds of contributions relating to the H.264 and HEVC video coding standards. Rickard currently works as the technical leader of the video coding activities at Ericsson Research.
