Ericsson Research Blog

Research, Insights and Technology Reflections

Remote Excavation: Using WebRTC and Real-Time Video with an Eye on 5G

How User Experience and Real-Time Communication research work together to uncover the future of mobile networks.

Did you see the remote-controlled digger demo at MWC 2014 in Barcelona? It is Ericsson Research’s latest example of how we work to adapt to the rapidly changing landscape of mobile network use and to drive development of future network technologies and use cases.

For us, the demo really gets to the heart of uncovering future uses of mobile networks. The use case (remote excavation, or steering a digger via the network) is very useful both for understanding the requirements for future network technologies and for experimenting with them.

The digger demo was a joint effort between the ‘User Experience Lab’ and the ‘Services, Multimedia and Networks’ teams at Ericsson Research.

Hans Vestberg presented the demonstration on Tuesday February 25th during his keynote speech at the Mobile World Congress 2014 in Barcelona.

Controlling a large machine involves many complex technical aspects, but we wanted to investigate the challenges of controlling it from anywhere: the machine could be somewhere in Asia and the operator could be in South America.

The core of the remote control problem is to give the operator an experience of controlling the machine that is as good as, or better than, actually sitting on-site in the driver’s seat. This produces requirements that will really test the capabilities of the network as well as of the applications that make up the complete solution.

Breaking down the problem to its simplest form, we needed a tight, low-latency stream of sound, video and control information flowing back and forth between the operator and machine.

We used the real-time communications framework developed here at Ericsson Research to rapidly implement the end-to-end solution of sending the sound and video from the excavator to the operator and sending the controls to the excavator. This same framework was previously used to support the Bowser WebRTC-compliant mobile browser app and is based on the GStreamer open source multimedia framework.

We used a model excavator with a standard mobile phone, fitted with a 360-degree lens, mounted on top of it. Using our real-time communication framework and the WebRTC API, we developed an application to capture this 360-degree video, capture audio, compress both streams and send them to the operator.

The operator uses an Oculus Rift virtual reality headset connected to a standard desktop computer running a second application that was built using the same RTC framework.

The operator application receives the 360-degree video, which looks a bit like a doughnut, decompresses it and maps it onto a sphere.
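The mapping from the doughnut-shaped frame onto a sphere can be illustrated with a small sketch. It is not the code used in the demo. It assumes the lens produces a ring centred at (cx, cy), that the angle around the ring corresponds to the horizontal viewing angle, and that the distance from the centre varies linearly with the vertical viewing angle. The function name, the parameters and the default vertical range of plus or minus 45 degrees are all hypothetical, and a real lens would need its own calibration.

```python
import math


def sphere_direction_to_doughnut_pixel(azimuth, elevation, cx, cy,
                                       r_inner, r_outer,
                                       elev_min=-math.pi / 4,
                                       elev_max=math.pi / 4):
    """Return the (x, y) source pixel in the ring image for a view direction.

    azimuth and elevation are in radians. The ring spans r_inner..r_outer
    pixels from its centre (cx, cy). Assumption: the inner edge shows
    elev_min and the outer edge shows elev_max, with a linear mapping between.
    """
    # Directions outside the lens's vertical coverage are clamped to the edge.
    elevation = max(elev_min, min(elev_max, elevation))
    # Normalise elevation to 0..1 across the ring's thickness.
    t = (elevation - elev_min) / (elev_max - elev_min)
    radius = r_inner + t * (r_outer - r_inner)
    # The horizontal angle simply walks around the ring.
    return (cx + radius * math.cos(azimuth),
            cy + radius * math.sin(azimuth))
```

A renderer would evaluate a lookup like this once per sphere vertex to produce texture coordinates, and then let the graphics hardware interpolate between them for every frame.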

The Oculus Rift headset gives low-latency, accurate feedback on where the operator is looking, which the application uses to show the correct view of the video to each of the operator’s eyes.
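As a rough illustration of how head tracking selects a view, the sketch below turns a head pose into the direction to look up on the video sphere. It does not use the Oculus SDK. The function names and the axis convention (y up, looking along negative z at rest, positive yaw turning right) are assumptions made for this example, and head roll is ignored.

```python
import math


def head_pose_to_view_direction(yaw, pitch):
    """Return a unit vector (x, y, z) for the direction the head is facing.

    Assumed convention: yaw and pitch in radians, y is up, and yaw = 0,
    pitch = 0 looks along negative z. Roll is ignored in this sketch.
    """
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def view_direction_to_angles(direction):
    """Convert a unit view vector back to (azimuth, elevation) on the sphere."""
    x, y, z = direction
    azimuth = math.atan2(x, -z)
    # Guard against tiny floating-point overshoot before taking asin.
    elevation = math.asin(max(-1.0, min(1.0, y)))
    return azimuth, elevation
```

The two angles returned by the second function are what a sphere-mapped video lookup would need in order to find the centre of the picture the operator should currently see.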

A standard video game controller feeds the operator’s commands into the operator application, which sends them over a data channel to the excavator, where they are ultimately applied directly to its motors.
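To make the control path more concrete, here is a minimal sketch of how controller input could be packed into a small binary message for a data channel. The actual message format used in the demo is not described here, so everything in this sketch is an assumption: four joystick axes in the range -1 to 1, a sequence number so the receiver can discard commands that arrive late, and the names `encode_controls`, `decode_controls` and `LatestOnlyFilter`.

```python
import struct

# Hypothetical wire format: little-endian uint32 sequence number followed by
# four signed 16-bit axis values (12 bytes in total).
_FORMAT = "<I4h"
_SCALE = 32767


def encode_controls(seq, axes):
    """Pack a sequence number and four axis values (-1.0..1.0) into bytes."""
    # Clamp each axis so a noisy controller cannot overflow the int16 range.
    scaled = [round(max(-1.0, min(1.0, a)) * _SCALE) for a in axes]
    return struct.pack(_FORMAT, seq, *scaled)


def decode_controls(payload):
    """Unpack bytes into (sequence number, list of four axis values)."""
    seq, *scaled = struct.unpack(_FORMAT, payload)
    return seq, [s / _SCALE for s in scaled]


class LatestOnlyFilter:
    """Accept a command only if it is newer than every command seen so far."""

    def __init__(self):
        self.last_seq = -1

    def accept(self, seq):
        if seq <= self.last_seq:
            return False  # Stale or duplicate: the machine should ignore it.
        self.last_seq = seq
        return True
```

Dropping stale commands rather than queueing them suits this kind of control traffic, because an old joystick position is worse than no update at all. With a WebRTC data channel, the same idea could be paired with unordered, non-retransmitting delivery, although whether the demo did so is not stated.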

Together, these components allow us to steer the machine, observe the actions taking place and react to them. This solution was created as a proof of concept, but on its own it is already surprisingly usable and succeeds in demonstrating both the idea and the problems this use case will face in the future.

It may sound like a cliché but it really is a surprise, even to researchers and engineers working on the project, just how quickly you become immersed in the experience. It takes just a few minutes to understand the controls and focus on doing things with the machine. The prototype has rough edges but the core experience is present and engaging.

Latency is one of the most important factors in feeling connected and immersed in the driving experience. The time from the driver issuing a command, through the machine reacting, to the driver seeing and hearing that reaction needs to be less than about 50 milliseconds. This need for a tight connection between both ends of the solution places a strong latency requirement on future networks, especially on the sections that run over mobile radio, such as 5G.
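A simple budget calculation shows why the 50 millisecond target is so demanding on the network. In the sketch below, only the 50 ms total comes from the text above. The per-stage figures are illustrative placeholders, not measurements from the demo, and the function name is hypothetical.

```python
def network_budget_ms(total_budget_ms, stage_latencies_ms):
    """Return the time left for the network after all other stages are paid.

    stage_latencies_ms maps a stage name to its latency in milliseconds.
    A negative result means the budget is already blown before any packet
    crosses the network.
    """
    return total_budget_ms - sum(stage_latencies_ms.values())


# Placeholder values, chosen only to illustrate the arithmetic.
example_stages = {
    "controller sampling": 2,
    "camera capture": 8,
    "video encode": 5,
    "video decode": 5,
    "render and display": 11,
}

# Whatever is left must cover both directions: commands going out to the
# machine and video coming back to the operator.
remaining = network_budget_ms(50, example_stages)
print(f"left for the network round trip: {remaining} ms")
```

With these placeholder numbers, less than half of the budget remains for the command and the video to cross the network, which is why the radio sections of the path matter so much.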

The quality of the sound and video is also of great importance: the driver needs to be able to see and hear everything as if they were in the driver’s seat, without missing any details. Beyond that, a simple 2D video stream is not good enough; stereoscopic 3D is needed to provide a sense of depth.

A complete solution deployed in full-scale excavators and other machinery would need far more information and controls than our model setup.

Arrays of cameras and microphones may each deliver separate streams; sensors could report information about the surroundings, about the environment being interacted with as well as the state of the vehicle being driven. All of this information would need to be communicated and presented to the operator so they have everything they need to dig the hole in Asia while sitting in South America and not have to leave their family for weeks at a time just to be on-site.

Robert Swain, Ericsson Research