Beyond HTML5: Peer-to-Peer Conversational Video | Ericsson Research Blog

Beyond HTML5: Peer-to-Peer Conversational Video

In a previous blog post, we showed you our work on conversational voice and video using “beyond HTML5” solutions.

In that work we used WebSockets and a media relay to route streams between peers. Now we’d like to show you how we have extended this to use peer-to-peer streaming.

Peer-to-peer streaming means that voice/video frames are streamed directly between peers, without any server in between. The effect is lower latency and more efficient network utilization. Up until now, however, web browsers have lacked the capability to communicate peer-to-peer. Instead, communication has traditionally relied on a shared relay server in the network.

The attached video includes a quick demo and a brief explanation of the connection establishment procedure. Below we explain this in more detail.


There is an existing proposal for an API called ConnectionPeer for establishing a direct connection between two peers (web browsers). ConnectionPeer is a very minimalistic API, and leaves most of the signaling (logging in, inviting friends, and so on) to be performed using traditional HTTP techniques. For example, the EventSource API (which we submitted to WebKit in 2009) comes in handy for receiving invitations.

The API is presented, along with a brief example, in the HTML specification at the WhatWG site.

ConnectionPeer is responsible for the minimal functionality needed for establishing peer-to-peer connectivity, using the following steps:

  • Each peer collects information about how it can be reached from the outside. Typically this is one or more IP address/port combinations, along with some more information. This is done by the getLocalConfiguration method.
  • Add the corresponding information about the other peer (obtained “out of band”, that is, typically over HTTPS from the chat server). This is done by the addRemoteConfiguration method.
  • Establish the peer-to-peer connection, allowing for streaming data to be exchanged between the peers. This is done implicitly once both methods above have been called. Once the connection has been established, an onconnect event is generated and allows the application to react.
In addition to connection establishment, ConnectionPeer also includes methods for streaming data over the connection. These are used to add real-time voice and video streams.

Here’s a sketch of an example of the above:

    <script>
    var serverConfig = ...; // provided by server to handle, e.g., TURN
    var local = new ConnectionPeer(serverConfig);

    window.onload = function() {

      local.onconnect = function() {
        // executed when we're connected to the other peer:
        // from now on, we can start adding streams
      };

      local.onstream = function() {
        // executed when the other peer adds a stream, e.g., video or voice
        var remoteView = document.getElementById("remoteView");
        remoteView.src = local.remoteStreams[0].url;
      };

      var videoDevice = document.getElementById("videoDevice");
      videoDevice.onchange = function() {
        // executed when the user selects a video source in the <device> element
        var localStream = ...;
        var selfView = document.getElementById("selfView");

        // display the selected video source (self view)
        selfView.src = localStream.url;

        // ... and show it to the remote peer by adding it to the connection
        local.addStream(localStream);
      };
    };

    // listen to an EventSource for invitation events
    var invitationEvents = new EventSource(...);
    invitationEvents.addEventListener("message", function(event) {
      // request the local connectivity configuration (step 1 above)
      local.getLocalConfiguration(function (peer, configuration) {
        // include the local configuration in an invitation response
        // to the server (step 2 above) using some "out-of-band" mechanism,
        // such as an XHR
      });
    });
    </script>

    <video width="320" height="240" id="selfView" autoplay="true"></video>
    <video width="320" height="240" id="remoteView" autoplay="true"></video>

    <device id="videoDevice" type="media">
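The sketch above never reaches the point where the remote configuration is applied. To make the order of the three steps easier to follow, here is a small self-contained simulation that runs without a browser implementing ConnectionPeer. Note the assumptions: `FakeConnectionPeer`, `exchange`, and the `"candidate:..."` strings are hypothetical stand-ins invented for this illustration, and no networking takes place. Only the names getLocalConfiguration, addRemoteConfiguration and onconnect are taken from the description above.

```javascript
// Hypothetical stand-in for ConnectionPeer (an assumption, not the real API):
// it only mimics the call order described above and does no networking.
class FakeConnectionPeer {
  constructor(name) {
    this.name = name;
    this.onconnect = null;
    this.localRequested = false;
    this.remoteConfiguration = null;
    this.connected = false;
  }

  // Step 1: report how this peer can be reached (made-up string).
  getLocalConfiguration(callback) {
    this.localRequested = true;
    callback(this, "candidate:" + this.name);
    this.maybeConnect();
  }

  // Step 2: store the other peer's configuration, received out of band.
  addRemoteConfiguration(configuration) {
    this.remoteConfiguration = configuration;
    this.maybeConnect();
  }

  // Step 3: happens implicitly once both methods above have been called.
  maybeConnect() {
    if (this.connected || !this.localRequested || this.remoteConfiguration === null) {
      return;
    }
    this.connected = true;
    if (this.onconnect) this.onconnect();
  }
}

// Stand-in for the out-of-band channel (in the post: HTTPS via the chat server).
function exchange(a, b) {
  a.getLocalConfiguration((peer, configuration) => b.addRemoteConfiguration(configuration));
  b.getLocalConfiguration((peer, configuration) => a.addRemoteConfiguration(configuration));
}

const log = [];
const alice = new FakeConnectionPeer("alice");
const bob = new FakeConnectionPeer("bob");
alice.onconnect = () => log.push("alice connected");
bob.onconnect = () => log.push("bob connected");

exchange(alice, bob);
console.log(log.join(", "));
```

Neither peer reports a connection until it has both produced its own configuration and received the other one, which mirrors the "implicit" third step in the list above.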

We became interested in this API and wanted to learn more about it, so we went ahead and implemented it (or at least a subset of it). We are still in a Linux environment, using WebKit GTK+ and GStreamer, and are reusing the implementations of the device element and Stream API, as well as large parts of the MediaStreamTransceiver (see our previous post).


Most networks use some type of NAT (Network Address Translation), which complicates peer-to-peer connections like this. The ICE (Interactive Connectivity Establishment; RFC 5245) procedure allows for establishing connectivity even in the presence of NATs, using STUN/TURN servers. This means that step 1 above results in a set of addresses, including both local ones and NATed ones. It also means that a prioritization is made in step 3 that values local addresses higher than NATed ones, to make sure latency is kept as low as possible.
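To illustrate how such a prioritization can work, RFC 5245 recommends a formula that gives every candidate address a numeric priority based on its type. The sketch below applies that formula with the type preferences the RFC recommends. The candidate addresses are made-up values from documentation address ranges, and the function and variable names are our own for this example; it is not meant to show what any particular ICE library does internally.

```javascript
// Type preferences recommended by RFC 5245 (section 4.1.2):
// host = local address, srflx = NATed address learned via STUN,
// prflx = address learned during connectivity checks, relay = TURN relay.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
// Plain arithmetic is used instead of bit shifts because the result can
// exceed the signed 32-bit range of JavaScript's bitwise operators.
function candidatePriority(type, localPreference, componentId) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

// Made-up candidates (assumptions for illustration only).
const candidates = [
  { type: "relay", address: "203.0.113.7:50000" },
  { type: "srflx", address: "198.51.100.4:61000" },
  { type: "host", address: "192.168.1.10:54321" },
];

// Rank the candidates, highest priority first.
const ranked = candidates
  .map((c) => ({ ...c, priority: candidatePriority(c.type, 65535, 1) }))
  .sort((a, b) => b.priority - a.priority);

for (const c of ranked) {
  console.log(c.type, c.address, c.priority);
}
```

With these preferences the local (host) address is ranked first, the NATed (server reflexive) address second, and the relayed address last, which matches the goal of keeping latency as low as possible.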

We thus use ICE to implement the native parts of ConnectionPeer. In our modified WebKit GTK+, we use libnice for the ICE implementation, and it integrates rather nicely with GStreamer and the GTK+ main loop.

It seems to us that the functionality of ICE matches the ConnectionPeer API rather well; however, we have some comments on the finer details, and we plan to bring those comments up with the WhatWG community.


Although ConnectionPeer is a rather small API, it provides something fundamentally different from the traditional web: peer-to-peer connections without an intermediate relay. Among the efforts (through activities in the IETF and W3C) to start standardizing peer-to-peer support in browsers, enabling real-time voice and video communication without plug-ins, the ConnectionPeer API is the most concrete proposal so far. Our tests indicate that it is (with some minor changes) a good starting point.

–Patrik Persson, Xing Fan, Yuan Song, Stefan Håkansson