In a previous blog post we showed you our work on conversational voice and video using “beyond HTML5” solutions.
In that work we used WebSockets and a media relay to route streams between peers. Now we’d like to show you how we have extended this to use peer-to-peer streaming.
Peer-to-peer streaming means that voice/video frames are streamed directly between peers, without any server in between. The effect is lower latency and more efficient network utilization. Up until now, however, web browsers have lacked the capability to communicate peer-to-peer. Instead, communication has traditionally relied on a shared relay server in the network.
The attached video includes a quick demo and a brief explanation of the connection establishment procedure. Below we explain this in more detail.
THE CONNECTIONPEER API
There is an existing proposal for an API called ConnectionPeer for establishing a direct connection between two peers (web browsers). ConnectionPeer is a very minimalistic API, and leaves most of the signaling (logging in, inviting friends, and so on) to be performed using traditional HTTP techniques. For example, the EventSource API (which we submitted to WebKit in 2009) comes in handy for receiving invitations.
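To make the signaling side a little more concrete, here is a minimal sketch of how an invitation could be read from a server-sent event stream, the wire format behind EventSource. In the browser, EventSource does this parsing for you; the Python below only illustrates the format. The event name "invite" and the JSON payload are hypothetical and not part of our implementation, and the parser is simplified (it ignores the id and retry fields).

```python
import json


def parse_event_stream(text):
    """Parse a text/event-stream body into a list of (event, data) tuples."""
    events = []
    event_type, data_lines = "message", []
    for line in text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


# Hypothetical stream: one keep-alive comment followed by one invitation.
sample = (
    ": keep-alive\n"
    "event: invite\n"
    'data: {"from": "alice", "room": "demo"}\n'
    "\n"
)

for event_type, data in parse_event_stream(sample):
    if event_type == "invite":
        invitation = json.loads(data)
        print("invitation from", invitation["from"])
```

Once an invitation like this has arrived, the peers can go on to exchange the information ConnectionPeer needs over the same kind of HTTP channel.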
The API is presented, along with a brief example, in the HTML specification at the WHATWG site, http://www.whatwg.org/specs/web-apps/current-work/multipage/commands.html#peer-to-peer-connections.
ConnectionPeer is responsible for the minimal functionality needed for establishing peer-to-peer connectivity, using the following steps:
In addition to connection establishment, ConnectionPeer also includes methods for streaming data over the connection. These are used to add real-time voice and video streams. Here’s a sketch of an example of the above:
We became interested in this API and wanted to learn more about it, so we went ahead and implemented a subset of it. We are still in a Linux environment, using WebKit GTK+ and GStreamer, and we re-use the implementations of the device element and Stream API as well as large parts of the MediaStreamTransceiver described in our previous post.
NAT TRAVERSAL AND ICE
Most networks use some type of NAT (Network Address Translation), which complicates peer-to-peer connections like this. The ICE (Interactive Connectivity Establishment; RFC 5245) procedure allows for establishing connectivity even in the presence of NATs, using STUN/TURN servers. This means that step 1 above results in a set of addresses, including both local ones and NATed ones. It also means that a prioritization is made in step 3 that values local addresses higher than NATed ones, to make sure latency is kept as low as possible.
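To illustrate why local addresses end up ahead of NATed ones, here is a minimal sketch of the candidate priority formula from RFC 5245 (section 4.1.2.1), using the type preference values the RFC recommends. This is not code from libnice or from our WebKit changes, and the addresses are made-up examples.

```python
# Candidate priority as defined in RFC 5245, section 4.1.2.1:
#   priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
TYPE_PREFERENCE = {
    "host": 126,   # address of a local interface
    "prflx": 110,  # peer reflexive, learned during connectivity checks
    "srflx": 100,  # server reflexive, the NATed address seen by a STUN server
    "relay": 0,    # address allocated on a TURN relay
}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """Return the 32-bit ICE priority for a single candidate."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


# Made-up example candidates for one peer (type, address).
candidates = [
    ("relay", "192.0.2.10:50000"),
    ("srflx", "203.0.113.5:40000"),
    ("host", "192.168.1.20:30000"),
]

# Highest priority first: host, then server reflexive, then relayed.
ranked = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
for cand_type, address in ranked:
    print(cand_type, address, candidate_priority(cand_type))
```

With these defaults a host candidate gets priority 2130706431 and a server reflexive one 1694498815, so a direct path on the local network is tried before a path through the NAT, and a TURN relay is the last resort.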
We thus use ICE to implement the native parts of ConnectionPeer. In our modified WebKit GTK+, we use libnice for the ICE implementation, and it integrates rather nicely with GStreamer and the GTK+ main loop.
It seems to us that the functionality of ICE matches the ConnectionPeer API rather well; however, we have some comments on the finer details, and we plan to bring those comments up with the WHATWG community.
Although ConnectionPeer is a rather small API, it provides something fundamentally different from the traditional web: peer-to-peer connections without an intermediate relay. Among the efforts to start standardizing peer-to-peer support in browsers (through activities in the IETF and W3C), with the aim of enabling real-time voice and video communication without plug-ins, the ConnectionPeer API is the most concrete proposal so far. Our tests indicate that it is (with some minor changes) a good starting point.
–Patrik Persson, Xing Fan, Yuan Song, Stefan Håkansson