Media Communication

What do I get from WebRTC’s media communication? #

WebRTC allows you to send and receive an unlimited amount of audio and video streams. You can add and remove these streams at anytime during a call. These streams could all be independent, or they could be bundled together! You could send a video feed of your desktop, and then include audio/video from your webcam.

The WebRTC protocol is codec agnostic. The underlying transport supports everything, even things that don’t exist yet! However, the WebRTC Agent you are communicating with may not have the necessary tools to accept it.

WebRTC is also designed to handle dynamic network conditions. During a call your bandwidth might increase, or decrease. Maybe you all the sudden experience lots of packet loss. The protocol is designed to handle all of this. WebRTC responds to network conditions and tries to give you the best experience possible with the resources available.

How does it work? #

WebRTC uses two pre-existing protocols RTP and RTCP, both defined in RFC 1889

RTP is the protocol that carries the media. It was designed to allow real-time delivery of video. It doesn’t stipulate any rules around latency or reliability, but gives you the tools to implement them. RTP gives you streams, so you can run multiple media feeds over one connection. It also gives you the timing and ordering information you need to feed a media pipeline.

RTCP is the protocol that communicates metadata about the call. The format is flexible enough so you can add whatever you want. This is used to communicate statistics about the call. It is also necessary to handle packet loss and to implement congestion control. It gives you the bi-directional communication necessary to respond to network conditions changing.

Latency vs Quality #

Real-time media is about making trade-offs between latency and quality. The more latency you are willing to tolerate, the higher quality video you can expect.

Real World Limitations #

These constraints are all caused by the limitations of the real world. These are all characteristics of your network that you will need to overcome.

Bandwidth #

Bandwidth is the maximum rate of data that can be transferred across a given path. It is important to remember this isn’t a static number either. The bandwidth will change along the route as more (or less) people use it.

When you attempt to send more data then available bandwidth you will experience network congestion.

Transmission Time #

Transmission Time is how long it takes for a packet to arrive. Like Bandwidth this isn’t constant. The Transmission Time can fluctuate at anytime.

Jitter #

Jitter is the fact that Transmission Time may vary. Some times you will see packets arrive in bursts. Any piece of hardware along the network path can introduce issues.

Packet Loss #

Packet Loss is when messages are lost in transmission. The loss could be steady, or it could come in spikes. This isn’t an uncommon occurrence either!

Maximum transmission unit #

Maximum Transmission Unit is the limit on how large a single packet can be. Networks don’t allow you to send one giant message. At the protocol level you need to packetize your data into small packets.

The MTU will also differ depending on what network path you take. You can use a protocol like Path MTU Discovery to figure out what the largest packet size is you can send.

Media 101 #

Codec #

Frame Types #

RTP #

Packet Format #

Every RTP packet has the following structure:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            Contributing Source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Payload                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version (V) #

Version is always 2

Padding (P) #

Padding is a bool that controls if the payload has padding.

The last byte of the payload contains a count of how many padding bytes were added.

Extension (X) #

If set the RTP header will have extensions. This is described in greater detail below.

CSRC count (CC) #

The amount of CSRC identifiers that follow after the SSRC, and before the payload.

Marker (M) #

The marker bit has no pre-set meaning, and is up to the user.

It some cases it is set when a user is speaking. It is also commonly used to mark a keyframe.

Payload Type (PT) #

Payload Type is the unique identifier for what codec is being carried by this packet.

For WebRTC the Payload Type is dynamic. VP8 in one call may be different then another. The Offerer in the call determines the mapping of Payload Types to codecs in the Session Description.

Sequence Number #

Sequence Number is used for ordering packets in a stream. Every time a packet is sent the Sequence Number is incremented by one.

RTP is designed to be useful over lossy networks. This gives the receiver a way to detect when packets have been lost.

Timestamp #

The sampling instant for this packet. This is not a global clock, but how much time has passed in the media stream.

Synchronization Source (SSRC) #

A SSRC is the unique identifier for this stream. This allows you to run multiple streams of media over a single stream.

Contributing Source (CSRC) #

A list that communicates what SSRCes contributed to this packet.

This is commonly used for talking indicators. Lets say server side you combined multiple audio feeds into a single RTP stream. You could then use this field to say ‘Input stream A and C were talking at this moment’

Extensions #

Mapping Payload Types to Codecs #

RTCP #

Packet Format #

Every RTCP packet has the following structure:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |       PT      |             length            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Payload                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version (V) #

Version is always 2

Padding (P) #

Padding is a bool that controls if the payload has padding.

The last byte of the payload contains a count of how many padding bytes were added.

Reception Report Count (RC) #

The number of reports in this packet. A single RTCP Packet can contain multiple events.

Packet Type (PT) #

Unique Identifier for what type of RTCP Packet this is. A WebRTC Agent doesn’t need to support all these types, and support between Agents can be different. These are the ones you may commonly see though.

  • Full INTRA-frame Request (FIR) - 192
  • Negative ACKnowledgements (NACK) - 193
  • Sender Report - 200
  • Receiver Report - 201
  • Generic RTP Feedback - 205

The significance of these packet types will be described in greater detail below.

Full INTRA-frame Request #

This RTCP message notifies the sender that it needs to send a full image. This is for when the encoder is giving you partial frames, but you aren’t able to decode them.

This could happen because you had lots of packet loss, or maybe the decoder crashed.

Negative ACKnowledgements #

A NACK requests that a sender re-transmits a single RTP Packet. This is usually caused when a RTP Packet is lost, but could also happen because it is late.

NACKs are much more bandwidth efficent then requesting that the whole frame get sent again. Since RTP breaks up packets into very small chunks, you are really just requesting one small missing piece.

Sender/Receiver Reports #

These reports are used to send statistics between agents. This communicates the amount of packets actually received and jitter.

The reports could be used for general or diagnostics, or basic Congestion Control.

Generic RTP Feedback #

How RTP/RTCP solve problems #

RTP and RTCP then work together to solve all the problems caused by networks. These techniques are still constantly changing!

Negative Acknowledgment #

Also known as a NACK. This is one method of dealing with packet loss with RTP.

A NACK is a RTCP message sent back to a sender to request re-transmission. The receiver crafts a RTCP message with the SSRC and Sequence Number. If the sender does not have this RTP packet available to re-send it just ignores the message.

Forward Error Correction #

Also known as FEC. Another method of dealing with packet loss. FEC is when you send the same data multiple times, without it even being requested. This be done at the RTP level, or even lower with the codec.

If the packet loss for a call is steady this is much better then NACKs. The round trip of having to request, and then re-transmit the packet can be significant for NACKs.

Congestion Control #

Congestion Control is the act of adjusting the media depending on the attributes of the network. If you don’t have a lot of bandwidth, you need to send lower quality video.

Congestion Control is all about making trade offs.

JitterBuffer #