1. Introduction

At the time the Real-Time Transport Protocol (RTP) [RFC3550] was originally designed, and for quite some time after, endpoints in RTP sessions typically transmitted only a single media source and, thus, used a single RTP stream and synchronization source (SSRC) per RTP session, with separate RTP sessions typically used for each distinct media type. Recently, however, a number of scenarios have emerged in which endpoints wish to send multiple RTP streams, distinguished by distinct SSRC identifiers, in a single RTP session. Although the initial design of RTP did consider such scenarios, the specification was not consistently written with such use cases in mind; thus, the specification is somewhat unclear in places.
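Because every RTP stream in a session is identified by its SSRC, a receiver can separate the streams of one session by reading the 32-bit SSRC field at bytes 8 to 11 of the RTP fixed header. The following is a minimal, non-normative sketch of that demultiplexing step. It assumes every input buffer is an unencrypted RTP packet (no RTCP multiplexed on the same flow, no SRTP), and the helper names `parse_ssrc`, `demux_by_ssrc`, and `make_packet` are hypothetical illustrations, not part of any specification or library.

```python
import struct
from collections import defaultdict

RTP_FIXED_HEADER_LEN = 12  # size of the RTP fixed header without CSRCs (RFC 3550)


def parse_ssrc(packet: bytes) -> int:
    """Return the SSRC of an RTP packet (bytes 8-11, network byte order)."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    version = packet[0] >> 6  # top two bits of the first octet
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def demux_by_ssrc(packets):
    """Group the packets of one RTP session into per-SSRC RTP streams."""
    streams = defaultdict(list)
    for pkt in packets:
        streams[parse_ssrc(pkt)].append(pkt)
    return dict(streams)


def make_packet(ssrc: int, seq: int, pt: int = 96) -> bytes:
    # Hypothetical test helper: a bare 12-byte header with V=2, no padding,
    # no extension, no CSRCs, marker bit clear, timestamp 0.
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, 0, ssrc)
```

For example, `demux_by_ssrc([make_packet(0x1111, 1), make_packet(0x2222, 1), make_packet(0x1111, 2)])` yields two streams, keyed by SSRC 0x1111 (two packets) and 0x2222 (one packet). A real receiver would also have to separate RTCP and handle SSRC collisions, which this sketch omits.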

This memo updates [RFC3550] to clarify behavior in use cases where endpoints use multiple SSRCs. It also updates [RFC4585] to resolve problems with regard to timeout of inactive SSRCs and to clarify behavior around inclusion of feedback messages.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant implementations.

3. Use Cases for Multi-Stream Endpoints

This section discusses several use cases that have motivated the development of endpoints that send RTP data using multiple SSRCs in a single RTP session.

3.1. Endpoints with Multiple Capture Devices

The most straightforward motivation for an endpoint to send multiple simultaneous RTP streams in a single RTP session is when an endpoint has multiple capture devices and, hence, can generate multiple media sources of the same media type and characteristics. For example, telepresence systems of the type described by the CLUE Telepresence Framework [CLUE-FRAME] often have multiple cameras or microphones covering various areas of a room and, hence, send several RTP streams of each type within a single RTP session.

3.2. Multiple Media Types in a Single RTP Session

Recent work has updated RTP [MULTI-RTP] and Session Description Protocol (SDP) [SDP-BUNDLE] to remove the historical assumption in RTP that media sources of different media types would always be sent on different RTP sessions. In this work, a single endpoint's audio and video RTP streams (for example) are instead sent in a single RTP session to reduce the number of transport-layer flows used.

3.3. Multiple Stream Mixers

There are several RTP topologies that can involve a central device that itself generates multiple RTP streams in a session. An example is a mixer providing centralized compositing for a multi-capture scenario like that described in Section 3.1. In this case, the centralized node is behaving much like a multi-capturer endpoint, generating several similar and related sources.

3.4. Multiple SSRCs for a Single Media Source

There are several cases in which multiple SSRCs are used to send data from a single media source within a session. These include:

Layered or Multiple Description Codecs: Where different layers or descriptions are sent with different SSRCs
Transport Robustness Mechanisms: Such as RTP retransmission [RFC4588] or forward error correction [RFC5109]
Simulcast: Sending multiple encodings of the same source at different qualities or resolutions
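In each of these cases a receiver must associate several SSRCs with one media source. RTP itself carries no field naming the media source, so this association is assumed here to come from session signalling, which the sketch below does not model. The sketch is a minimal, hypothetical bookkeeping structure: `StreamRole`, `MediaSource`, `SsrcTable`, and the application-level `source_id` are illustrative names, not defined by RTP or any library. It does rely on one property RTP does guarantee, namely that an SSRC identifies only one stream within a session.

```python
from dataclasses import dataclass, field
from enum import Enum


class StreamRole(Enum):
    # Hypothetical labels for the cases listed above.
    PRIMARY = "primary"
    LAYER = "layer"            # layered or multiple description coding
    RETRANSMISSION = "rtx"     # e.g., RTP retransmission [RFC4588]
    FEC = "fec"                # e.g., forward error correction [RFC5109]
    SIMULCAST = "simulcast"    # an alternative encoding of the same source


@dataclass
class MediaSource:
    source_id: str  # hypothetical application-level name, not an RTP field
    streams: dict = field(default_factory=dict)  # SSRC -> StreamRole


class SsrcTable:
    """Maps every SSRC in one RTP session back to the media source it carries."""

    def __init__(self) -> None:
        self._by_ssrc = {}

    def register(self, source: MediaSource, ssrc: int, role: StreamRole) -> None:
        # An SSRC may identify only one RTP stream within a session.
        if ssrc in self._by_ssrc:
            raise ValueError(f"SSRC {ssrc:#010x} already in use in this session")
        source.streams[ssrc] = role
        self._by_ssrc[ssrc] = source

    def source_of(self, ssrc: int) -> MediaSource:
        return self._by_ssrc[ssrc]
```

Usage: after registering one camera's primary stream as SSRC 0x1000 and its retransmission stream as SSRC 0x1001, `table.source_of(0x1001)` returns the same `MediaSource` object as `table.source_of(0x1000)`, so feedback or statistics for either SSRC can be attributed to the single underlying source.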
