1. Introduction
At the time the Real-Time Transport Protocol (RTP) [RFC3550] was originally designed, and for quite some time after, endpoints in RTP sessions typically transmitted only a single media source and, thus, used a single RTP stream and synchronization source (SSRC) per RTP session, with a separate RTP session typically used for each distinct media type. Recently, however, a number of scenarios have emerged in which endpoints wish to send multiple RTP streams, distinguished by distinct SSRC identifiers, in a single RTP session. Although the initial design of RTP did consider such scenarios, the specification was not consistently written with such use cases in mind; thus, the specification is somewhat unclear in places.
This memo updates [RFC3550] to clarify behavior in use cases where endpoints use multiple SSRCs. It also updates [RFC4585] to resolve problems with regard to timeout of inactive SSRCs and to clarify behavior around inclusion of feedback messages.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant implementations.
3. Use Cases for Multi-Stream Endpoints
This section discusses several use cases that have motivated the development of endpoints that send RTP data using multiple SSRCs in a single RTP session.
3.1. Endpoints with Multiple Capture Devices
The most straightforward motivation for an endpoint to send multiple simultaneous RTP streams in a single RTP session is that the endpoint has multiple capture devices and, hence, can generate multiple media sources of the same media type and characteristics. For example, telepresence systems of the type described by the CLUE Telepresence Framework [CLUE-FRAME] often have multiple cameras or microphones covering various areas of a room and, hence, send several RTP streams of each type within a single RTP session.
3.2. Multiple Media Types in a Single RTP Session
Recent work has updated RTP [MULTI-RTP] and the Session Description Protocol (SDP) [SDP-BUNDLE] to remove the historical assumption in RTP that media sources of different media types would always be sent in different RTP sessions. In this work, a single endpoint's audio and video RTP streams (for example) are instead sent in a single RTP session to reduce the number of transport-layer flows used.
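As an illustration of what sharing one RTP session implies for a receiver, the following minimal sketch separates incoming packets into RTP streams by the SSRC field of the 12-byte RTP fixed header defined in [RFC3550]. The helper names (build_rtp_packet, parse_ssrc_and_pt, group_by_ssrc) and the SSRC and payload type values in the usage note are hypothetical and chosen only for illustration; the sketch ignores CSRC lists, header extensions, and RTCP, and is not taken from this memo.

```python
import struct
from collections import defaultdict

# RTP fixed header (RFC 3550, Section 5.1), network byte order:
# V/P/X/CC (1 byte), M/PT (1 byte), sequence number (2), timestamp (4), SSRC (4)
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload=b""):
    """Build a minimal RTP packet: version 2, no padding, extension, or CSRCs.

    Hypothetical helper, used here only to produce test input.
    """
    first_byte = 2 << 6                 # version = 2, P = X = 0, CC = 0
    second_byte = payload_type & 0x7F   # marker bit clear
    header = RTP_HEADER.pack(first_byte, second_byte, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_ssrc_and_pt(packet):
    """Return (ssrc, payload_type) read from the RTP fixed header."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    first_byte, second_byte, _seq, _ts, ssrc = RTP_HEADER.unpack_from(packet)
    if first_byte >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return ssrc, second_byte & 0x7F


def group_by_ssrc(packets):
    """Map each SSRC to the payload types of its packets, in arrival order."""
    streams = defaultdict(list)
    for packet in packets:
        ssrc, payload_type = parse_ssrc_and_pt(packet)
        streams[ssrc].append(payload_type)
    return dict(streams)
```

For example, if an endpoint sends audio with SSRC 0x1111 (payload type 111) and video with SSRC 0x2222 (payload type 96) over the same transport-layer flow, group_by_ssrc returns one entry per SSRC, so the receiver sees two RTP streams within the single RTP session.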
3.3. Multiple-Stream Mixers
There are several RTP topologies that can involve a central device that itself generates multiple RTP streams in a session. An example is a mixer providing centralized compositing for a multi-capture scenario like that described in Section 3.1. In this case, the centralized node is behaving much like a multi-capturer endpoint, generating several similar and related sources.
3.4. Multiple SSRCs for a Single Media Source
There are several cases in which multiple SSRCs are used to send data from a single media source within a session. These include:
- Layered or Multiple Description Codecs: Where different layers or descriptions are sent with different SSRCs
- Transport Robustness Mechanisms: Such as RTP retransmission [RFC4588] or forward error correction [RFC5109]
- Simulcast: Sending multiple encodings of the same source at different qualities or resolutions
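The relationship described in the list above, one media source associated with several SSRCs, can be sketched as a small data structure. This is an illustrative model only: the MediaSource and StreamRole names, the role labels, and the SSRC values in the usage note are assumptions made for this example and do not appear in [RFC3550] or in this memo.

```python
from dataclasses import dataclass, field
from enum import Enum


class StreamRole(Enum):
    """Hypothetical labels for why an SSRC exists for a media source."""
    PRIMARY = "primary"
    LAYER = "layer"                    # layered or multiple description coding
    RETRANSMISSION = "retransmission"  # e.g., RTP retransmission [RFC4588]
    FEC = "fec"                        # e.g., forward error correction [RFC5109]
    SIMULCAST = "simulcast"            # alternative encoding of the same source


@dataclass
class MediaSource:
    """One media source and the SSRCs that carry its data in one RTP session."""
    name: str
    streams: dict = field(default_factory=dict)  # SSRC -> StreamRole

    def add_stream(self, ssrc, role):
        # Each SSRC identifies exactly one RTP stream within the session.
        if ssrc in self.streams:
            raise ValueError(f"SSRC {ssrc:#x} is already in use")
        self.streams[ssrc] = role

    def ssrcs_with_role(self, role):
        """Return the SSRCs carrying this source in the given role, sorted."""
        return sorted(s for s, r in self.streams.items() if r is role)
```

For instance, a camera sent with a primary stream on SSRC 0xA1, a retransmission stream on SSRC 0xA2, and a lower-resolution simulcast encoding on SSRC 0xA3 is a single MediaSource with three entries in streams.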