
1. Introduction

1.1. Terminology

This document uses terminology from [RFC8825]. In addition, the following terms are used as described below:

RTP stream: A stream of RTP packets containing media data [RFC7656].

MediaStream: An assembly of MediaStreamTracks [W3C.CR-mediacapture-streams]. One MediaStream can contain multiple MediaStreamTracks, of the same or different types.

MediaStreamTrack: Defined in [W3C.CR-mediacapture-streams] as a unidirectional flow of media data (either audio or video, but not both). Corresponds to the [RFC7656] term "source stream". One MediaStreamTrack can be present in zero, one, or multiple MediaStreams.

Media description: Defined in [RFC4566] as a set of fields starting with an "m=" field and terminated by either the next "m=" field or the end of the session description.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Structure of This Document

This document adds a new Session Description Protocol (SDP) [RFC4566] mechanism that can attach identifiers to RTP streams and to the groupings they form. It is designed for use with WebRTC [RFC8825].

Section 1.3 gives the background on why a new mechanism is needed.

Section 2 gives the definition of the new mechanism.

Section 3 gives the necessary semantic information and procedures for using the "msid" attribute to signal the association of MediaStreamTracks to MediaStreams in support of the WebRTC API [W3C-WebRTC].

1.3. Why a New Mechanism Is Needed

When media is carried by RTP [RFC3550], each RTP stream is distinguished inside an RTP session by its Synchronization Source (SSRC); each RTP session is distinguished from all other RTP sessions by being on a different transport association (strictly speaking, two transport associations, one used for RTP and one used for the RTP Control Protocol (RTCP), unless RTP/RTCP multiplexing [RFC5761] is used).
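As an illustration of how RTP streams are told apart inside one RTP session, the following minimal sketch reads the SSRC from the 12-byte fixed RTP header of [RFC3550] (bytes 8 through 11, network byte order) and groups packets by it. The function names and the make_packet helper are hypothetical and exist only for this example; header extensions, CSRC lists, and RTCP are ignored.

```python
import struct
from collections import defaultdict

RTP_FIXED_HEADER_LEN = 12  # size in bytes of the fixed RTP header


def rtp_ssrc(packet: bytes) -> int:
    """Return the SSRC carried in bytes 8..11 of the fixed RTP header."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    version = packet[0] >> 6  # top two bits of the first byte
    if version != 2:
        raise ValueError(f"unsupported RTP version: {version}")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def group_by_ssrc(packets):
    """Group the packets of ONE RTP session into RTP streams keyed by SSRC."""
    streams = defaultdict(list)
    for packet in packets:
        streams[rtp_ssrc(packet)].append(packet)
    return dict(streams)


def make_packet(ssrc: int, seq: int) -> bytes:
    """Hypothetical helper: build a header-only RTP packet for the example."""
    # V=2, no padding/extension/CSRC (0x80); payload type 96; timestamp 0.
    return struct.pack("!BBHII", 0x80, 96, seq, 0, ssrc)


if __name__ == "__main__":
    packets = [make_packet(0x1111, 1), make_packet(0x2222, 1), make_packet(0x1111, 2)]
    for ssrc, stream in group_by_ssrc(packets).items():
        print(f"SSRC {ssrc:#x}: {len(stream)} packet(s)")
```

Packets arriving on a different transport association belong to a different RTP session and would be grouped separately, since an SSRC is only meaningful within its own session.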

SDP [RFC4566] gives a format for describing an SDP session that can contain multiple media descriptions. According to the model used in [RFC8829], each media description describes exactly one media source. If multiple media sources are carried in an RTP session, this is signaled using BUNDLE [RFC8843]; if BUNDLE is not used, each media source is carried in its own RTP session.
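To make the notion of "multiple media descriptions" concrete, the sketch below splits a session description into its session-level part and its media descriptions, using the rule from Section 1.1 that a media description runs from one "m=" line to the next "m=" line or to the end of the description. The function name and the example SDP (addresses, payload types, "mid" values) are illustrative assumptions, not taken from this document.

```python
def split_media_descriptions(sdp: str):
    """Return (session_level_lines, list_of_media_descriptions).

    Each media description is the list of lines starting at an "m=" line
    and ending just before the next "m=" line or the end of the input.
    """
    session_lines = []
    media_descriptions = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            media_descriptions.append([line])  # a new media description begins
        elif media_descriptions:
            media_descriptions[-1].append(line)  # belongs to the current one
        else:
            session_lines.append(line)  # before the first "m=": session level
    return session_lines, media_descriptions


# Illustrative SDP only; all values are made up for this example.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:0
m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:1
"""

if __name__ == "__main__":
    session, media = split_media_descriptions(EXAMPLE_SDP)
    print(len(session), "session-level lines,", len(media), "media descriptions")
```

Under the model of [RFC8829], each element of the returned list would describe exactly one media source.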

The SDP Grouping Framework [RFC5888] can be used to group media descriptions. However, for the use case of WebRTC, there is the need for an application to specify some application-level information about the association between the media description and the group. This is not possible using the SDP Grouping Framework.

1.4. The WebRTC MediaStream

The W3C WebRTC API specification [W3C-WebRTC] specifies that communication between WebRTC entities is done via MediaStreams, which contain MediaStreamTracks. A MediaStreamTrack is generally carried using a single SSRC in an RTP session, forming an RTP stream. The collision of terminology is unfortunate: an RTP stream corresponds to a MediaStreamTrack, not to a MediaStream. There may be additional SSRCs, possibly within additional RTP sessions, to support functionality such as forward error correction or simulcast. These additional SSRCs are not affected by this specification.

MediaStreamTracks are unidirectional; they carry media in one direction only.

In the RTP specification, RTP streams are identified using the SSRC field. Streams are grouped into RTP sessions and also carry a CNAME. Neither CNAME nor RTP session corresponds to a MediaStream. Therefore, the association of an RTP stream to MediaStreams needs to be explicitly signaled.

WebRTC defines a mapping (documented in [RFC8829]) where one SDP media description is used to describe each MediaStreamTrack, and the BUNDLE mechanism [RFC8843] is used to group MediaStreamTracks into RTP sessions. Therefore, each media description needs to specify the identifier (ID) of its MediaStreamTrack and of the associated MediaStream, which can be accomplished with a media-level SDP attribute.
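A rough sketch of how a receiver might recover this association from media-level attributes follows. It assumes that each "a=msid:" value begins with a MediaStream ID, optionally followed by a space and further application data that the sketch ignores; the normative syntax is the one given in Section 2, not this code. The function name and the stream IDs in the example are hypothetical.

```python
from collections import defaultdict


def media_streams_by_msid(sdp: str):
    """Map each MediaStream ID to the indexes of the media descriptions
    (one per MediaStreamTrack) that carry it in an "a=msid:" attribute.

    ASSUMPTION: the first whitespace-separated field of the attribute value
    is the MediaStream ID; anything after it is ignored here.
    """
    streams = defaultdict(list)
    index = -1  # index of the current media description; -1 = session level
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            index += 1
        elif index >= 0 and line.startswith("a=msid:"):
            fields = line[len("a=msid:"):].split()
            if fields:
                streams[fields[0]].append(index)
    return dict(streams)


# Illustrative SDP fragment; "streamA" and "streamB" are made-up IDs.
EXAMPLE_SDP = """\
v=0
s=-
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=msid:streamA
m=video 9 UDP/TLS/RTP/SAVPF 96
a=msid:streamA
a=msid:streamB
m=video 9 UDP/TLS/RTP/SAVPF 96
"""

if __name__ == "__main__":
    print(media_streams_by_msid(EXAMPLE_SDP))
```

In this example the second media description appears under two MediaStream IDs and the third under none, mirroring the statement in Section 1.1 that a MediaStreamTrack can be present in zero, one, or multiple MediaStreams.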

This usage is described in Section 3.