1. Introduction
The Real-time Transport Protocol, the associated RTP Control Protocol (RTP/RTCP, RFC 3550) [1], and the profile for audiovisual communications with minimal control (RFC 3551) [2] define mechanisms for transmitting time-based media across an IP network. RTP provides means to preserve timing and detect packet losses, among other things, and RTP payload formats provide for proper framing of (continuous) media in a packet-based environment. RTCP enables receivers to provide feedback on reception quality and allows all members of an RTP session to learn about each other.
The RTP specification provides only rudimentary support for encrypting RTP and RTCP packets. Secure RTP (RFC 3711) [4] defines an RTP profile ("SAVP") for secure RTP media sessions, defining methods for proper RTP and RTCP packet encryption, integrity, and replay protection. The initial negotiation of SRTP and its security parameters needs to be done out-of-band, e.g., using the Session Description Protocol (SDP, RFC 4566) [6] together with extensions for conveying keying material (RFC 4567 [7], RFC 4568 [8]).
The RTP specification also provides limited support for timely feedback from receivers to senders, typically by means of reception statistics reported at somewhat regular intervals that depend on the group size, the average RTCP packet size, and the available RTCP bandwidth. The extended RTP profile for RTCP-based feedback ("AVPF") (RFC 4585 [3]) allows session participants to provide immediate feedback on a statistical basis while maintaining the average RTCP data rate for all senders. As for SAVP, the use of AVPF and its parameters needs to be negotiated out-of-band by means of SDP (RFC 4566 [6]) and the extensions defined in RFC 4585 [3].
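The dependence of the regular reporting interval on these three quantities can be illustrated with a minimal sketch. It is a deliberate simplification of the RFC 3550 computation: it ignores the bandwidth split between senders and receivers, the randomization of the interval, and timer reconsideration. The function name, parameter names, and example values are assumptions made for illustration only.

```python
def rtcp_interval_seconds(members, avg_rtcp_size_bytes,
                          rtcp_bandwidth_bytes_per_s, t_min=5.0):
    """Simplified deterministic RTCP reporting interval in seconds.

    Assumption: all members share the RTCP bandwidth equally; the
    sender/receiver split and randomization of RFC 3550 are omitted.
    t_min models the minimum interval (5 seconds in RFC 3550).
    """
    if members < 1:
        raise ValueError("members must be at least 1")
    if rtcp_bandwidth_bytes_per_s <= 0:
        raise ValueError("RTCP bandwidth must be positive")
    # Time needed for every member to send one average-sized packet
    # without exceeding the RTCP bandwidth.
    computed = members * avg_rtcp_size_bytes / rtcp_bandwidth_bytes_per_s
    return max(t_min, computed)


# Hypothetical session: 120-byte average RTCP packets, 500 bytes/s of
# RTCP bandwidth.
large_group = rtcp_interval_seconds(100, 120, 500)
small_group = rtcp_interval_seconds(2, 120, 500)
```

Under these assumed values, the large group reports only every 24 seconds per member, and even the two-party session is held to the minimum interval. This is the gap in timeliness that AVPF addresses.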
Both SRTP and AVPF are RTP profiles and need to be negotiated. Because a session description names exactly one profile per RTP session, either one or the other may be used, but both profiles cannot be negotiated for the same RTP session (using one SDP session-level description). However, using secure communications and timely feedback together is desirable. Therefore, this document specifies a new RTP profile ("SAVPF") that combines the features of SAVP and AVPF.
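The constraint follows from the SDP syntax: each m= line carries a single transport protocol field, which for RTP names the profile. The following minimal sketch extracts that field and maps it to the features it implies. The helper names and the example port and payload type are assumptions for illustration; the profile identifiers are those registered for SDP by the respective profile specifications.

```python
# Map SDP <proto> values to the features each RTP profile provides.
PROFILE_FEATURES = {
    "RTP/AVP":   {"secure": False, "feedback": False},
    "RTP/SAVP":  {"secure": True,  "feedback": False},
    "RTP/AVPF":  {"secure": False, "feedback": True},
    "RTP/SAVPF": {"secure": True,  "feedback": True},
}


def media_profile(m_line):
    """Return the <proto> field of 'm=<media> <port> <proto> <fmt> ...'."""
    if not m_line.startswith("m="):
        raise ValueError("not an SDP media description line")
    fields = m_line[2:].split()
    if len(fields) < 4:
        raise ValueError("incomplete m= line")
    # Exactly one <proto> field exists, so only one profile can be named.
    return fields[2]


def profile_features(proto):
    """Look up the features of a known RTP profile identifier."""
    try:
        return PROFILE_FEATURES[proto]
    except KeyError:
        raise ValueError("unknown RTP profile: " + proto)


# Hypothetical audio stream offered with the combined profile.
proto = media_profile("m=audio 49170 RTP/SAVPF 0")
features = profile_features(proto)
```

With only "RTP/SAVP" or "RTP/AVPF" available, one of the two features is necessarily missing from any single media description; "RTP/SAVPF" is the one identifier that provides both.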
As SAVP and AVPF are largely orthogonal, combining them is mostly straightforward. No sophisticated algorithms need to be specified in this document. Instead, this document refers to both existing profiles and describes only the implications of their combination, possible deviations from the rules of the existing profiles, and the negotiation process.
1.1. Definitions
The definitions of RFC 3550 [1], RFC 3551 [2], RFC 4585 [3], and RFC 3711 [4] apply.
The following definitions are specifically used in this document:
- RTP session: An association among a set of participants communicating with RTP as defined in RFC 3550 [1].
- (SDP) media description: This term refers to the specification given in a single m= line in an SDP message. An SDP media description may define only one RTP session.
- Media session: A media session refers to a collection of SDP media descriptions that are semantically grouped to represent alternatives of the same communications means. Out of such a group, one will be negotiated or chosen for a communication relationship and the corresponding RTP session will be instantiated. If no common session parameters suitable for the involved endpoints can be found, the media session will be rejected. In the simplest case, a media session is equivalent to an SDP media description and equivalent to an RTP session.
1.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [5].