2. Overview
Endpoints wishing to set up an RTP media session do so by exchanging offers and answers in SDP messages over SIP. In a typical use case, two endpoints would negotiate to transmit audio data over RTP using the UDP protocol.
Figure 1 shows a typical message exchange in the SIP trapezoid.
+-----------+ +-----------+
|SIP | SIP/SDP |SIP |
+------>|Proxy |----------->|Proxy |-------+
| |Server X | (+finger- |Server Y | |
| +-----------+ print, +-----------+ |
| +auth.id.) |
| SIP/SDP SIP/SDP |
| (+fingerprint) (+fingerprint,|
| +auth.id.) |
| |
| v
+-----------+ Datagram TLS +-----------+ |SIP | <-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-> |SIP | |User Agent | Media |User Agent | |Alice@X | <=================================> |Bob@Y | +-----------+ +-----------+
Legend: ------>: Signaling Traffic <-+-+->: Key Management Traffic <=====>: Data Traffic
Figure 1: DTLS Usage in the SIP Trapezoid
Consider Alice wanting to set up an encrypted audio session with Bob. Both Bob and Alice could use public-key-based authentication in order to establish a confidentiality protected channel using DTLS.
Since providing mutual authentication between two arbitrary endpoints on the Internet using public-key-based cryptography tends to be problematic, we consider more deployment-friendly alternatives. This document uses one approach and several others are discussed in Section 8.
Alice sends an SDP offer to Bob over SIP. If Alice uses only self- signed certificates for the communication with Bob, a fingerprint is included in the SDP offer/answer exchange. This fingerprint binds the DTLS key exchange in the media plane to the signaling plane.
The fingerprint alone protects against active attacks on the media but not active attacks on the signaling. In order to prevent active attacks on the signaling, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)" [RFC4474] may be used. When Bob receives the offer, the peers establish some number of DTLS connections (depending on the number of media sessions) with mutual DTLS authentication (i.e., both sides provide certificates). At this point, Bob can verify that Alice's credentials offered in TLS match the fingerprint in the SDP offer, and Bob can begin sending
media to Alice. Once Bob accepts Alice's offer and sends an SDP answer to Alice, Alice can begin sending confidential media to Bob over the appropriate streams. Alice and Bob will verify that the fingerprints from the certificates received over the DTLS handshakes match with the fingerprints received in the SDP of the SIP signaling. This provides the security property that Alice knows that the media traffic is going to Bob and vice versa without necessarily requiring global Public Key Infrastructure (PKI) certificates for Alice and Bob. (See Section 8 for detailed security analysis.)