6. Miscellaneous Considerations
6.1. Anonymous Calls
The use of DTLS-SRTP does not provide anonymous calling; however, it also does not prevent it. However, if care is not taken when anonymous calling features, such as those described in [RFC3325] or [RFC5767] are used, DTLS-SRTP may allow deanonymizing an otherwise anonymous call. When anonymous calls are being made, the following procedures SHOULD be used to prevent deanonymization.
When making anonymous calls, a new self-signed certificate SHOULD be used for each call so that the calls cannot be correlated as to being from the same caller. In situations where some degree of correlation is acceptable, the same certificate SHOULD be used for a number of calls in order to enable continuity of authentication; see Section 8.4.
Additionally, note that in networks that deploy [RFC3325], RFC 3325 requires that the Privacy header field value defined in [RFC3323] needs to be set to 'id'. This is used in conjunction with the SIP identity mechanism to ensure that the identity of the user is not asserted when enabling anonymous calls. Furthermore, the content of the subjectAltName attribute inside the certificate MUST NOT contain information that either allows correlation or identification of the user that wishes to place an anonymous call. Note that following this recommendation is not sufficient to provide anonymization.
6.2. Early Media
If an offer is received by an endpoint that wishes to provide early media, it MUST take the setup:active role and can immediately establish a DTLS association with the other endpoint and begin sending media. The setup:passive endpoint may not yet have validated the fingerprint of the active endpoint's certificate. The security aspects of media handling in this situation are discussed in Section 8.
6.3. Forking
In SIP, it is possible for a request to fork to multiple endpoints. Each forked request can result in a different answer. Assuming that the requester provided an offer, each of the answerers will provide a unique answer. Each answerer will form a DTLS association with the offerer. The offerer can then securely correlate the SDP answer received in the SIP message by comparing the fingerprint in the answer to the hashed certificate for each DTLS association.
6.4. Delayed Offer Calls
An endpoint may send a SIP INVITE request with no offer in it. When this occurs, the receiver(s) of the INVITE will provide the offer in the response and the originator will provide the answer in the subsequent ACK request or in the PRACK request [RFC3262], if both endpoints support reliable provisional responses. In any event, the active endpoint still establishes the DTLS association with the passive endpoint as negotiated in the offer/answer exchange.
6.5. Multiple Associations
When there are multiple flows (e.g., multiple media streams, non- multiplexed RTP and RTCP, etc.) the active side MAY perform the DTLS handshakes in any order. Appendix B of [RFC5764] provides some guidance on the performance of parallel DTLS handshakes. Note that if the answerer ends up being active, it may only initiate handshakes on some subset of the potential streams (e.g., if audio and video are
offered but it only wishes to do audio). If the offerer ends up being active, the complete answer will be received before the offerer begins initiating handshakes.
6.6. Session Modification
Once an answer is provided to the offerer, either endpoint MAY request a session modification that MAY include an updated offer. This session modification can be carried in either an INVITE or UPDATE request. The peers can reuse the existing associations if they are compatible (i.e., they have the same key fingerprints and transport parameters), or establish a new one following the same rules are for initial exchanges, tearing down the existing association as soon as the offer/answer exchange is completed. Note that if the active/passive status of the endpoints changes, a new connection MUST be established.
6.7. Middlebox Interaction
There are a number of potentially bad interactions between DTLS-SRTP and middleboxes, as documented in [MMUSIC-MEDIA], which also provides recommendations for avoiding such problems.
6.7.1. ICE Interaction
Interactive Connectivity Establishment (ICE), as specified in [RFC5245], provides a methodology of allowing participants in multimedia sessions to verify mutual connectivity. When ICE is being used, the ICE connectivity checks are performed before the DTLS handshake begins. Note that if aggressive nomination mode is used, multiple candidate pairs may be marked valid before ICE finally converges on a single candidate pair. Implementations MUST treat all ICE candidate pairs associated with a single component as part of the same DTLS association. Thus, there will be only one DTLS handshake even if there are multiple valid candidate pairs. Note that this may mean adjusting the endpoint IP addresses if the selected candidate pair shifts, just as if the DTLS packets were an ordinary media stream.
Note that Simple Traversal of the UDP Protocol through NAT (STUN) packets are sent directly over UDP, not over DTLS. [RFC5764] describes how to demultiplex STUN packets from DTLS packets and SRTP packets.
6.7.2. Latching Control without ICE
If ICE is not being used, then there is potential for a bad interaction with Session Border Controllers (SBCs) via "latching", as described in [MMUSIC-MEDIA]. In order to avoid this issue, if ICE is not being used and the DTLS handshake has not completed upon receiving the other side's SDP, then the passive side MUST do a single unauthenticated STUN [RFC5389] connectivity check in order to open up the appropriate pinhole. All implementations MUST be prepared to answer this request during the handshake period even if they do not otherwise do ICE. However, the active side MUST proceed with the DTLS handshake as appropriate even if no such STUN check is received and the passive MUST NOT wait for a STUN answer before sending its ServerHello.
6.8. Rekeying
As with TLS, DTLS endpoints can rekey at any time by redoing the DTLS handshake. While the rekey is under way, the endpoints continue to use the previously established keying material for usage with DTLS. Once the new session keys are established, the session can switch to using these and abandon the old keys. This ensures that latency is not introduced during the rekeying process.
Further considerations regarding rekeying in case the SRTP security context is established with DTLS can be found in Section 3.7 of [RFC5764].
6.9. Conference Servers and Shared Encryptions Contexts
It has been proposed that conference servers might use the same encryption context for all of the participants in a conference. The advantage of this approach is that the conference server only needs to encrypt the output for all speakers instead of once per participant.
This shared encryption context approach is not possible under this specification because each DTLS handshake establishes fresh keys that are not completely under the control of either side. However, it is argued that the effort to encrypt each RTP packet is small compared to the other tasks performed by the conference server such as the codec processing.
Future extensions, such as [SRTP-EKT] or [KEY-TRANSPORT], could be used to provide this functionality in concert with the mechanisms described in this specification.
6.10. Media over SRTP
Because DTLS's data transfer protocol is generic, it is less highly optimized for use with RTP than is SRTP [RFC3711], which has been specifically tuned for that purpose. DTLS-SRTP [RFC5764] has been defined to provide for the negotiation of SRTP transport using a DTLS connection, thus allowing the performance benefits of SRTP with the easy key management of DTLS. The ability to reuse existing SRTP software and hardware implementations may in some environments provide another important motivation for using DTLS-SRTP instead of RTP over DTLS. Implementations of this specification MUST support DTLS-SRTP [RFC5764].
6.11. Best Effort Encryption
[RFC5479] describes a requirement for best-effort encryption where SRTP is used and where both endpoints support it and key negotiation succeeds, otherwise RTP is used.
[MMUSIC-SDP] describes a mechanism that can signal both RTP and SRTP as an alternative. This allows an offerer to express a preference for SRTP, but RTP is the default and will be understood by endpoints that do not understand SRTP or this key exchange mechanism. Implementations of this document MUST support [MMUSIC-SDP].