8. Security Considerations

DTLS or TLS media signaled with SIP requires a way to ensure that the communicating peers' certificates are correct.

The standard TLS/DTLS strategy for authenticating the communicating parties is to give the server (and optionally the client) a PKIX [RFC5280] certificate. The client then verifies the certificate and checks that the name in the certificate matches the server's domain name. This works because there are a relatively small number of servers with well-defined names; a situation that does not usually occur in the VoIP context.

The design described in this document is intended to leverage the authenticity of the signaling channel (while not requiring confidentiality). As long as each side of the connection can verify the integrity of the SDP received from the other side, then the DTLS handshake cannot be hijacked via a man-in-the-middle attack. This integrity protection is easily provided by the caller to the callee (see Alice to Bob in Section 7) via the SIP Identity [RFC4474] mechanism. Other mechanisms, such as the S/MIME mechanism described in RFC 3261, or perhaps future mechanisms yet to be defined could also serve this purpose.

While this mechanism can still be used without such integrity mechanisms, the security provided is limited to defense against passive attack by intermediaries. An active attack on the signaling plus an active attack on the media plane can allow an attacker to attack the connection (R-SIG-MEDIA in the notation of [RFC5479]).

8.1. Responder Identity

SIP Identity does not support signatures in responses. Ideally, Alice would want to know that Bob's SDP had not been tampered with and who it was from so that Alice's User Agent could indicate to Alice that there was a secure phone call to Bob. [RFC4916] defines an approach for a UA to supply its identity to its peer UA, and for this identity to be signed by an authentication service. For example, using this approach, Bob sends an answer, then immediately follows up with an UPDATE that includes the fingerprint and uses the SIP Identity mechanism to assert that the message is from [email protected]. The downside of this approach is that it requires

the extra round trip of the UPDATE. However, it is simple and secure even when not all of the proxies are trusted. In this example, Bob only needs to trust his proxy. Offerers SHOULD support this mechanism and answerers SHOULD use it.

In some cases, answerers will not send an UPDATE and in many calls, some media will be sent before the UPDATE is received. In these cases, no integrity is provided for the fingerprint from Bob to Alice. In this approach, an attacker that was on the signaling path could tamper with the fingerprint and insert themselves as a man-in- the-middle on the media. Alice would know that she had a secure call with someone, but would not know if it was with Bob or a man-in-the- middle. Bob would know that an attack was happening. The fact that one side can detect this attack means that in most cases where Alice and Bob both wish for the communications to be encrypted, there is not a problem. Keep in mind that in any of the possible approaches, Bob could always reveal the media that was received to anyone. We are making the assumption that Bob also wants secure communications. In this do nothing case, Bob knows the media has not been tampered with or intercepted by a third party and that it is from [email protected]. Alice knows that she is talking to someone and that whoever that is has probably checked that the media is not being intercepted or tampered with. This approach is certainly less than ideal but very usable for many situations.

8.2. SIPS

If SIP Identity is not used, but the signaling is protected by SIPS, the security guarantees are weaker. Some security is still provided as long as all proxies are trusted. This provides integrity for the fingerprint in a chain-of-trust security model. Note, however, that if the proxies are not trusted, then the level of security provided is limited.

8.3. S/MIME

RFC 3261 [RFC3261] defines an S/MIME security mechanism for SIP that could be used to sign that the fingerprint was from Bob. This would be secure.

8.4. Continuity of Authentication

One desirable property of a secure media system is to provide continuity of authentication: being able to ensure cryptographically that you are talking to the same person as before. With DTLS, continuity of authentication is achieved by having each side use the same public key/self-signed certificate for each connection (at least with a given peer entity). It then becomes possible to cache the

credential (or its hash) and verify that it is unchanged. Thus, once a single secure connection has been established, an implementation can establish a future secure channel even in the face of future insecure signaling.

In order to enable continuity of authentication, implementations SHOULD attempt to keep a constant long-term key. Verifying implementations SHOULD maintain a cache of the key used for each peer identity and alert the user if that key changes.

8.5. Short Authentication String

An alternative available to Alice and Bob is to use human speech to verify each other's identity and then to verify each other's fingerprints also using human speech. Assuming that it is difficult to impersonate another's speech and seamlessly modify the audio contents of a call, this approach is relatively safe. It would not be effective if other forms of communication were being used such as video or instant messaging. DTLS supports this mode of operation. The minimal secure fingerprint length is around 64 bits.

ZRTP [AVT-ZRTP] includes Short Authentication String (SAS) mode in which a unique per-connection bitstring is generated as part of the cryptographic handshake. The SAS can be as short as 25 bits and so is somewhat easier to read. DTLS does not natively support this mode. Based on the level of deployment interest, a TLS extension [RFC5246] could provide support for it. Note that SAS schemes only work well when the endpoints recognize each other's voices, which is not true in many settings (e.g., call centers).

8.6. Limits of Identity Assertions

When RFC 4474 is used to bind the media keying material to the SIP signaling, the assurances about the provenance and security of the media are only as good as those for the signaling. There are two important cases to note here:

o RFC 4474 assumes that the proxy with the certificate "example.com" controls the namespace "example.com". Therefore, the RFC 4474 authentication service that is authoritative for a given namespace can control which user is assigned each name. Thus, the authentication service can take an address formerly assigned to Alice and transfer it to Bob. This is an intentional design feature of RFC 4474 and a direct consequence of the SIP namespace architecture.

o When phone number URIs (e.g., 'sip:[email protected]' or 'sip:[email protected];user=phone') are used, there is no structural reason to trust that the domain name is authoritative for a given phone number, although individual proxies and UAs may have private arrangements that allow them to trust other domains. This is a structural issue in that Public Switched Telephone Network (PSTN) elements are trusted to assert their phone number correctly and that there is no real concept of a given entity being authoritative for some number space.

In both of these cases, the assurances that DTLS-SRTP provides in terms of data origin integrity and confidentiality are necessarily no better than SIP provides for signaling integrity when RFC 4474 is used. Implementors should therefore take care not to indicate misleading peer identity information in the user interface. That is, if the peer's identity is sip:[email protected], it is not sufficient to display that the identity of the peer as +17005551008, unless there is some policy that states that the domain "chicago.example.com" is trusted to assert the E.164 numbers it is asserting. In cases where the UA can determine that the peer identity is clearly an E.164 number, it may be less confusing to simply identify the call as encrypted but to an unknown peer.

In addition, some middleboxes (back-to-back user agents (B2BUAs) and Session Border Controllers) are known to modify portions of the SIP message that are included in the RFC 4474 signature computation, thus breaking the signature. This sort of man-in-the-middle operation is precisely the sort of message modification that RFC 4474 is intended to detect. In cases where the middlebox is itself permitted to generate valid RFC 4474 signatures (e.g., it is within the same administrative domain as the RFC 4474 authentication service), then it may generate a new signature on the modified message. Alternately, the middlebox may be able to sign with some other identity that it is permitted to assert. Otherwise, the recipient cannot rely on the RFC 4474 Identity assertion and the UA MUST NOT indicate to the user that a secure call has been established to the claimed identity. Implementations that are configured to only establish secure calls SHOULD terminate the call in this case.

If SIP Identity or an equivalent mechanism is not used, then only protection against attackers who cannot actively change the signaling is provided. While this is still superior to previous mechanisms, the security provided is inferior to that provided if integrity is provided for the signaling.

8.7. Third-Party Certificates

This specification does not depend on the certificates being held by endpoints being independently verifiable (e.g., being issued by a trusted third party). However, there is no limitation on such certificates being used. Aside from the difficulty of obtaining such certificates, it is not clear what identities those certificates would contain -- RFC 3261 specifies a convention for S/MIME certificates that could also be used here, but that has seen only minimal deployment. However, in closed or semi-closed contexts where such a convention can be established, third-party certificates can reduce the reliance on trusting even proxies in the endpoint's domains.

8.8. Perfect Forward Secrecy

One concern about the use of a long-term key is that compromise of that key may lead to compromise of past communications. In order to prevent this attack, DTLS supports modes with Perfect Forward Secrecy using Diffie-Hellman and Elliptic-Curve Diffie-Hellman cipher suites. When these modes are in use, the system is secure against such attacks. Note that compromise of a long-term key may still lead to future active attacks. If this is a concern, a backup authentication channel, such as manual fingerprint establishment or a short authentication string, should be used.