1. Introduction
The IETF has specified datagram transport using UDP, Stream Control Transmission Protocol (SCTP), and Datagram Congestion Control Protocol (DCCP), as well as protocols layered on top of these transports (e.g., SCTP/UDP, DCCP/UDP, QUIC/UDP) and direct datagram transport over the IP network layer. This document describes a robust method for Path MTU Discovery (PMTUD) that can be used with these transport protocols (or the applications that use their transport service) to discover an appropriate size of packet to use across an Internet path.
1.1. Classical Path MTU Discovery
Classical Path Maximum Transmission Unit Discovery (PMTUD) can be used with any transport that is able to process ICMP Packet Too Big (PTB) messages (e.g., [RFC1191] and [RFC8201]). In this document, the term PTB message is applied to both IPv4 ICMP Unreachable messages (Type 3) that carry the error Fragmentation Needed (Type 3, Code 4) [RFC0792] and ICMPv6 Packet Too Big messages (Type 2) [RFC4443]. When a sender receives a PTB message, it reduces the effective MTU to the value reported as the link MTU in the PTB message. Classical PMTUD specifies a method of periodically increasing the packet size in an attempt to discover an increase in the supported PMTU. The packets sent with a size larger than the current effective PMTU are known as probe packets.
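As a minimal sketch of the Classical PMTUD reaction described above (Python, illustrative only), the functions below recognize both forms of PTB message by ICMP type and code and then lower the effective PMTU. The names `is_ptb` and `update_effective_pmtu` are hypothetical. The `floor` parameter is an assumption standing in for the minimum link MTU (for example, 1280 bytes for IPv6). The rule that a PTB message only ever lowers the PMTU, and never raises it, follows [RFC1191] and [RFC8201].

```python
# ICMP type/code values for the two PTB forms named in the text.
ICMPV4_DEST_UNREACHABLE = 3   # IPv4 ICMP Unreachable (Type 3)
ICMPV4_FRAG_NEEDED = 4        # Fragmentation Needed (Code 4)
ICMPV6_PACKET_TOO_BIG = 2     # ICMPv6 Packet Too Big (Type 2)


def is_ptb(ip_version: int, icmp_type: int, icmp_code: int) -> bool:
    """Return True if the ICMP/ICMPv6 message is a PTB message in this document's sense."""
    if ip_version == 4:
        return icmp_type == ICMPV4_DEST_UNREACHABLE and icmp_code == ICMPV4_FRAG_NEEDED
    if ip_version == 6:
        return icmp_type == ICMPV6_PACKET_TOO_BIG
    return False


def update_effective_pmtu(current_pmtu: int, reported_mtu: int, floor: int) -> int:
    """Classical PMTUD reaction to a PTB message.

    A PTB message can only lower the effective PMTU. The result is never
    pushed below `floor`, an assumed per-IP-version minimum link MTU.
    """
    if reported_mtu >= current_pmtu:
        return current_pmtu          # never increase the PMTU from a PTB message
    return max(reported_mtu, floor)  # reduce, but not below the minimum
```

For example, with a current PMTU of 1500 bytes, a PTB message reporting 1400 bytes yields an effective PMTU of 1400, while one reporting 1600 bytes leaves it at 1500.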
Packets not intended as probe packets are either fragmented to the current effective PMTU, or the attempt to send fails with an error code. Applications can be provided with a primitive to let them read the Maximum Packet Size (MPS), which is derived from the current effective PMTU.
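The relationship between the effective PMTU and the MPS comes down to simple header arithmetic. The sketch below assumes IPv4 without options (20 bytes), IPv6 without extension headers (40 bytes), and UDP (8 bytes). `max_packet_size` and `send_datagram` are hypothetical names. The send path models the "fails with an error code" behaviour by raising `EMSGSIZE` instead of fragmenting.

```python
import errno

IPV4_HEADER = 20   # IPv4 header without options
IPV6_HEADER = 40   # IPv6 header without extension headers
UDP_HEADER = 8


def max_packet_size(effective_pmtu: int, ip_header: int, transport_header: int) -> int:
    """MPS: the largest transport payload that fits in one unfragmented packet."""
    mps = effective_pmtu - ip_header - transport_header
    if mps <= 0:
        raise ValueError("effective PMTU too small for the assumed headers")
    return mps


def send_datagram(payload: bytes, mps: int) -> int:
    """Hypothetical send primitive that refuses to fragment.

    Returns the number of bytes accepted, or raises OSError(EMSGSIZE) when
    the payload exceeds the MPS, mirroring a send that fails with an error code.
    """
    if len(payload) > mps:
        raise OSError(errno.EMSGSIZE, f"{len(payload)} bytes exceeds MPS of {mps}")
    # A real implementation would pass the datagram to the network layer here.
    return len(payload)
```

With a 1500-byte PMTU, this gives an MPS of 1472 bytes for UDP over IPv4 and 1452 bytes for UDP over IPv6.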
Classical PMTUD is subject to protocol failures. One failure arises when traffic using a packet size larger than the actual PMTU is black-holed (all datagrams larger than the actual PMTU are discarded). This could arise when the PTB messages are not sent back to the sender for some reason (for example, see [RFC2923]).
Examples of where PTB messages are not delivered include the following:
- The generation of ICMP messages is usually rate limited. As a result, no PTB message might be sent to the sender (see [RFC4443] Section 2.4).
- ICMP messages can be filtered by middleboxes, including firewalls [RFC4890]. A firewall could be configured with a policy to block incoming ICMP messages, which would prevent reception of PTB messages by a sending endpoint behind this firewall.
- When the router issuing the ICMP message drops a tunneled packet, the resulting ICMP message is directed to the tunnel ingress. This tunnel endpoint is responsible for forwarding the ICMP message, processing the quoted packet within the payload field to remove the effect of the tunnel and returning a correctly formatted ICMP message to the sender [TUNNELS]. Failure to do this prevents the PTB message from reaching the original sender.
- Asymmetry in forwarding can result in there being no return route to the original sender, which would prevent an ICMP message from being delivered to the sender. This issue can also arise when either policy-based or Equal-Cost Multipath (ECMP) routing is used or when a middlebox acts as an application load balancer. For example, an ECMP router might choose a path toward the server based on the bytes in the IP payload. In this case, if a packet sent by the server encounters a problem after the ECMP router, then the ECMP router needs to direct any resulting ICMP message toward the original sender.
- There are additional cases where the next-hop destination fails to receive a packet because of its size. This could be due to misconfiguration of the layer 2 path between nodes, for instance the MTU configured in a layer 2 switch, or misconfiguration of the Maximum Receive Unit (MRU). If a packet is dropped by the link, this will not cause a PTB message to be sent to the original sender.
Another failure could result if a node that is not on the network path sends a PTB message that attempts to force a sender to change the effective PMTU [RFC8201]. A sender can protect itself from reacting to such messages by utilizing the quoted packet within a PTB message payload to validate that the received PTB message was generated in response to a packet that had actually originated from the sender. However, there are situations where a sender would be unable to provide this validation.
Examples where the validation of the PTB message is not possible include the following:
- When a router issuing the ICMP message implements RFC 792 [RFC0792], it is only required to include the first 64 bits of the IP payload of the packet within the quoted payload. There could be insufficient bytes remaining for the sender to interpret the quoted transport information.
Note: The recommendation in RFC 1812 [RFC1812] is that IPv4 routers return a quoted packet with as much of the original datagram as possible without the length of the ICMP datagram exceeding 576 bytes. IPv6 routers include as much of the invoking packet as possible without the ICMPv6 packet exceeding 1280 bytes [RFC4443].
- The use of tunnels and/or encryption can reduce the size of the quoted packet returned to the original source address, increasing the risk that there could be insufficient bytes remaining for the sender to interpret the quoted transport information.
- Even when the PTB message includes sufficient bytes of the quoted packet, the network layer could lack sufficient context to validate the message because validation depends on information about the active transport flows at an endpoint node (e.g., the socket/address pairs being used and other protocol header information).
- When a packet is encapsulated/tunneled over an encrypted transport, the tunnel/encapsulation ingress might have insufficient context, or computational power, to reconstruct the transport header that would be needed to perform validation.
- When an ICMP message is generated by a router in a network segment that has inserted a header into a packet, the quoted packet could contain additional protocol header information that was not included in the original sent packet and that the PL sender does not process or may not know how to process. This could disrupt the ability of the sender to validate this PTB message.
- A Network Address Translation (NAT) device that translates a packet header ought to also translate ICMP messages and update the ICMP-quoted packet [RFC5508] in that message. If this is not correctly translated, then the sender would not be able to associate the message with the PL that originated the packet, and hence this ICMP message cannot be validated.
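The validation step discussed above can be sketched for the simplest case, an IPv4/UDP flow. The hypothetical function below assumes it receives the quoted packet (the bytes following the 8-byte ICMP header) and a set of the sender's active flows. It returns False whenever validation is not possible, including when the quote is truncated. The minimum quote required by RFC 792 (64 bits of payload) covers the UDP port fields. Any PL information carried after the UDP header would still be unavailable for checking.

```python
import ipaddress
import struct
from typing import Set, Tuple

Flow = Tuple[str, int, str, int]  # (source address, source port, destination address, destination port)

IPPROTO_UDP = 17


def validate_quoted_udp4(quoted: bytes, active_flows: Set[Flow]) -> bool:
    """Check that a quoted IPv4/UDP packet was sent by one of our active flows.

    Returns False (cannot validate) when the quote is too short, is not
    IPv4/UDP, or does not match any active flow.
    """
    if len(quoted) < 20:
        return False                      # not even a full IPv4 header
    version, ihl = quoted[0] >> 4, (quoted[0] & 0x0F) * 4
    if version != 4 or ihl < 20 or quoted[9] != IPPROTO_UDP:
        return False
    # Need the whole IP header plus the first 4 bytes of UDP (the two ports).
    if len(quoted) < ihl + 4:
        return False
    src = str(ipaddress.IPv4Address(quoted[12:16]))
    dst = str(ipaddress.IPv4Address(quoted[16:20]))
    sport, dport = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return (src, sport, dst, dport) in active_flows
```

A PTB message whose quote fails this check would be ignored, which is safe for PLPMTUD because probing does not depend on receiving PTB messages.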
1.2. Packetization Layer Path MTU Discovery
The term Packetization Layer (PL) has been introduced to describe the layer that is responsible for placing data blocks into the payload of IP packets and selecting an appropriate MPS. This function is often performed by a transport protocol (e.g., DCCP, RTP, SCTP, QUIC) but can also be performed by other encapsulation methods working above the transport layer.
In contrast to PMTUD, Packetization Layer Path MTU Discovery (PLPMTUD) [RFC4821] introduces a method that does not rely upon reception and validation of PTB messages. It is therefore more robust than Classical PMTUD. This has become the recommended approach for implementing discovery of the PMTU [BCP145].
This document updates [RFC4821] to specify the PLPMTUD method for datagram PLs and also updates [BCP145] to refer to the method specified in this document for use with UDP datagrams instead of the method in [RFC4821].
PLPMTUD uses a general strategy in which the PL sends probe packets to search for the largest size of unfragmented datagram that can be sent over a network path. Probe packets are sent to test whether a larger packet size can be used. If a probe packet is successfully delivered (as determined by the PL), then the PLPMTU is raised to the size of the successful probe. If a black hole is detected (e.g., where packets of size PLPMTU are consistently not received), the method reduces the PLPMTU.
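The following is a minimal sketch of this strategy under stated assumptions. The class `ProbeSearch` and its parameters are hypothetical. The search between a base size and an upper bound is a plain binary search, which is one possible choice and not the one mandated by this document. Every lost probe is treated as evidence that the size is too large, although a real PL would retry, since probes can also be lost to congestion. Black hole detection falls back to the base size after a fixed number of consecutive losses of PLPMTU-sized packets.

```python
from typing import Callable


class ProbeSearch:
    """Hypothetical PLPMTUD search state for one path (illustrative only)."""

    def __init__(self, base: int, max_size: int, loss_threshold: int = 3):
        self.plpmtu = base                    # largest size confirmed to work
        self.base = base                      # assumed safe fallback size
        self.max_size = max_size              # upper bound for probing (e.g., local MTU)
        self.loss_threshold = loss_threshold  # losses that signal a black hole
        self.consecutive_losses = 0

    def search(self, probe: Callable[[int], bool]) -> int:
        """Binary search for the largest size that `probe` reports as delivered."""
        lo, hi = self.plpmtu, self.max_size
        while lo < hi:
            size = (lo + hi + 1) // 2
            if probe(size):
                lo = size            # probe delivered: raise the PLPMTU
                self.plpmtu = size
            else:
                hi = size - 1        # probe lost: search below this size
        return self.plpmtu

    def on_packet_outcome(self, delivered: bool) -> None:
        """Black hole detection for packets sent at the current PLPMTU."""
        if delivered:
            self.consecutive_losses = 0
            return
        self.consecutive_losses += 1
        if self.consecutive_losses >= self.loss_threshold:
            self.plpmtu = self.base  # fall back and let probing start again
            self.consecutive_losses = 0
```

Against a simulated path whose actual PMTU is 1400 bytes, `ProbeSearch(base=1200, max_size=1500).search(lambda n: n <= 1400)` converges on 1400. Three subsequent losses of PLPMTU-sized packets then return the PLPMTU to 1200.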
Datagram PLPMTUD introduces flexibility in implementation. At one extreme, it can be configured to only perform black hole detection and recovery with increased robustness compared to Classical PMTUD. At the other extreme, all PTB processing can be disabled, and PLPMTUD replaces Classical PMTUD.
PLPMTUD can also include additional consistency checks without increasing the risk that data is lost when probing to discover the Path MTU. For example, information available at the PL, or higher layers, enables received PTB messages to be validated before being utilized.
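The sketch below illustrates how a PL might apply such consistency checks before using a PTB message, and how PTB processing can be disabled entirely. The function `ptb_hint` and its specific checks are hypothetical illustrations, not the normative rules of this document. It ignores a PTB message that is unvalidated, reports a size below the smallest size the PL will use, or reports a size that is not smaller than the packet the message refers to.

```python
from typing import Optional


def ptb_hint(ptb_size: int, sent_size: int, min_plpmtu: int,
             validated: bool, use_ptb: bool = True) -> Optional[int]:
    """Return the PTB-reported size if it passes basic consistency checks,
    or None if the PL should ignore the message.

    ptb_size   -- MTU reported in the PTB message
    sent_size  -- size of the packet the PTB message refers to (from PL state)
    min_plpmtu -- smallest PLPMTU the PL is willing to use (an assumption)
    validated  -- result of checking the quoted packet against active flows
    use_ptb    -- False models a PL that has disabled all PTB processing
    """
    if not use_ptb:
        return None          # PTB processing disabled: rely on probing only
    if not validated:
        return None          # could not tie the message to this PL's traffic
    if ptb_size < min_plpmtu:
        return None          # implausibly small: ignore rather than shrink
    if ptb_size >= sent_size:
        return None          # inconsistent: the packet was not too big
    return ptb_size
```

A returned size below the current PLPMTU could be treated as a signal to reduce the PLPMTU. A larger one could only guide the choice of the next probe size, because PLPMTUD still confirms any increase with a successful probe.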
1.3. Path MTU Discovery for Datagram Services
Section 5 of this document presents a set of algorithms for datagram protocols to discover the largest size of unfragmented datagram that can be sent over a network path. The method relies upon features of the PL described in Section 3 and applies to transport protocols operating over IPv4 and IPv6. It does not require cooperation from the lower layers, although it can utilize PTB messages when these received messages are made available to the PL.
The message size guidelines in Section 3.2 of the UDP Usage Guidelines [BCP145] state that "an application SHOULD either use the Path MTU information provided by the IP layer or implement Path MTU Discovery (PMTUD)" but do not provide a mechanism for discovering the largest size of unfragmented datagram that can be used on a network path. The present document updates RFC 8085 to specify this method in place of PLPMTUD [RFC4821] and provides a mechanism for sharing the discovered largest size as the MPS (see Section 4.4).
Section 10.2 of [RFC4821] recommended a PLPMTUD probing method for the Stream Control Transmission Protocol (SCTP). SCTP utilizes probe packets consisting of a minimal-sized HEARTBEAT chunk bundled with a PAD chunk as defined in [RFC4820]. However, RFC 4821 did not provide a complete specification. The present document replaces that description with a complete specification.
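The sketch below makes the probe format concrete. It assembles an SCTP packet from the 12-byte common header, a HEARTBEAT chunk carrying a Heartbeat Info parameter, and a PAD chunk [RFC4820] sized so that the whole SCTP packet has a requested length. Chunk and parameter type values are taken from [RFC4960] and [RFC4820]. The function name, the placement of the HEARTBEAT chunk before the PAD chunk, and the zero checksum are assumptions for illustration. A real sender computes the CRC32c checksum.

```python
import struct

SCTP_COMMON_HEADER = 12
CHUNK_HEARTBEAT = 4          # HEARTBEAT chunk type [RFC4960]
PARAM_HEARTBEAT_INFO = 1     # Heartbeat Info parameter type [RFC4960]
CHUNK_PAD = 0x84             # PAD chunk type [RFC4820]


def build_sctp_probe(src_port: int, dst_port: int, vtag: int,
                     hb_info: bytes, sctp_len: int) -> bytes:
    """Build an SCTP packet of exactly `sctp_len` bytes: HEARTBEAT + PAD.

    The checksum field is left as zero in this sketch; a real sender must
    fill in the CRC32c checksum before transmission.
    """
    if len(hb_info) % 4:
        raise ValueError("heartbeat info length must be a multiple of 4 here")
    # Heartbeat Info parameter: type, length (header included), opaque info.
    param = struct.pack("!HH", PARAM_HEARTBEAT_INFO, 4 + len(hb_info)) + hb_info
    heartbeat = struct.pack("!BBH", CHUNK_HEARTBEAT, 0, 4 + len(param)) + param
    # The PAD chunk absorbs the remaining bytes so the probe hits the target size.
    pad_len = sctp_len - SCTP_COMMON_HEADER - len(heartbeat)
    if pad_len < 4 or pad_len % 4:
        raise ValueError("sctp_len too small or not a multiple of 4")
    pad = struct.pack("!BBH", CHUNK_PAD, 0, pad_len) + bytes(pad_len - 4)
    common = struct.pack("!HHII", src_port, dst_port, vtag, 0)  # checksum = 0
    return common + heartbeat + pad
```

For a 1400-byte probe over IPv4 without options, the SCTP packet would be 1380 bytes. With UDP encapsulation [RFC6951], another 8 bytes would be subtracted.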
The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires implementations to support Classical PMTUD and states that a DCCP sender "MUST maintain the MPS allowed for each active DCCP session". It also defines the current congestion control MPS (CCMPS), the largest packet size allowed by the congestion control mechanism in use. [RFC4340] recommends use of PMTUD and suggests use of control packets (DCCP-Sync) as path probe packets because they do not risk application data loss. The method defined in this specification can be used with DCCP.
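The two limits named above can be combined as sketched below. The sketch assumes that the MPS is simply the smaller of the CCMPS and the PMTU minus the IP and DCCP header lengths, and that all values are in bytes. `dccp_mps` is a hypothetical helper. The header lengths are supplied by the caller because they depend on the IP version, options, and DCCP packet type.

```python
def dccp_mps(pmtu: int, ccmps: int, ip_header: int, dccp_header: int) -> int:
    """Hypothetical MPS for a DCCP session.

    Takes the smaller of the congestion-control limit (CCMPS) and the
    path-derived limit (PMTU minus IP and DCCP header bytes). It would be
    recomputed whenever either the PMTU or the CCMPS changes.
    """
    path_limit = pmtu - ip_header - dccp_header
    return min(ccmps, path_limit)
```

When the PL discovers a new PLPMTU, the path-derived term changes and the MPS reported to the application changes with it, unless the CCMPS is the tighter limit.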
Section 4 and Section 5 define the protocol mechanisms and specification for Datagram Packetization Layer Path MTU Discovery (DPLPMTUD).
Section 6 specifies the method for datagram transports and provides information to enable the implementation of PLPMTUD with other datagram transports and applications that use datagram transports.
Section 6 also provides recommendations for SCTP endpoints, updating [RFC4960], [RFC6951], and [RFC8261] to use the method specified in this document instead of the method in [RFC4821].