
5. Datagram Packetization Layer PMTUD

This section specifies Datagram PLPMTUD (DPLPMTUD). The method can be introduced at various points in the IP protocol stack to discover the PLPMTU so that an application can utilize an appropriate MPS for the current network path.

DPLPMTUD SHOULD be performed only at one layer between a pair of endpoints. Therefore, when DPLPMTUD is enabled at a lower layer, an upper layer PL or application ought to avoid using DPLPMTUD. A PL MUST adjust the MPS indicated by DPLPMTUD to account for any additional overhead introduced by the PL.
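Adjusting the MPS for PL overhead is simple arithmetic. The Python sketch below assumes a fixed per-packet PL overhead. The function name `mps_from_plpmtu` and the 32-byte overhead are hypothetical and are used only for illustration.

```python
def mps_from_plpmtu(plpmtu: int, pl_overhead: int) -> int:
    """Return the MPS left for upper-layer data after the PL's own overhead.

    pl_overhead is a hypothetical fixed number of bytes the PL adds to each
    packet (headers, trailers, integrity tags); a real PL computes its own.
    """
    if pl_overhead < 0:
        raise ValueError("PL overhead cannot be negative")
    if pl_overhead >= plpmtu:
        raise ValueError("PL overhead leaves no room for user data")
    return plpmtu - pl_overhead


# A PLPMTU of 1232 bytes with a hypothetical 32-byte PL overhead.
print(mps_from_plpmtu(1232, 32))  # prints 1200
```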

DPLPMTUD Implementation Location Examples (* = can implement DPLPMTUD)

Application Data *

QUIC/RTP *

UDP *

IP Layer

Network Interface

The central idea of DPLPMTUD is probing by the sender. The sender transmits probe packets to find the largest user message that can be completely transferred from the sender to the destination.

The following sections identify the components required for implementation, provide an overview of the operational phases, and specify the state machine and search algorithm.

5.1. DPLPMTUD Components

This section describes the timers, constants, and variables of DPLPMTUD.

5.1.1. Timers

The method utilizes up to three timers:

PROBE_TIMER

  • Configuration: Timeout greater than the maximum time to receive an acknowledgment to a probe packet
  • Minimum: MUST NOT be less than 1 second
  • Recommended: SHOULD be larger than 15 seconds
  • Reference: Section 3.1.1 of the UDP Usage Guidelines [BCP145] provides guidance on selection of timer values

PMTU_RAISE_TIMER

  • Function: The period a sender will continue to use the current PLPMTU, after which it reenters the Search Phase
  • Period: 600 seconds, as recommended by PLPMTUD [RFC4821]
  • Optimization: DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the last probe packet. A PL preferring to use the latest PMTU when again sending user data can choose to continue PMTU discovery for each path. However, this will result in sending additional packets

CONFIRMATION_TIMER

  • Applicability: MUST NOT be used when an Acknowledged PL is used
  • Function: For other PLs, configured as the period a PL sender waits before confirming the current PLPMTU is still supported
  • Relationship: Smaller than PMTU_RAISE_TIMER, used to reduce the PLPMTU (e.g., when a black hole is encountered)
  • Frequency: Confirmation needs to be frequent enough that the sending PL does not black-hole a large amount of traffic when data is flowing
  • Reference: Section 3.1.1 of the UDP Usage Guidelines [BCP145] provides guidance on selection of timer values
  • Optimization: DPLPMTUD MAY inhibit sending probe packets when no application data has been sent since the last probe packet

Note: DPLPMTUD specifies various timers; however, an implementation can choose to realize these timer functions using a single timer.
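The note above allows all three timers to share one underlying timer. One way to do this is to keep a priority queue of per-name deadlines and skip stale entries when a timer is restarted or canceled. This minimal sketch assumes that the caller supplies the current time in seconds and polls for expiries. The class and method names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class SingleTimer:
    """Multiplex named logical timers onto one deadline queue (sketch)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._generation: Dict[str, int] = {}

    def start(self, name: str, now: float, period: float) -> None:
        """(Re)start a logical timer; any earlier deadline for it becomes stale."""
        gen = self._generation.get(name, 0) + 1
        self._generation[name] = gen
        heapq.heappush(self._heap, (now + period, gen, name))

    def cancel(self, name: str) -> None:
        """Cancel a logical timer by invalidating its pending deadline."""
        self._generation[name] = self._generation.get(name, 0) + 1

    def next_deadline(self) -> Optional[float]:
        """The single real timer only needs to be armed for this instant."""
        self._drop_stale()
        return self._heap[0][0] if self._heap else None

    def expired(self, now: float) -> List[str]:
        """Pop and return the names of all live timers due at or before now."""
        fired = []
        self._drop_stale()
        while self._heap and self._heap[0][0] <= now:
            _, _, name = heapq.heappop(self._heap)
            fired.append(name)
            self._drop_stale()
        return fired

    def _drop_stale(self) -> None:
        # Discard queue heads whose generation no longer matches (restarted/canceled).
        while self._heap:
            _, gen, name = self._heap[0]
            if self._generation.get(name) == gen:
                return
            heapq.heappop(self._heap)


timers = SingleTimer()
timers.start("PROBE_TIMER", now=0.0, period=30.0)
timers.start("PMTU_RAISE_TIMER", now=0.0, period=600.0)
print(timers.next_deadline())  # prints 30.0
print(timers.expired(30.0))    # prints ['PROBE_TIMER']
```

Lazy invalidation avoids searching the heap on every cancel. The cost is that stale entries stay in the queue until they reach the front.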

5.1.2. Constants

The following constants are defined:

MAX_PROBES

  • Definition: The maximum value of the PROBE_COUNT counter
  • Meaning: Represents a limit on the number of consecutive probe attempts of any size
  • Benefit: A MAX_PROBES value greater than 1 can provide robustness to isolated packet loss
  • Default: 3

MIN_PLPMTU

  • Definition: The smallest PLPMTU size that DPLPMTUD will attempt to use
  • Configuration: An endpoint might need to configure MIN_PLPMTU to provide space for extension headers and other encapsulation at layers below the PL
  • Path Dependency: This value can be interface and path dependent
  • IPv6: This size is greater than or equal to the size at the PL that results in a 1280-byte IPv6 packet, as specified in [RFC8200]
  • IPv4: This size is greater than or equal to the size at the PL that results in a 68-byte IPv4 packet
    • Note: IPv4 routers are required to be able to forward a datagram of 68 bytes without further fragmentation. This is the combined size of an IPv4 header and the minimum fragment size of 8 bytes. In addition, receivers are required to be able to reassemble fragmented datagrams at least 576 bytes in size, as stated in Section 3.3.3 of [RFC1122]
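As a concrete case, the sketch below computes MIN_PLPMTU with UDP as the PL. It assumes that the PL size is measured as the UDP payload. It also assumes that the only headers below the PL are a fixed IPv6 (40-byte) or IPv4 (20-byte) header plus the 8-byte UDP header. The `extra_below_pl` parameter is a hypothetical knob for the extension headers or other encapsulation that the text says might need room.

```python
IPV6_HEADER = 40          # fixed IPv6 header, no extension headers (assumed)
IPV4_HEADER = 20          # IPv4 header without options (assumed)
UDP_HEADER = 8

IPV6_MIN_LINK_MTU = 1280  # minimum IPv6 packet size every link carries [RFC8200]
IPV4_MIN_FORWARD = 68     # IPv4 datagram size routers forward unfragmented


def min_plpmtu(ip_packet_size: int, ip_header: int, extra_below_pl: int = 0) -> int:
    """PL size (UDP payload) that produces an IP packet of ip_packet_size bytes.

    extra_below_pl reserves room for extension headers or other encapsulation
    below the PL; it is a hypothetical parameter for this sketch.
    """
    return ip_packet_size - ip_header - UDP_HEADER - extra_below_pl


print(min_plpmtu(IPV6_MIN_LINK_MTU, IPV6_HEADER))     # prints 1232
print(min_plpmtu(IPV4_MIN_FORWARD, IPV4_HEADER))      # prints 40
print(min_plpmtu(IPV6_MIN_LINK_MTU, IPV6_HEADER, 8))  # prints 1224
```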

MAX_PLPMTU

  • Definition: The largest PLPMTU size
  • Limitation: Must be less than or equal to the maximum size of PL packet that can be sent on the outgoing interface (constrained by the local interface MTU)
  • Consideration: Ought also to be less than the maximum size of PL packet that the remote endpoint can receive (constrained by EMTU_R) when this is known
  • Design Limitation: Can be limited by the design or configuration of the PL in use
  • Application Limitation: An application or PL MAY choose a smaller MAX_PLPMTU when there is no need to send packets larger than a specific size

BASE_PLPMTU

  • Definition: A configured size expected to work for most paths
  • Range: Equal to or larger than MIN_PLPMTU and smaller than MAX_PLPMTU
  • Typical: For most PLs, a suitable BASE_PLPMTU will be larger than 1200 bytes
  • IPv4: When using IPv4, there is currently no equivalent specified size, and a default BASE_PLPMTU of 1200 bytes is RECOMMENDED
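The constraints on the timers and constants can be checked when a configuration is built. The sketch below shows one way to do this. The class and field names, the 30-second PROBE_TIMER default (chosen to satisfy "larger than 15 seconds"), and the MAX_PROBES ≥ 1 sanity check are assumptions beyond what the text mandates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DplpmtudConfig:
    min_plpmtu: int
    base_plpmtu: int
    max_plpmtu: int
    max_probes: int = 3                          # default MAX_PROBES
    probe_timer: float = 30.0                    # seconds; assumed value above 15 s
    pmtu_raise_timer: float = 600.0              # seconds, per [RFC4821]
    confirmation_timer: Optional[float] = None   # unused with an Acknowledged PL

    def __post_init__(self) -> None:
        if not self.min_plpmtu <= self.base_plpmtu < self.max_plpmtu:
            raise ValueError("need MIN_PLPMTU <= BASE_PLPMTU < MAX_PLPMTU")
        if self.max_probes < 1:                  # sanity check (assumption)
            raise ValueError("MAX_PROBES must be at least 1")
        if self.probe_timer < 1.0:
            raise ValueError("PROBE_TIMER MUST NOT be less than 1 second")
        if (self.confirmation_timer is not None
                and self.confirmation_timer >= self.pmtu_raise_timer):
            raise ValueError("CONFIRMATION_TIMER must be below PMTU_RAISE_TIMER")


# IPv4 over a 1500-byte Ethernet link, sizes measured as UDP payload (assumed).
cfg = DplpmtudConfig(min_plpmtu=1200, base_plpmtu=1200, max_plpmtu=1472)
```

Here MIN_PLPMTU equals BASE_PLPMTU, which the ERROR state description notes as a permitted simplification.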

5.1.3. Variables

This method utilizes a set of variables:

PROBED_SIZE

  • Definition: The size of the current probe packet as determined at the PL
  • Nature: This is a tentative value for the PLPMTU, awaiting confirmation

PROBE_COUNT

  • Definition: A count of the number of successive unsuccessful probe packets that have been sent
  • Reset: This is set to zero each time a probe packet is acknowledged
  • Note: Some loss of probes is expected during a search, so the loss of a single probe is not an indication of a PMTU problem

Packet Size Relationship Diagram

MAX_PLPMTU ────────┐


PROBED_SIZE ───────┤ (Under Probe)


PLPMTU ────────────┤ (Currently Used)


BASE_PLPMTU ───────┤ (Baseline)


MIN_PLPMTU ────────┘ (Minimum)

The diagram above illustrates the relationship between the packet size constants and variables when the DPLPMTUD algorithm performs path probing to increase the PLPMTU size. Probe packets of size PROBED_SIZE have been sent. Once acknowledged, the PLPMTU will be raised to PROBED_SIZE, allowing the DPLPMTUD algorithm to further increase PROBED_SIZE, moving toward sending probe packets of the actual PMTU size.

5.1.4. Overview of DPLPMTUD Phases

This section provides a high-level, informative view of the DPLPMTUD method by describing movement of the method through several operational phases. More detail can be found in the state machine (Section 5.2).

DPLPMTUD Phase Flow Diagram

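Initial → Base Phase → Search Phase → Search Complete Phase
Base Phase → Error Phase (BASE_PLPMTU not confirmed)
Error Phase → Search Phase (probe packets no longer detect an error)
Search Complete Phase → Search Phase (PMTU_RAISE_TIMER expires)
Search Complete Phase → Base Phase (black hole detected)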

Base Phase

  • Purpose: The Base Phase uses packets of size BASE_PLPMTU to confirm connectivity to the remote peer
  • Connection Confirmation: For a connection-oriented PL, connection confirmation is implicit (it can be performed in the PL connection handshake). A connectionless PL sends a probe packet and uses the acknowledgment of that probe packet to confirm that the remote peer is reachable
  • PLPMTU Confirmation: The sender also confirms that the network path supports BASE_PLPMTU. This can be achieved by using PL mechanisms (e.g., using a handshake packet of size BASE_PLPMTU) or by sending a probe packet of BASE_PLPMTU size and confirming reception of that probe packet
  • Probe Timing: A probe packet of BASE_PLPMTU size can be sent immediately upon entry to the Base Phase (following the connection check). A PL not wishing to support paths with a PLPMTU less than BASE_PLPMTU can simplify this phase to a single step by performing the connection check using a probe of BASE_PLPMTU size
  • Success: Once confirmed, DPLPMTUD enters the Search Phase
  • Failure: If the Base Phase fails to confirm BASE_PLPMTU, DPLPMTUD enters the Error Phase

Search Phase

  • Purpose: The Search Phase uses a search algorithm to send probe packets that seek to increase the PLPMTU
  • Termination: The algorithm concludes by entering the Search Complete Phase when a suitable PLPMTU is found
  • PTB Response: The PL can use the information in PTB messages to advance or terminate the search (see Section 4.6)

Search Complete Phase

  • State: The Search Complete Phase is entered when the PLPMTU is supported on a network path
  • Periodic Confirmation: The PL can use the CONFIRMATION_TIMER to periodically repeat probe packets of the current PLPMTU size
  • Black Hole Detection: If the sender is unable to confirm reachability (e.g., if the CONFIRMATION_TIMER expires) or the PL signals a lack of reachability, then a black hole is detected and DPLPMTUD enters the Base Phase
  • Periodic Search: The PMTU_RAISE_TIMER is used to periodically resume the Search Phase to discover whether the PLPMTU can be raised

Error Phase

  • Trigger: The Error Phase is entered when the PLPMTU information for a path is conflicting or invalid (e.g., cannot support BASE_PLPMTU), which prevents DPLPMTUD from continuing and reduces the PLPMTU
  • Mitigation: This state implements a method to mitigate oscillations in the state event engine. It signals a conservative MPS value to higher layers via the PL
  • Exit: This phase is exited when probe packets no longer detect an error. The PL sender then enters the Search Phase

Robustness: A method solely reducing the PLPMTU to a suitable size is sufficient to ensure reliable operation, but could be very inefficient when the actual PMTU changes or when the method (for whatever reason) makes a suboptimal choice for the PLPMTU.

Complete Implementation: A complete implementation of DPLPMTUD provides an algorithm that allows a DPLPMTUD sender to increase the PLPMTU following changes in the path characteristics, such as when a link is reconfigured with a larger MTU, or when there is a change to the set of links traversed by an end-to-end flow (e.g., after a routing or path failover decision).

5.2. State Machine

The state machine for DPLPMTUD is depicted below. If multipath or multihoming is supported, a state machine is needed for each path.

Note: For clarity, the diagram does not show all transitions.

State Machine Diagram

[DISABLED] → [BASE]: connection established
[BASE] → [DISABLED]: connection lost
[BASE] → [SEARCHING]: probe success
[SEARCHING] → [SEARCH_COMPLETE]: probing complete or failed
[SEARCH_COMPLETE] → [SEARCHING]: periodic raise
[SEARCHING] → [BASE]: black hole detected
[SEARCH_COMPLETE] → [BASE]: black hole detected
[ERROR]: error handling

State Definitions

DISABLED

  • Initial State: The initial state before probing has started
  • Entry Condition: Entered from any other state when the PL indicates a loss of connectivity
  • Exit Condition: The state is exited once the PL indicates connectivity to the remote PL
  • Transition: When transitioning to BASE state, a probe packet of size BASE_PLPMTU can be sent immediately

BASE

  • Purpose: Used to confirm that the network path supports the BASE_PLPMTU size, which is intended to allow an application to continue to work when the actual PMTU is temporarily reduced. It also seeks to prevent a sender using DPLPMTUD from remaining unaware, for an extended period while it searches for a larger PLPMTU, that packets are not being delivered due to a packet or ICMP black hole
  • On Entry: PROBED_SIZE is set to the BASE_PLPMTU size and PROBE_COUNT is set to zero
  • Probing: Each time a probe packet is sent, the PROBE_TIMER is started
  • Successful Exit: The state is exited when a probe packet is acknowledged; the PL sender then enters the SEARCHING state
  • Failure Exit: The state is also left when the PROBE_COUNT reaches MAX_PROBES or a validated PTB message is received. This causes the PL sender to enter the ERROR state

SEARCHING

  • Primary State: This is the main probing state
  • Entry Condition: Entered when a probe of BASE_PLPMTU completes
  • Successful Probe: Each time a probe packet is acknowledged, PROBE_COUNT is set to zero, PLPMTU is set to PROBED_SIZE, and PROBED_SIZE is then increased using the search algorithm (as described in Section 5.3)
  • Probe Failure: When a probe packet is sent without being acknowledged within the PROBE_TIMER period, PROBE_COUNT is incremented and a new probe is transmitted
  • Exit Condition: The state is exited, and the PL sender enters SEARCH_COMPLETE, when PROBE_COUNT reaches MAX_PROBES, when a validated PTB is received that corresponds to the last successful probe size (PL_PTB_SIZE = PLPMTU), or when a probe of MAX_PLPMTU size is acknowledged (PLPMTU = MAX_PLPMTU)
  • Black Hole Detection: Detecting a black hole while in the SEARCHING state causes the PL sender to enter the BASE state

SEARCH_COMPLETE

  • Completion Flag: Indicates the search has completed. This is the normal maintenance state where the PL is not probing to update the PLPMTU
  • Duration: DPLPMTUD remains in this state until either the PMTU_RAISE_TIMER expires or a black hole is detected
  • Unacknowledged PL: When DPLPMTUD uses an Unacknowledged PL and is in the SEARCH_COMPLETE state, the CONFIRMATION_TIMER periodically resets PROBE_COUNT and schedules a probe packet of size PLPMTU. If MAX_PROBES successive PLPMTU-sized probe packets fail to be acknowledged, the method enters the BASE state
  • Acknowledged PL: When used with an Acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT continue to generate PLPMTU probes in this state

ERROR

  • Failure Situation: Indicates the network path is not known to support a PLPMTU of at least BASE_PLPMTU size or there is contradictory information about the network path that could otherwise cause the MPS signal to higher layers to oscillate excessively
  • Oscillation Mitigation: This state implements a method to mitigate oscillations in the state event engine
  • Conservative Value: It signals a conservative MPS value to higher layers via the PL
  • Exit: This state is exited when probe packets no longer detect an error. The PL sender then enters the SEARCHING state
  • Endpoint Fragmentation: Implementations are permitted to enable endpoint fragmentation if DPLPMTUD is unable to validate MIN_PLPMTU within PROBE_COUNT probes
  • Disable: If DPLPMTUD is unable to validate MIN_PLPMTU, implementations will transition to the DISABLED state
  • Note: MIN_PLPMTU can be the same as BASE_PLPMTU, simplifying the operation of this state
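The state definitions above can be condensed into an event-driven sketch for a single path. The sketch makes several assumptions:

- Timers and packet I/O are handled elsewhere. The methods are called when a probe is acknowledged, when the PROBE_TIMER expires, and so on.
- The search adds a hypothetical fixed `step` to the PLPMTU.
- Probes sent in the ERROR state are BASE_PLPMTU-sized.

PTB validation (Section 4.6), endpoint fragmentation, and the ERROR-to-DISABLED transition are not modeled. All names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    DISABLED = auto()
    BASE = auto()
    SEARCHING = auto()
    SEARCH_COMPLETE = auto()
    ERROR = auto()


class Dplpmtud:
    """Per-path DPLPMTUD state machine; timers and packet I/O live elsewhere."""

    def __init__(self, base_plpmtu: int, max_plpmtu: int,
                 max_probes: int = 3, step: int = 64) -> None:
        self.base_plpmtu = base_plpmtu
        self.max_plpmtu = max_plpmtu
        self.max_probes = max_probes
        self.step = step  # hypothetical fixed search increment
        self.state = State.DISABLED
        self.plpmtu = base_plpmtu
        self.probed_size = base_plpmtu
        self.probe_count = 0

    def _enter_base(self) -> None:
        self.state = State.BASE
        self.probed_size = self.base_plpmtu
        self.probe_count = 0

    def _enter_searching(self) -> None:
        if self.plpmtu >= self.max_plpmtu:
            self.state = State.SEARCH_COMPLETE  # PLPMTU = MAX_PLPMTU
            return
        self.state = State.SEARCHING
        self.probe_count = 0
        self.probed_size = min(self.plpmtu + self.step, self.max_plpmtu)

    def connectivity_up(self) -> None:
        if self.state is State.DISABLED:
            self._enter_base()

    def connectivity_lost(self) -> None:
        self.state = State.DISABLED

    def probe_acked(self) -> None:
        self.probe_count = 0
        if self.state in (State.BASE, State.ERROR):
            # BASE_PLPMTU confirmed (ERROR assumed to probe at BASE_PLPMTU).
            self.plpmtu = self.base_plpmtu
            self._enter_searching()
        elif self.state is State.SEARCHING:
            self.plpmtu = self.probed_size
            self._enter_searching()  # next PROBED_SIZE, or complete at max
        # In SEARCH_COMPLETE an ack only confirms the current PLPMTU.

    def probe_timeout(self) -> None:
        self.probe_count += 1
        if self.probe_count < self.max_probes:
            return  # a new probe would be sent here (not modeled)
        if self.state is State.BASE:
            self.state = State.ERROR
        elif self.state is State.SEARCHING:
            self.state = State.SEARCH_COMPLETE
        elif self.state is State.SEARCH_COMPLETE:
            self._enter_base()  # PLPMTU-sized probes lost: black hole

    def confirmation_timer(self) -> None:
        # Unacknowledged PL only: reset PROBE_COUNT, then probe at PLPMTU.
        if self.state is State.SEARCH_COMPLETE:
            self.probe_count = 0
            self.probed_size = self.plpmtu

    def raise_timer(self) -> None:
        if self.state is State.SEARCH_COMPLETE:
            self._enter_searching()

    def black_hole(self) -> None:
        if self.state in (State.SEARCHING, State.SEARCH_COMPLETE):
            self._enter_base()

    def validated_ptb(self, pl_ptb_size: int) -> None:
        if self.state is State.BASE:
            self.state = State.ERROR
        elif self.state is State.SEARCHING and pl_ptb_size == self.plpmtu:
            self.state = State.SEARCH_COMPLETE
```

For example, with `Dplpmtud(1200, 1400, step=100)`, calling `connectivity_up()` and then `probe_acked()` three times moves the machine through BASE and SEARCHING (PLPMTU 1200, 1300, then 1400) into SEARCH_COMPLETE.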

5.3. Search to Increase the PLPMTU

This section describes the algorithms used by DPLPMTUD to search for a larger PLPMTU.

5.3.1. Probing for a Larger PLPMTU

Implementations use a search algorithm across the search range to determine whether the network path can support a larger PLPMTU.

The method discovers the search range by confirming the minimum PLPMTU and then using probing to select a PROBED_SIZE less than or equal to MAX_PLPMTU. The MAX_PLPMTU is the minimum of the local MTU and EMTU_R (when learned from the remote endpoint). MAX_PLPMTU MAY be reduced by an application that sets a maximum to the size of datagrams it will send.

When the first probe of size greater than or equal to PLPMTU is sent, PROBE_COUNT is initialized to zero. Each probe packet that is successfully sent to the remote peer is confirmed by an acknowledgment from the PL (see Section 4.1).

Each time a probe packet is sent to the destination, the PROBE_TIMER is started. The timer is canceled when the PL receives an acknowledgment that the probe packet has been successfully sent across the path (Section 4.1). This confirms PROBED_SIZE is supported, and the PROBED_SIZE value is then assigned to PLPMTU. The search algorithm can continue to send subsequent probes of increasing size.

If the timer expires before a probe packet is acknowledged, the probe has failed to confirm PROBED_SIZE. Each time the PROBE_TIMER expires, PROBE_COUNT is incremented, the PROBE_TIMER is reinitialized, and a new probe of the same size or any other size (as determined by the search algorithm) can be sent. A maximum number of consecutive failed probes (MAX_PROBES) is configured. If the value of PROBE_COUNT reaches MAX_PROBES, probing will stop, and the PL sender enters the SEARCH_COMPLETE state.
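The probing loop described above can be simulated to show how PROBE_COUNT and MAX_PROBES bound the search. The sketch makes two assumptions:

- The path is hypothetical. It acknowledges any probe no larger than `path_pmtu` and silently drops larger ones, so there is no non-size-related loss.
- After each PROBE_TIMER expiry, the loop retries the same size, which is one of the choices the text allows.

The function name is hypothetical.

```python
from typing import List, Tuple


def run_search(plpmtu: int, max_plpmtu: int, path_pmtu: int,
               candidates: List[int], max_probes: int = 3
               ) -> Tuple[int, List[Tuple[int, str]]]:
    """Return the final PLPMTU and a log of (PROBED_SIZE, outcome) pairs."""
    sizes = sorted(s for s in candidates if plpmtu < s <= max_plpmtu)
    log: List[Tuple[int, str]] = []
    probe_count = 0
    i = 0
    # Stop when no larger candidate remains or PROBE_COUNT reaches MAX_PROBES.
    while i < len(sizes) and probe_count < max_probes:
        probed_size = sizes[i]
        if probed_size <= path_pmtu:
            # Acknowledged before PROBE_TIMER expiry: PLPMTU takes PROBED_SIZE.
            plpmtu = probed_size
            probe_count = 0
            log.append((probed_size, "ack"))
            i += 1
        else:
            # PROBE_TIMER expired: count the failure and retry the same size.
            probe_count += 1
            log.append((probed_size, "timeout"))
    return plpmtu, log


final, log = run_search(1232, 1452, 1400, [1280, 1352, 1400, 1420, 1452])
print(final)  # prints 1400
```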

5.3.2. Selection of Probe Sizes

The search algorithm determines the minimum useful increase in the PLPMTU. It is not constructive for the PL sender to attempt to probe all sizes. This would impose unnecessary load on the path. Implementations SHOULD select a set of probe packet sizes to maximize the gain in PLPMTU from each search step.

Implementations can optimize the search procedure by selecting step sizes from a table of common PMTU sizes. When selecting the next size to probe, implementers ought also to consider common MPS sizes that applications might seek to use, as well as common MTU sizes that could be in use within the network.
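A table-driven choice of the next PROBED_SIZE could look like the sketch below, under these assumptions:

- The table of IP-layer MTUs is illustrative only. 1280 is the IPv6 minimum, 1492 is common on PPPoE links, 1500 on Ethernet, and 9000 for jumbo frames. 1400 and 1420 stand in for tunnel-reduced paths.
- The 48-byte overhead assumes IPv6 plus UDP.
- Trying the largest untested size first is only one strategy for maximizing the gain per step.

The function names are hypothetical.

```python
from typing import List, Optional, Set

# Illustrative table of IP-layer MTUs; not taken from any specification.
COMMON_IP_MTUS = (1280, 1400, 1420, 1492, 1500, 9000)


def candidate_sizes(max_plpmtu: int, overhead: int) -> List[int]:
    """Convert table MTUs to PL sizes and drop those above MAX_PLPMTU."""
    return sorted({m - overhead for m in COMMON_IP_MTUS
                   if 0 < m - overhead <= max_plpmtu})


def next_probe_size(plpmtu: int, max_plpmtu: int, failed: Set[int],
                    overhead: int = 48) -> Optional[int]:
    """Largest untried candidate above PLPMTU, or None when nothing is left."""
    options = [s for s in candidate_sizes(max_plpmtu, overhead)
               if s > plpmtu and s not in failed]
    return max(options) if options else None


print(next_probe_size(1232, 1452, set()))         # prints 1452
print(next_probe_size(1232, 1452, {1452}))        # prints 1444
print(next_probe_size(1372, 1452, {1452, 1444}))  # prints None
```

An ascending strategy (smallest untried size above PLPMTU) is the obvious alternative. It confirms gains sooner but needs more probes to reach a large PMTU.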

5.3.3. Resilience to Inconsistent Path Information

The decision to increase the PLPMTU needs to be resilient to the possibility of inconsistency in the information that has been learned about the network path. Inconsistency in the path can arise when probe packets are lost for reasons other than their size (i.e., non-size-related loss) or due to frequent path changes. Frequent path changes could result from unexpected "flapping", where some packets from a flow are delivered along one path but other packets follow a different path with different properties.

A PL sender is able to detect inconsistency from a sequence of acknowledged PLPMTU probes or from a sequence of PTB messages that it receives. A PL sender can use an alternative search pattern when it detects inconsistent path information, one that limits the MPS that is provided to a smaller value for a period of time. This avoids unnecessary packet loss.
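The text leaves both the detection rule and the alternative search pattern open. One hypothetical approach works in two steps:

1. Count PLPMTU decreases within a sliding time window. A decrease would be, for example, a PTB or a failed confirmation after a larger size was acknowledged.
2. Once a threshold is reached, cap the size offered to upper layers at a conservative value for a hold period.

The window, threshold, and hold values, as well as the class and method names, are arbitrary assumptions.

```python
from collections import deque


class InconsistencyGuard:
    """Hypothetical detector for inconsistent path information."""

    def __init__(self, window: float = 60.0, max_decreases: int = 3,
                 hold: float = 600.0) -> None:
        self.window = window          # seconds over which decreases are counted
        self.max_decreases = max_decreases
        self.hold = hold              # seconds the conservative cap stays applied
        self._decreases = deque()
        self.capped_until = float("-inf")

    def record(self, now: float, old_plpmtu: int, new_plpmtu: int) -> None:
        """Call whenever DPLPMTUD changes the PLPMTU."""
        if new_plpmtu < old_plpmtu:
            self._decreases.append(now)
        # Forget decreases that fall outside the window.
        while self._decreases and now - self._decreases[0] > self.window:
            self._decreases.popleft()
        if len(self._decreases) >= self.max_decreases:
            self.capped_until = now + self.hold
            self._decreases.clear()

    def limited_plpmtu(self, now: float, plpmtu: int, conservative: int) -> int:
        """PLPMTU to expose upward, capped while inconsistency is suspected."""
        if now < self.capped_until:
            return min(plpmtu, conservative)
        return plpmtu


guard = InconsistencyGuard()
for t in (0.0, 10.0, 20.0):
    guard.record(t, old_plpmtu=1400, new_plpmtu=1232)
print(guard.limited_plpmtu(30.0, plpmtu=1400, conservative=1232))   # prints 1232
print(guard.limited_plpmtu(700.0, plpmtu=1400, conservative=1232))  # prints 1400
```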

5.4. Robustness to Inconsistent Paths

Some paths could be unable to sustain packets of size BASE_PLPMTU. The Error State can be implemented to provide robustness for such paths. This allows fallback to a PLPMTU smaller than desired rather than suffering a connection failure. This can use methods such as endpoint IP fragmentation to enable the PL sender to communicate using packets smaller than BASE_PLPMTU.

Algorithm Summary

Key elements of the DPLPMTUD algorithm:

  1. Conservative Start: Begin from BASE_PLPMTU
  2. Progressive Probing: Gradually increase probe size
  3. Confirmation Mechanism: Confirm each size with an acknowledged probe before using it
  4. Black Hole Detection: Quickly respond to path problems
  5. Periodic Maintenance: Keep PLPMTU up-to-date
  6. Error Recovery: Handle exceptional situations

These mechanisms together ensure that DPLPMTUD can work reliably and efficiently under various network conditions.