8. Fault Management

SCTP provides robust fault detection and recovery mechanisms to ensure reliable communication under network failure conditions.

8.1. Endpoint Failure Detection

SCTP endpoints must be able to detect complete failure of their peer endpoint.

8.1.1. Association-Level Error Counting

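Each SCTP association maintains an Association Error Counter, which counts consecutive retransmissions to the peer:

  • Incremented on every retransmission timeout, whichever of the peer's destination addresses the data was sent to, and on every unacknowledged HEARTBEAT sent on the path currently used for data transfer (unanswered HEARTBEATs on other, idle paths SHOULD NOT be counted)
  • Reset to 0 whenever DATA sent to the peer is acknowledged by a SACK; it SHOULD also be reset when a HEARTBEAT ACK arrives, although an endpoint MAY keep the count while data is still outstanding
  • When the counter exceeds Association.Max.Retrans, the association is considered failed and the endpoint SHOULD report the failure to the upper layer

A single destination address reaching its own error limit does not fail the association; it only marks that path inactive (Section 8.2).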

8.1.2. Endpoint Failure Conditions

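An endpoint considers its peer unreachable when the Association Error Counter exceeds the Association.Max.Retrans threshold.

Association.Max.Retrans: Recommended default value is 10 retransmission attempts.

Having all destination addresses marked inactive is not, by itself, a failure condition. To avoid a state in which every path is inactive while the peer is still considered reachable, Association.Max.Retrans should not be set larger than the sum of Path.Max.Retrans over all of the peer's destination addresses.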
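The counter logic above can be sketched in Python. This is a minimal illustration, not an implementation: the `AssociationErrorCounter` class and its method names are hypothetical, and the caller is assumed to invoke them from its retransmission timer, SACK, and HEARTBEAT handlers.

```python
from dataclasses import dataclass

ASSOCIATION_MAX_RETRANS = 10  # recommended default from the text


@dataclass
class AssociationErrorCounter:
    """Hypothetical tracker of consecutive retransmissions to one peer."""

    max_retrans: int = ASSOCIATION_MAX_RETRANS
    count: int = 0

    def on_retransmission_timeout(self) -> None:
        # A timeout counts no matter which destination the data was sent to.
        self.count += 1

    def on_heartbeat_unacked(self, on_data_path: bool) -> None:
        # Only HEARTBEATs on the path currently carrying data are counted.
        if on_data_path:
            self.count += 1

    def on_data_acked(self) -> None:
        # A SACK that acknowledges DATA clears the counter.
        self.count = 0

    def on_heartbeat_ack(self, data_outstanding: bool = False,
                         keep_while_outstanding: bool = False) -> None:
        # SHOULD reset; an endpoint MAY keep the count while data is outstanding.
        if not (data_outstanding and keep_while_outstanding):
            self.count = 0

    @property
    def peer_unreachable(self) -> bool:
        # "Exceeds": with the default of 10, the 11th consecutive error fails it.
        return self.count > self.max_retrans
```

With the default threshold, ten consecutive timeouts leave the association up; the eleventh marks the peer unreachable.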

8.1.3. Failure Response

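When endpoint failure is detected:

1. Stop sending new data to the peer
2. Optional: Send an ABORT chunk as a best-effort notification (the peer is already unreachable, so it may never arrive)
3. Report association failure to the upper layer, optionally returning any user data still in the outbound queue
4. Destroy the Transmission Control Block (TCB); the association is now CLOSED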

8.2. Path Failure Detection

SCTP can detect individual transport path failures without affecting the entire association (if other active paths are available).

8.2.1. Path-Level Error Counting

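When the peer is multi-homed, the endpoint keeps a Path Error Counter for each of the peer's destination transport addresses:

  • Incremented each time the retransmission timer expires for data sent to that address, and each time a HEARTBEAT sent to it goes unacknowledged for one RTO
  • Reset to 0 when data sent to that address is acknowledged, or when a HEARTBEAT sent to it is answered by a HEARTBEAT ACK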

8.2.2. Path Failure Conditions

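A path is marked inactive when its Path Error Counter exceeds Path.Max.Retrans. Data retransmission timeouts and unanswered HEARTBEATs feed the same counter, so any mix of the two can trigger the transition. The endpoint SHOULD report the change to the upper layer.

Path.Max.Retrans: Recommended default value is 5 retransmission attempts.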

8.2.3. Path State Management

Path States:

  • Active: Path available for data transmission
  • Inactive: Path temporarily unavailable

State Transitions:

Active -> Inactive:
- Path Error Counter exceeds Path.Max.Retrans

Inactive -> Active:
- Receive a HEARTBEAT ACK for a HEARTBEAT sent to that address
- Receive an acknowledgement for data sent to that address

Both transitions SHOULD be reported to the upper layer.
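A minimal Python sketch of the per-destination counter and the Active/Inactive transitions. The `Path` class, its methods, and the `notify` callback (standing in for the upper-layer notification) are hypothetical names chosen for illustration; the threshold is treated as exceeded on the (Path.Max.Retrans + 1)-th consecutive error.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

PATH_MAX_RETRANS = 5  # recommended default from the text


class PathState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class Path:
    """Hypothetical per-destination record."""

    address: str
    max_retrans: int = PATH_MAX_RETRANS
    error_count: int = 0
    state: PathState = PathState.ACTIVE
    # Stand-in for the upper-layer notification; called on every state change.
    notify: Optional[Callable[[str, PathState], None]] = None

    def _set_state(self, new_state: PathState) -> None:
        if new_state is not self.state:
            self.state = new_state
            if self.notify is not None:
                self.notify(self.address, new_state)

    def on_error(self) -> None:
        """Retransmission timeout or unanswered HEARTBEAT on this path."""
        self.error_count += 1
        if self.error_count > self.max_retrans:
            self._set_state(PathState.INACTIVE)

    def on_success(self) -> None:
        """Data sent here was acknowledged, or a HEARTBEAT ACK arrived."""
        self.error_count = 0
        self._set_state(PathState.ACTIVE)
```

Because `_set_state` only fires on an actual change, further errors on an already inactive path do not produce duplicate notifications.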

8.2.4. Path Failure Response

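When the primary path fails:

1. Mark path as inactive
2. If an alternate destination is active, send new data to exactly one such alternate
3. Retransmit timed-out data to an active destination other than the one it was last sent to
4. Continue monitoring inactive path with HEARTBEAT

Path Selection Strategy:

Which alternate to use is an implementation decision. Common heuristics:

  • Prefer recently successful paths
  • Consider path RTT and congestion state

Base SCTP sends new data to one destination at a time; spreading load round-robin across several paths is an extension, not part of the core protocol.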
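One possible selection heuristic, sketched in Python under stated assumptions: the `Destination` record and `pick_alternate` function are hypothetical, and "lowest smoothed RTT wins" is just one of the heuristics listed above, not a rule from the protocol. The function always returns a single destination (or none).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Destination:
    """Hypothetical view of one peer address for selection purposes."""

    address: str
    active: bool
    srtt: Optional[float]  # smoothed RTT in seconds; None if never measured


def pick_alternate(destinations: Sequence[Destination],
                   exclude: str) -> Optional[Destination]:
    """Choose exactly one active destination other than `exclude`.

    Assumed heuristic: lowest measured SRTT wins; destinations without
    an RTT sample rank after all measured ones.
    """
    candidates = [d for d in destinations if d.active and d.address != exclude]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.srtt is None, d.srtt or 0.0))
```

For failover, pass the failed primary as `exclude`; for a timed-out retransmission, pass the address the chunk was last sent to.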

8.3. Path Heartbeat

The HEARTBEAT mechanism is used to actively monitor destination address reachability.

8.3.1. HEARTBEAT Sending Rules

An endpoint SHOULD periodically send HEARTBEAT to each idle destination address:

Sending Interval:

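HB.interval: Recommended default value is 30 seconds
Adjustable by the upper layer, which can also disable and re-enable heartbeats per destination
Actual period: RTO of the destination + HB.interval, with the RTO jittered by +/- 50% and backed off exponentially while HEARTBEATs go unanswered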
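The timing rule can be sketched as follows, assuming the timing recommended by the SCTP specification: the delay for an idle destination is its RTO (already including any backoff) plus HB.interval, with the RTO jittered by +/- 50%. `next_heartbeat_delay` is a hypothetical helper; the optional `rng` parameter exists only to make the randomness reproducible.

```python
import random
from typing import Optional

HB_INTERVAL = 30.0  # seconds; recommended default from the text


def next_heartbeat_delay(rto: float, hb_interval: float = HB_INTERVAL,
                         rng: Optional[random.Random] = None) -> float:
    """Seconds until the next HEARTBEAT to an idle destination.

    `rto` is the destination's current RTO, already including any
    exponential backoff from unanswered HEARTBEATs.
    """
    rng = rng if rng is not None else random.Random()
    jitter = rng.uniform(-0.5, 0.5) * rto  # +/- 50% of the RTO
    return rto + jitter + hb_interval
```

For example, with an RTO of 3 seconds and the default HB.interval, each delay falls between 31.5 and 34.5 seconds.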

Sending Conditions:

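  • The destination is idle: no chunk usable for an RTT measurement (such as new DATA) and no HEARTBEAT has been sent to it within its current heartbeat period
  • Idle destinations are probed whether they are active or inactive, which is how inactive paths are detected as recovered
  • Only one HEARTBEAT is sent each time the heartbeat timer expires; if several destinations are idle, the implementation chooses which one to probe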

HEARTBEAT Contents:

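- Heartbeat Information TLV (parameter type 1): sender-specific data that the peer does not interpret; it SHOULD include:
  - The time at which the HEARTBEAT was sent
  - The destination address it was sent to
- The peer copies the TLV unchanged into its HEARTBEAT ACK, which is how the sender gets its timestamp and address back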
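A sketch of how a sender might encode and later decode the Heartbeat Information TLV, using Python's `struct` and `ipaddress` modules. Chunk type 4 (HEARTBEAT), chunk type 5 (HEARTBEAT ACK), and parameter type 1 (Heartbeat Info) come from the SCTP specification; the layout inside the TLV (an 8-byte microsecond timestamp followed by the packed IPv4 or IPv6 address) is a hypothetical choice, which is allowed because the peer only echoes the bytes back.

```python
import ipaddress
import struct
from typing import Tuple

HEARTBEAT = 4       # chunk type
HEARTBEAT_ACK = 5   # chunk type
HB_INFO_PARAM = 1   # Heartbeat Information parameter type


def build_heartbeat(send_time_us: int, dest: str) -> bytes:
    """Build a HEARTBEAT chunk carrying a send timestamp and destination."""
    # Hypothetical info layout: 8-byte timestamp (microseconds) + packed address.
    info = struct.pack("!Q", send_time_us) + ipaddress.ip_address(dest).packed
    tlv = struct.pack("!HH", HB_INFO_PARAM, 4 + len(info)) + info
    chunk = struct.pack("!BBH", HEARTBEAT, 0, 4 + len(tlv)) + tlv
    return chunk + b"\x00" * (-len(chunk) % 4)  # pad the chunk to 4 bytes


def parse_heartbeat_info(chunk: bytes) -> Tuple[int, str]:
    """Recover (send_time_us, dest) from a HEARTBEAT or its echoed HEARTBEAT ACK."""
    ctype, _flags, _length = struct.unpack("!BBH", chunk[:4])
    if ctype not in (HEARTBEAT, HEARTBEAT_ACK):
        raise ValueError(f"unexpected chunk type {ctype}")
    ptype, plen = struct.unpack("!HH", chunk[4:8])
    if ptype != HB_INFO_PARAM:
        raise ValueError("missing Heartbeat Information TLV")
    info = chunk[8:4 + plen]
    (send_time_us,) = struct.unpack("!Q", info[:8])
    return send_time_us, str(ipaddress.ip_address(info[8:]))
```

Since the HEARTBEAT ACK carries the same TLV, `parse_heartbeat_info` works unchanged on the reply, and the RTT is the current time minus the recovered timestamp.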

8.3.2. HEARTBEAT ACK Processing

Upon receiving HEARTBEAT ACK:

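1. Calculate RTT = current time - sending timestamp (taken from the echoed Heartbeat Information)
2. Update the destination's RTO using this RTT measurement
3. Mark the destination as active if it was inactive, and report the change to the upper layer
4. Reset the path error counter to 0
5. Reset the association error counter (an endpoint MAY skip this while data is outstanding)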
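Step 2 typically uses the standard SCTP RTO calculation (the same smoothing scheme TCP uses). The sketch below assumes the protocol's recommended constants RTO.Alpha = 1/8, RTO.Beta = 1/4, RTO.Min = 1 s, and RTO.Max = 60 s; the `RtoEstimator` class is hypothetical, and its starting RTO of 1 s is only a placeholder until the first sample arrives.

```python
from dataclasses import dataclass
from typing import Optional

RTO_ALPHA = 1 / 8
RTO_BETA = 1 / 4
RTO_MIN = 1.0   # seconds
RTO_MAX = 60.0  # seconds


@dataclass
class RtoEstimator:
    """Hypothetical per-destination RTO state."""

    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 1.0  # placeholder until the first RTT sample

    def on_rtt_sample(self, r: float) -> float:
        """Fold one RTT measurement (seconds) into SRTT/RTTVAR; return the new RTO."""
        if self.srtt is None or self.rttvar is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR uses the SRTT value from before this update.
            self.rttvar = (1 - RTO_BETA) * self.rttvar + RTO_BETA * abs(self.srtt - r)
            self.srtt = (1 - RTO_ALPHA) * self.srtt + RTO_ALPHA * r
        self.rto = min(max(self.srtt + 4 * self.rttvar, RTO_MIN), RTO_MAX)
        return self.rto
```

For a HEARTBEAT, `r` is the current time minus the timestamp recovered from the HEARTBEAT ACK.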

8.3.3. HEARTBEAT Timeout Processing

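If a HEARTBEAT ACK is not received within one RTO:

1. Increment the path error counter (if this destination is the one currently used for data transfer, also increment the association error counter)
2. Back off the destination's RTO exponentially
3. If the error count exceeds Path.Max.Retrans:
   - Mark path as inactive and notify the upper layer
   - Stop incrementing the path error counter while the path stays inactive
   - If it was the primary path, send new data to an alternate active destination
4. Continue sending HEARTBEAT to probe for recovery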
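The timeout handling can be sketched like this. `HeartbeatPath` and `on_heartbeat_timeout` are hypothetical names; the sketch doubles the RTO on each unanswered HEARTBEAT (capped at an assumed RTO.Max of 60 s) and follows the SCTP rule that, once the path is inactive, probing continues without increasing the counter further.

```python
from dataclasses import dataclass

PATH_MAX_RETRANS = 5  # recommended default from the text
RTO_MAX = 60.0        # seconds; assumed cap for the backoff


@dataclass
class HeartbeatPath:
    """Hypothetical per-destination heartbeat state."""

    rto: float
    error_count: int = 0
    active: bool = True
    max_retrans: int = PATH_MAX_RETRANS

    def on_heartbeat_timeout(self) -> bool:
        """Handle a HEARTBEAT not acknowledged within one RTO.

        Returns True only on the timeout that makes the path inactive; the
        caller would then notify the upper layer and move new data elsewhere.
        """
        self.rto = min(self.rto * 2, RTO_MAX)  # exponential backoff
        if not self.active:
            return False  # keep probing, but stop increasing the counter
        self.error_count += 1
        if self.error_count > self.max_retrans:
            self.active = False
            return True
        return False
```

Updating the association error counter when this is the path used for data transfer is left to the caller.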

8.3.4. On-Demand HEARTBEAT

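Besides periodic HEARTBEAT, the upper layer MAY request an on-demand HEARTBEAT to a specific destination, for example when:

  • The peer's address list changes and new addresses need their reachability confirmed
  • A path is suspected to have recovered
  • Reachability must be verified quickly rather than waiting for the next periodic probe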

8.4. Handle "Out of the Blue" Packets

An "Out of the Blue" (OOTB) packet is a correctly formed SCTP packet (it passes the receiver's CRC32c checksum check) for which the receiver cannot identify the association it belongs to.

8.4.1. Identifying Out of the Blue Packets

A packet is considered "Out of the Blue" when:

  1. It is correctly formed and passes the CRC32c checksum check
  2. Its source and destination addresses and ports do not map to any existing association

A packet that does match an association but carries the wrong Verification Tag is not OOTB; it is handled by the Verification Tag rules in Section 8.5.

8.4.2. Out of the Blue Packet Handling Rules

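The rules below are checked in order; the first one that matches applies.

Receiving a packet to or from a non-unicast address:

- Silently discard

Receiving unexpected ABORT chunk:

- Silently discard and take no further action

Receiving unexpected INIT chunk with Verification Tag = 0:

- Process per the normal association setup procedure (normally by replying with INIT ACK)
- If an ABORT has to be sent instead, use the INIT's Initiate Tag as the Verification Tag and set T bit to 0

Receiving unexpected COOKIE ECHO as the first chunk:

- Process per the normal association setup procedure

Receiving unexpected SHUTDOWN ACK chunk:

Send SHUTDOWN COMPLETE chunk:
- Use received packet's Verification Tag
- T bit set to 1

Receiving unexpected SHUTDOWN COMPLETE, COOKIE ACK, or ERROR chunk with a "Stale Cookie" cause:

- Silently discard and take no further action

Receiving other unexpected chunks:

Send ABORT chunk:
- Use received packet's Verification Tag
- T bit set to 1
- Then discard the received packet and take no further action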
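The receive-side dispatch for Out of the Blue packets can be sketched as a single function that applies the rules in order. Chunk type and error cause codes follow the SCTP specification; the `Action` enum, the `classify_ootb` function, and the representation of a packet as a list of `(chunk_type, error_cause_codes)` pairs are hypothetical simplifications, and actually building the reply is left to the caller.

```python
from enum import Enum, auto
from typing import Sequence, Tuple

# Chunk type codes and one error cause code from the SCTP specification.
INIT = 1
ABORT = 6
SHUTDOWN_ACK = 8
ERROR = 9
COOKIE_ECHO = 10
COOKIE_ACK = 11
SHUTDOWN_COMPLETE = 14
STALE_COOKIE = 3  # error cause code


class Action(Enum):
    DISCARD = auto()
    PROCESS_SETUP = auto()           # hand to normal association setup
    SEND_SHUTDOWN_COMPLETE = auto()  # reflected tag, T bit = 1
    SEND_ABORT = auto()              # reflected tag, T bit = 1


Chunk = Tuple[int, Sequence[int]]  # (chunk type, error cause codes if ERROR)


def classify_ootb(chunks: Sequence[Chunk], vtag: int, unicast: bool) -> Action:
    """Apply the OOTB rules in order; the first match wins."""
    types = [ctype for ctype, _ in chunks]
    if not unicast:
        return Action.DISCARD
    if ABORT in types:
        return Action.DISCARD
    if INIT in types and vtag == 0:
        return Action.PROCESS_SETUP
    if types and types[0] == COOKIE_ECHO:
        return Action.PROCESS_SETUP
    if SHUTDOWN_ACK in types:
        return Action.SEND_SHUTDOWN_COMPLETE
    if SHUTDOWN_COMPLETE in types or COOKIE_ACK in types:
        return Action.DISCARD
    if any(ctype == ERROR and STALE_COOKIE in causes for ctype, causes in chunks):
        return Action.DISCARD
    # Everything else, including an INIT with a non-zero tag, gets an ABORT.
    return Action.SEND_ABORT
```

Checking ABORT first matters: it guarantees that two endpoints without state can never bounce ABORTs back and forth.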

8.4.3. ABORT Chunk Sending

When sending ABORT chunk in response to Out of the Blue packet:

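ABORT Chunk Format:
- Chunk Type = 6
- Chunk Flags: T bit (lowest bit) = 1, indicating the Verification Tag is reflected
- Chunk Length = 4 when no error cause is included
- Verification Tag (SCTP common header) = Verification Tag from received packet
- Error Cause: optional; the base protocol defines no dedicated "Out of the Blue" cause, so the ABORT can be sent without one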
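A Python sketch of building the reply packet from a received raw packet. `build_ootb_abort` is a hypothetical helper; it swaps the ports, reflects the received Verification Tag, sets the T bit, and omits the optional error cause. The checksum field is deliberately left as zero: a real sender must compute the CRC32c checksum over the packet before sending it.

```python
import struct

ABORT = 6
T_BIT = 0x01  # lowest bit of the ABORT chunk flags


def build_ootb_abort(ootb_packet: bytes) -> bytes:
    """Return an SCTP common header plus an ABORT chunk answering `ootb_packet`."""
    if len(ootb_packet) < 12:
        raise ValueError("shorter than an SCTP common header")
    src_port, dst_port, vtag = struct.unpack("!HHI", ootb_packet[:8])
    # Reply goes back to the sender: swap ports, reflect the tag.
    # Checksum is 0 here and must be replaced by the CRC32c before sending.
    header = struct.pack("!HHII", dst_port, src_port, vtag, 0)
    abort_chunk = struct.pack("!BBH", ABORT, T_BIT, 4)  # no error causes
    return header + abort_chunk
```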

8.4.4. Security Considerations

Security measures when handling Out of the Blue packets:

  1. Rate limit responses: Limit how often ABORT and SHUTDOWN COMPLETE replies are sent, so the endpoint cannot be used to reflect or amplify attack traffic
  2. Verify source address: Validate source address legitimacy when possible
  3. Log anomalies: Log frequent Out of the Blue packets to detect attacks
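Rate limiting is often done with a token bucket; the sketch below shows that general technique, not a mechanism defined by SCTP. The `TokenBucket` class, its `rate` and `burst` parameters, and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter for OOTB replies: on average at most `rate`
    replies per second, with bursts of up to `burst` replies."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a reply may be sent now, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop silently instead of replying
```

A receiver would call `allow()` before sending each ABORT or SHUTDOWN COMPLETE and simply discard the OOTB packet when it returns False.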

8.5. Verification Tag Usage

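The Verification Tag protects an association against blind packet injection by off-path attackers and against stale packets left over from a previous association on the same addresses and ports. It does not protect against an attacker who can observe the traffic and read the tags.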

8.5.1. Verification Tag Rules

When sending packets:

- Fill the Verification Tag field of the SCTP common header with the Initiate Tag the peer supplied in its INIT or INIT ACK

When receiving packets:

- Check that the Verification Tag equals the endpoint's own tag (the Initiate Tag it sent)
- Silently discard the packet if it does not match, except in the special cases below

Special Cases:

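  • INIT chunk: Verification Tag MUST be 0, and a packet with Verification Tag 0 SHOULD contain only an INIT chunk (otherwise it is silently discarded)
  • ABORT and SHUTDOWN COMPLETE: The T bit tells the receiver which tag to expect: its own tag when the T bit is 0, or the peer's tag (reflected back) when the T bit is 1
  • COOKIE ECHO: The sender uses the Initiate Tag from the INIT ACK; the receiver validates it through the association setup procedures
  • SHUTDOWN ACK received in the COOKIE-WAIT or COOKIE-ECHOED state: Handled as an Out of the Blue packet (Section 8.4)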

8.5.2. Verification Failure Handling

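When receiving packet with incorrect Verification Tag:

If INIT chunk:
- The Verification Tag must be 0; an INIT with tag 0 goes to the association setup procedures, which handle collisions and restarts
If ABORT or SHUTDOWN COMPLETE:
- Accept if T bit = 1 and the tag equals the peer's tag (reflected tag)
- Otherwise silently discard packet
Otherwise:
- Silently discard packet
- Do not send any response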
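The acceptance rules above can be condensed into one check, sketched here in Python for a packet that belongs to an existing association. `accept_vtag` and its parameters are hypothetical; the COOKIE ECHO and SHUTDOWN ACK special cases are left out, and treating an INIT with a non-zero tag as a mismatch is an assumption of the sketch.

```python
from typing import Optional

INIT = 1
ABORT = 6
SHUTDOWN_COMPLETE = 14


def accept_vtag(chunk_type: int, vtag: int, t_bit: bool,
                own_tag: int, peer_tag: Optional[int]) -> bool:
    """Return True if the packet passes the Verification Tag check.

    `own_tag` is the Initiate Tag this endpoint sent; `peer_tag` is the one
    the peer sent (None if unknown). COOKIE ECHO and SHUTDOWN ACK special
    cases are not modeled.
    """
    if chunk_type == INIT:
        # INIT must carry tag 0; rejecting other values here is an assumption.
        return vtag == 0
    if chunk_type in (ABORT, SHUTDOWN_COMPLETE):
        if t_bit:
            # Reflected tag: the one we put in our own outgoing packets.
            return peer_tag is not None and vtag == peer_tag
        return vtag == own_tag
    # All other packets must carry our own tag; otherwise silently discard.
    return vtag == own_tag
```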

Summary

SCTP's fault management mechanisms provide multi-layered robustness:

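  1. Multi-path redundancy: A single path failure does not end the association as long as another destination address remains reachable
  2. Active monitoring: The HEARTBEAT mechanism probes idle paths, including inactive ones, to detect both failure and recovery
  3. Failover: New data moves to an alternate active destination as soon as a path is marked inactive
  4. Defense mechanism: The Verification Tag blocks blind (off-path) packet injection and stale packets from earlier associations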
  5. Configurable thresholds: Allow adjustment of failure detection sensitivity based on network conditions

These mechanisms together ensure SCTP's reliability and availability under various network failure scenarios.