1. Introduction
HTTP does not define the means to protect the data integrity of content or representations. When HTTP messages are transferred between endpoints, lower-layer features or properties such as TCP checksums or TLS records [TLS] can provide some integrity protection. However, transport-oriented integrity provides limited utility because it is opaque to the application layer and only covers the extent of a single connection. HTTP messages often travel over a chain of separate connections, and data can be corrupted in between those connections. An HTTP integrity mechanism can provide the means for endpoints, or applications using HTTP, to detect data corruption and make a choice about how to act on it. An example use case is to aid fault detection and diagnosis across system boundaries.
This document defines two digest integrity mechanisms for HTTP. The first is content integrity, which acts on conveyed content (Section 6.4 of [HTTP]). The second is representation data integrity, which acts on representation data (Section 8.1 of [HTTP]). Representation data integrity supports advanced use cases, such as validating the integrity of a resource that was reconstructed from parts retrieved using multiple requests or connections.
This document obsoletes [RFC3230] and therefore the Digest and Want-Digest HTTP fields; see Section 1.3.
1.1. Document Structure
This document is structured as follows:
- New request and response header and trailer field definitions.
  - Section 2 (Content-Digest),
  - Section 3 (Repr-Digest), and
  - Section 4 (Want-Content-Digest and Want-Repr-Digest).
- Considerations specific to representation data integrity.
  - Section 3.1 (State-changing requests),
  - Section 3.2 (Content-Location),
  - Appendix A contains worked examples of representation data in message exchanges, and
  - Appendixes B and C contain worked examples of Repr-Digest and Want-Repr-Digest fields in message exchanges.
- Section 5 presents hash algorithm considerations and defines registration procedures for future entries.
1.2. Concept Overview
The HTTP fields defined in this document can be used for HTTP integrity. Senders choose a hashing algorithm and calculate a digest from an input related to the HTTP message. The algorithm identifier and digest are transmitted in an HTTP field. Receivers can validate the digest for integrity purposes. Hashing algorithms are registered in the "Hash Algorithms for HTTP Digest Fields" registry (see Section 7.2).
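The sender and receiver steps described above can be illustrated with a minimal, non-normative Python sketch. It assumes the "sha-256" and "sha-512" Algorithm Keys and the Dictionary-of-Byte-Sequence field syntax that Sections 2 and 3 define in detail; the function names and the key-to-hashlib mapping are hypothetical, and the parsing handles only a single Dictionary member rather than the full Structured Fields grammar.

```python
import base64
import hashlib
import hmac

# Illustrative subset: Algorithm Key -> hashlib name (an assumption of this sketch).
HASH_NAMES = {"sha-256": "sha256", "sha-512": "sha512"}


def make_digest_field(data: bytes, algorithm_key: str = "sha-256") -> str:
    """Sender side: hash the input and return 'key=:base64:' as a field value."""
    checksum = hashlib.new(HASH_NAMES[algorithm_key], data).digest()
    encoded = base64.b64encode(checksum).decode("ascii")
    return f"{algorithm_key}=:{encoded}:"


def verify_digest_field(data: bytes, field_value: str) -> bool:
    """Receiver side: recompute the digest and compare with the received value."""
    algorithm_key, _, _ = field_value.partition("=")
    if algorithm_key not in HASH_NAMES:
        return False  # unknown algorithm: this sketch cannot validate it
    expected = make_digest_field(data, algorithm_key)
    return hmac.compare_digest(expected, field_value)


body = b'{"hello": "world"}'
field_value = make_digest_field(body)
print("Content-Digest:", field_value)
print("valid:", verify_digest_field(body, field_value))
print("corrupted:", verify_digest_field(body + b"x", field_value))
```

What a receiver does after a failed validation is left to the application, consistent with the text above; the sketch only reports the outcome.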
Selecting the data on which digests are calculated depends on the use case of the HTTP messages. This document provides different fields for HTTP representation data and HTTP content.
There are use cases where a simple digest of the HTTP content bytes is required. The Content-Digest request and response header and trailer field is defined to support digests of content (Section 6.4 of [HTTP]); see Section 2.
For more advanced use cases, the Repr-Digest request and response header and trailer field (Section 3) is defined. It contains a digest value computed by applying a hashing algorithm to selected representation data (Section 8.1 of [HTTP]). Basing Repr-Digest on the selected representation makes it straightforward to apply it to use cases where the message content requires some sort of manipulation to be considered a representation of the resource, or where the content conveys a partial representation of a resource, such as range requests (see Section 14 of [HTTP]).
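The difference between the two inputs can be sketched for a range request as follows. This is a non-normative illustration: the byte values, the range, and the helper name sha256_field_value are hypothetical, and the "sha-256" field syntax is the one defined in Sections 2 and 3.

```python
import base64
import hashlib


def sha256_field_value(data: bytes) -> str:
    """Return a 'sha-256=:base64:' field value for the given bytes."""
    encoded = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    return f"sha-256=:{encoded}:"


# Full selected representation held by the server (hypothetical data).
representation_data = b"0123456789abcdef"

# A hypothetical "Range: bytes=4-9" request yields partial content.
first, last = 4, 9
content = representation_data[first:last + 1]

# Content-Digest covers only the bytes conveyed in this message.
content_digest = sha256_field_value(content)
# Repr-Digest covers the whole selected representation, not the part sent.
repr_digest = sha256_field_value(representation_data)

# A client that fetched the resource in two parts can validate the
# reassembled bytes against Repr-Digest.
part_a, part_b = representation_data[:8], representation_data[8:]
reassembled_ok = sha256_field_value(part_a + part_b) == repr_digest

print("Content-Digest:", content_digest)
print("Repr-Digest:   ", repr_digest)
print("reassembled matches Repr-Digest:", reassembled_ok)
```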
Content-Digest and Repr-Digest support hashing algorithm agility. The Want-Content-Digest and Want-Repr-Digest fields allow endpoints to express interest in Content-Digest and Repr-Digest, respectively, and to express algorithm preferences in either.
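As a rough illustration of algorithm preferences, the sketch below picks an algorithm from an Integrity preference field value. It assumes the preference syntax defined in Section 4 (a Dictionary of Algorithm Keys mapped to Integers from 0 to 10, where 0 means "not acceptable" and 10 is most preferred). The function names are hypothetical, and the comma-splitting parser is a simplification, not a full Structured Fields parser.

```python
from typing import Optional


def parse_preferences(field_value: str) -> dict[str, int]:
    """Parse 'key=weight, key=weight' into a dict (simplified parsing)."""
    prefs = {}
    for member in field_value.split(","):
        key, _, weight = member.strip().partition("=")
        prefs[key] = int(weight)
    return prefs


def choose_algorithm(field_value: str, supported: set[str]) -> Optional[str]:
    """Return the most preferred supported Algorithm Key, or None."""
    prefs = parse_preferences(field_value)
    # Weight 0 is treated as "not acceptable" and therefore skipped.
    candidates = [(w, k) for k, w in prefs.items() if w > 0 and k in supported]
    return max(candidates)[1] if candidates else None


# Illustrative Want-Repr-Digest field value.
want = "sha-512=3, sha-256=10, unixsum=0"
print(choose_algorithm(want, {"sha-256", "sha-512"}))
print(choose_algorithm(want, {"sha-512"}))
print(choose_algorithm(want, {"unixsum"}))
```

A sender is not obliged to honor the expressed preference; the sketch only shows one plausible way to rank the options it supports.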
Content-Digest and Repr-Digest are collectively termed "Integrity fields". Want-Content-Digest and Want-Repr-Digest are collectively termed "Integrity preference fields".
Integrity fields are tied to the Content-Encoding and Content-Type header fields. Therefore, a given resource may have multiple different digest values when transferred with HTTP.
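The effect of Content-Encoding on digest values can be sketched as below. This is a non-normative example: the JSON bytes and the helper name are hypothetical, and gzip stands in for any content coding a server might apply. The same resource yields one digest for its unencoded bytes and a different one for its gzip-coded bytes.

```python
import base64
import gzip
import hashlib


def sha256_field_value(data: bytes) -> str:
    """Return a 'sha-256=:base64:' field value for the given bytes."""
    encoded = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    return f"sha-256=:{encoded}:"


# Representation data without a content coding (hypothetical resource).
identity = b'{"hello": "world"}'
# The same resource with "Content-Encoding: gzip"; mtime=0 keeps the
# compressed output reproducible between runs.
gzipped = gzip.compress(identity, mtime=0)

identity_digest = sha256_field_value(identity)
gzip_digest = sha256_field_value(gzipped)

print("no content coding:", identity_digest)
print("gzip:             ", gzip_digest)
print("digests differ:", identity_digest != gzip_digest)
```

A receiver therefore needs to validate a digest against bytes in the same coding that the sender hashed, rather than against a decoded copy.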
Integrity fields apply to HTTP message content or HTTP representations. They do not apply to HTTP messages or fields. However, they can be combined with other mechanisms that protect metadata, such as digital signatures, in order to protect the phases of an HTTP exchange in whole or in part. For example, HTTP Message Signatures [SIGNATURES] could be used to sign Integrity fields, thus providing coverage for HTTP content or representation data.
This specification does not define means for authentication, authorization, or privacy.
1.3. Obsoleting RFC 3230
[RFC3230] defined the Digest and Want-Digest HTTP fields for HTTP integrity. It also coined the terms "instance" and "instance manipulation" in order to explain concepts, such as selected representation data (Section 8.1 of [HTTP]), that are now more universally defined and implemented as HTTP semantics.
Experience has shown that implementations of [RFC3230] have interpreted the meaning of "instance" inconsistently, leading to interoperability issues. The most common issue relates to the mistake of calculating the digest using (what we now call) message content, rather than using (what we now call) representation data as was originally intended. Interestingly, time has also shown that a digest of message content can be beneficial for some use cases, so it is difficult to detect if non-conformance to [RFC3230] is intentional or unintentional.
In order to address potential inconsistencies and ambiguity across implementations of Digest and Want-Digest, this document obsoletes [RFC3230]. The Integrity fields (Sections 2 and 3) and Integrity preference fields (Section 4) defined in this document are better aligned with current HTTP semantics and have names that more clearly articulate the intended usages.
1.4. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document uses the Augmented BNF defined in [RFC5234] and updated by [RFC7405]. This includes the rules CR (carriage return), LF (line feed), and CRLF (CR LF).
This document uses the following terminology from Section 3 of [STRUCTURED-FIELDS] to specify syntax and parsing: Boolean, Byte Sequence, Dictionary, Integer, and List.
The definitions "representation", "selected representation", "representation data", "representation metadata", "user agent", and "content" in this document are to be interpreted as described in [HTTP].
This document uses the line folding strategies described in [FOLDING].
Hashing algorithm names respect the casing used in their definition document (e.g., SHA-1, CRC32c).
HTTP messages indicate hashing algorithms using an Algorithm Key (algorithms). Where the document refers to an Algorithm Key in prose, it is quoted (e.g., "sha", "crc32c").
The term "checksum" describes the output of applying an algorithm to a sequence of bytes, whereas "digest" is only used in relation to the value contained in the fields.
"Integrity fields" is the collective term for Content-Digest and Repr-Digest.
"Integrity preference fields" is the collective term for Want-Repr-Digest and Want-Content-Digest.