8. Representation Data and Metadata
8.1. Representation Data
The representation data associated with an HTTP message is either provided as the content of the message or referred to by the message semantics and the target URI. The representation data is in a format and encoding defined by the representation metadata header fields.
The data type of the representation data is determined via the Content-Type and Content-Encoding header fields. These define a two-layer, ordered encoding model:
representation-data := Content-Encoding( Content-Type( data ) )
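As a minimal sketch of this two-layer model, assume a textual representation sent with "Content-Type: text/plain; charset=utf-8" and "Content-Encoding: gzip". The function name decode_representation and the sample values are illustrative assumptions, not part of this specification. A recipient undoes the layers in reverse order: the content coding first, then the interpretation defined by the media type.

```python
import gzip

def decode_representation(content: bytes, content_encoding: str, charset: str) -> str:
    """Undo Content-Encoding first, then interpret the bytes per Content-Type."""
    data = content
    if content_encoding.lower() == "gzip":
        data = gzip.decompress(data)      # outer layer: Content-Encoding
    return data.decode(charset)           # inner layer: Content-Type (charset parameter)

# Sender side, mirroring Content-Encoding( Content-Type( data ) )
typed = "h\u00e9llo".encode("utf-8")      # Content-Type: text/plain; charset=utf-8
wire = gzip.compress(typed)               # Content-Encoding: gzip
```

Only gzip is handled here; a fuller recipient would dispatch on every content coding it supports.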
8.2. Representation Metadata
Representation header fields provide metadata about the representation. When a message includes content, the representation header fields describe how to interpret that data. In a response to a HEAD request, the representation header fields describe the representation data that would have been enclosed in the content if the same request had been a GET.
8.3. Content-Type
The "Content-Type" header field indicates the media type of the associated representation: either the representation enclosed in the message content or the selected representation, as determined by the message semantics. The indicated media type defines both the data format and how that data is intended to be processed by a recipient, within the scope of the received message semantics, after any content codings indicated by Content-Encoding are decoded.
Content-Type = media-type
Media types are defined in Section 8.3.1. An example of the field is
Content-Type: text/html; charset=ISO-8859-4
A sender that generates a message containing content SHOULD generate a Content-Type header field in that message unless the sender does not know the intended media type of the enclosed representation. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type.
In practice, resource owners do not always properly configure their origin server to provide the correct Content-Type for a given representation. Some user agents examine the content and, in certain cases, override the received type (for example, see [Sniffing]). This "MIME sniffing" risks drawing incorrect conclusions about the data, which might expose the user to additional security risks (e.g., "privilege escalation"). Furthermore, it is often the case that different media types use the same data format but differ only in the intended processing of that data, which cannot be distinguished by inspecting the data alone. When sniffing is implemented, implementers are encouraged to provide a means for the user to disable it.
Although Content-Type is defined as a singleton field, it is sometimes incorrectly generated multiple times, resulting in a combined field value that appears to be a list. Recipients often attempt to handle this error by using the last syntactically valid member of the list, which can lead to potential interoperability and security issues if different implementations have different error handling behaviors.
8.3.1. Media Type
HTTP uses media types [RFC2046] in the Content-Type (Section 8.3) and Accept (Section 12.5.1) header fields in order to provide open and extensible data typing and type negotiation. Media types define both a data format and various processing models: how to process that data in accordance with the message context.
media-type = type "/" subtype parameters
type = token
subtype = token
The type/subtype MAY be followed by semicolon-delimited parameters (Section 5.6.6) in the form of name/value pairs. The presence or absence of a parameter might be significant to the processing of a media type, depending on its definition within the media type registry.
A parameter value that matches the token production can be transmitted without quotation marks. However, a parameter value that does not match the token production (for example, one containing a space or a semicolon) MUST be sent as a quoted-string (Section 5.6.4).
parameter = parameter-name "=" parameter-value
parameter-name = token
parameter-value = ( token / quoted-string )
Note: Some recipients of media type parameters treat them in a case-sensitive fashion. It is generally recommended to use lowercase for parameter names and use the most common case for parameter values.
Media types ought to be registered with IANA according to the procedures defined in [BCP13].
Note: Unlike some similar constructs in other header fields, media type parameters do not allow whitespace (even "bad" whitespace) around the "=" character.
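The grammar above can be sketched as a small parser. This is an illustrative simplification rather than a complete implementation: the name parse_media_type is hypothetical, the quoted-string pattern is more permissive than the production in Section 5.6.4, and the type, subtype, and parameter names are lowercased on the assumption that they are compared case-insensitively. Whitespace around "=" is rejected, as described in the note above.

```python
import re

TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'             # simplified quoted-string
MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf"[ \t]*;[ \t]*({TOKEN})=({TOKEN}|{QUOTED})")

def parse_media_type(value: str):
    """Return (type, subtype, params) or raise ValueError."""
    m = MEDIA_TYPE_RE.match(value)
    if not m:
        raise ValueError("not a media type")
    mtype, subtype = m.group(1).lower(), m.group(2).lower()
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM_RE.match(value, pos)
        if not p:
            raise ValueError(f"bad parameter at offset {pos}")
        name, raw = p.group(1).lower(), p.group(2)
        if raw.startswith('"'):
            # Strip the surrounding quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name] = raw                # parameter values keep their case
        pos = p.end()
    return mtype, subtype, params
```

For the example field value given in Section 8.3, the sketch yields the type "text", the subtype "html", and a single "charset" parameter whose value is "ISO-8859-4".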
8.3.2. Charset
HTTP uses charset names in the Content-Type (Section 8.3) and Accept-Charset (deprecated; Section 12.5.2) header fields to indicate or negotiate the character encoding scheme of a textual representation. A charset is identified by a case-insensitive token.
charset = token
Charset names ought to be registered in the IANA "Character Sets" registry (<http://www.iana.org/assignments/character-sets>) according to the procedures defined in Section 2 of [RFC2978].
Note: The "charset" parameter on a media type might have different semantics depending on the specific media type definition.
The "charset" parameter is often used with media types for textual content. While standards defining such media types ought to prescribe the use of a particular charset if it is at all likely to improve interoperability, users and implementers of HTTP are strongly encouraged to specify a charset explicitly when it affects the interpretation of received content, even when the media type definition defaults to a charset in the absence of the charset parameter (as is often the case for "text" media types).
8.3.3. Multipart Types
MIME provides for a number of "multipart" types -- encapsulations of one or more representations within a single message content. All multipart types share a common syntax, as defined in Section 5.1 of [RFC2046], and include a boundary parameter as part of the media type value. The message content is itself a protocol element; a sender MUST generate only CRLF to represent line breaks between body parts.
HTTP message framing does not use the multipart boundary as an indicator of message body length, though it might be used by implementations that generate or process the content. For example, the "multipart/form-data" type is often used for carrying form data in a request, as described in [RFC7578], and the "multipart/byteranges" type is defined by this specification for use in some 206 (Partial Content) responses (see Section 15.3.7).
8.4. Content-Encoding
The "Content-Encoding" header field indicates what content codings have been applied to the representation, beyond those inherent in the media type, and thus what decoding mechanisms have to be applied in order to obtain data in the media type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a representation's data to be compressed without losing the identity of its underlying media type.
Content-Encoding = #content-coding
An example of its use is
Content-Encoding: gzip
If one or more encodings have been applied to a representation, the sender that applied the encodings MUST generate a Content-Encoding header field that lists the content codings in the order in which they were applied. Note that the coding named "identity" is reserved for its special role in Accept-Encoding and thus SHOULD NOT be included.
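Because the field lists codings in the order in which they were applied, a recipient that wants the underlying data removes them in reverse order. The following sketch assumes a hypothetical DECODERS table limited to codings available in the Python standard library; "compress" is omitted because the standard library has no LZW decoder.

```python
import gzip
import zlib

# Illustrative table; a real recipient would register every coding it supports.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": zlib.decompress,           # zlib-wrapped deflate stream
}

def decode_content(content: bytes, content_encoding: str) -> bytes:
    """Remove content codings in the reverse of the order they were applied."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding not in DECODERS:
            raise ValueError(f"unsupported content coding: {coding}")
        content = DECODERS[coding](content)
    return content
```

For a field value of "deflate, gzip", the sender applied deflate first and gzip second, so the sketch removes gzip first and deflate second.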
Additional information about the encoding parameters can be provided by other header fields not defined by this specification.
Unlike Transfer-Encoding (Section 6.1 of [HTTP/1.1]), the codings listed in Content-Encoding are a characteristic of the representation; the representation is defined in terms of the coded form, and all other metadata about the representation is about the coded form unless otherwise noted in the metadata definition. Typically, the representation is only decoded just prior to rendering or analogous usage.
If the media type includes an inherent encoding, such as a data format that is always compressed, then that encoding would not be restated in Content-Encoding even if it happens to be the same algorithm as one of the content codings. Such a content coding would only be listed if, for some bizarre reason, it is applied a second time to form the representation. Likewise, an origin server might choose to publish the same data as multiple representations that differ only in whether the coding is defined as part of Content-Type or Content-Encoding, since some user agents will behave differently in their handling of each response (e.g., open a "Save as ..." dialog instead of automatic decompression and rendering of content).
An origin server MAY respond with a status code of 415 (Unsupported Media Type) if a representation in the request message has a content coding that is not acceptable.
8.4.1. Content Codings
Content coding values indicate an encoding transformation that has been or can be applied to a representation. Content codings are primarily used to allow a representation to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information. Frequently, the representation is stored in coded form, transmitted directly, and only decoded by the final recipient.
content-coding = token
All content codings are case-insensitive and ought to be registered within the "HTTP Content Coding Registry", as defined in Section 16.6.1. They are used in the Accept-Encoding (Section 12.5.3) and Content-Encoding (Section 8.4) header fields.
The following content codings are defined by this specification:
- compress (and x-compress): See Section 8.4.1.1.
- deflate: See Section 8.4.1.2.
- gzip (and x-gzip): See Section 8.4.1.3.
8.4.1.1. Compress Coding
The "compress" coding is an adaptive Lempel-Ziv-Welch (LZW) coding [Welch] that is commonly produced by the UNIX file compression program "compress". A recipient SHOULD consider "x-compress" to be equivalent to "compress".
8.4.1.2. Deflate Coding
The "deflate" coding is a "zlib" data format [RFC1950] containing a "deflate" compressed data stream [RFC1951] that uses a combination of the Lempel-Ziv (LZ77) compression algorithm and Huffman coding.
Note: Some non-conformant implementations send the "deflate" compressed data without the zlib wrapper.
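A recipient that wishes to tolerate such senders might try the conformant format first and fall back to a raw deflate stream. The sketch below is one possible approach, not a requirement of this specification; the function name inflate_tolerant is hypothetical.

```python
import zlib

def inflate_tolerant(data: bytes) -> bytes:
    """Decode "deflate" content, accepting a missing zlib wrapper."""
    try:
        # Conformant case: zlib data format [RFC1950].
        return zlib.decompress(data)
    except zlib.error:
        # Non-conformant case: raw deflate stream [RFC1951], no wrapper.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

A negative window-bits argument tells zlib to expect a raw stream with no header or trailing checksum.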
8.4.1.3. Gzip Coding
The "gzip" coding is an LZ77 coding with a 32-bit Cyclic Redundancy Check (CRC) that is commonly produced by the gzip file compression program [RFC1952]. A recipient SHOULD consider "x-gzip" to be equivalent to "gzip".
8.5. Content-Language
The "Content-Language" header field describes the natural language(s) of the intended audience for the representation. Note that this might not be equivalent to all the languages used within the representation.
Content-Language = #language-tag
Language tags are defined in Section 8.5.1. The primary purpose of Content-Language is to allow a user to identify and differentiate representations according to the users' own preferred language. Thus, if the content is intended only for a Danish-literate audience, the appropriate field is
Content-Language: da
If no Content-Language is specified, the default is that the content is intended for all language audiences. This might mean that the sender does not consider it to be specific to any natural language, or that the sender does not know for which language it is intended.
Multiple languages MAY be listed for content that is intended for multiple audiences. For example, a rendition of the "Treaty of Waitangi", presented simultaneously in the original Maori and English versions, would call for
Content-Language: mi, en
However, just because multiple languages are present within a representation does not mean that it is intended for multiple linguistic audiences. An example would be a beginner's language primer, such as "A First Lesson in Latin", which is clearly intended to be used by an English-literate audience. In this case, the Content-Language would properly only include "en".
Content-Language MAY be applied to any media type -- it is not limited to textual documents.
8.5.1. Language Tags
A language tag, as defined in [RFC5646], identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded.
HTTP uses language tags within the Accept-Language and Content-Language header fields. Accept-Language uses the broader language-range production defined in Section 12.5.4, whereas Content-Language uses the language-tag production defined below.
language-tag = <Language-Tag, see [RFC5646], Section 2.1>
A language tag is a sequence of one or more case-insensitive subtags, each separated by a hyphen character ("-", %x2D). In most cases, a language tag consists of a primary language subtag that identifies a broad family of related languages (e.g., "en" = English) and is optionally followed by a series of subtags that refine or narrow that language's range (e.g., "en-CA" = the variety of English as communicated in Canada). Whitespace is not allowed within a language tag. Example tags include:
fr, en-US, es-419, az-Arab, x-pig-latin, man-Nkoo-GN
See [RFC5646] for further information.
8.6. Content-Length
The "Content-Length" header field indicates the associated representation's data length as a decimal non-negative integer number of octets. When transferring a representation as content, Content-Length refers specifically to the amount of data enclosed so that it can be used to delimit framing (e.g., Section 6.2 of [HTTP/1.1]). In other cases where a complete representation is expected, Content-Length refers to the selected representation's current length.
Content-Length = 1*DIGIT
An example is
Content-Length: 3495
A user agent SHOULD send Content-Length in a request when the method defines a meaning for enclosed content and it is not sending Transfer-Encoding. For example, a Content-Length header field is normally sent in a POST request even when the value is 0 (indicating empty content).
A user agent SHOULD NOT send a Content-Length header field when the request message does not contain content and the method semantics do not anticipate such data.
A server MAY send a Content-Length header field in a response to a HEAD request (Section 9.3.2); a server MUST NOT send Content-Length in such a response unless its field value equals the decimal number of octets that would have been sent in the content of a response if the same request had used the GET method.
A server MAY send a Content-Length header field in a 304 (Not Modified) response to a conditional GET request (Section 15.4.5); a server MUST NOT send Content-Length in such a response unless its field value equals the decimal number of octets that would have been sent in the content of a 200 (OK) response to the same request.
A server MUST NOT send a Content-Length header field in any response with a status code of 1xx (Informational) or 204 (No Content). A server MUST NOT send a Content-Length header field in any 2xx (Successful) response to a CONNECT request (Section 9.3.6).
Aside from the cases defined above, in the absence of Transfer-Encoding, an origin server SHOULD send a Content-Length header field when the content size is known prior to sending the complete header section. This will allow downstream recipients to measure transfer progress, know their own framing boundaries, and reuse the connection for subsequent requests.
Because Content-Length is used for message delimitation in HTTP/1.1, its field value can impact how the message is parsed by recipients even when the message framing is not subject to HTTP/1.1 rules. If the value does not match the actual data length of the representation, the results can be anywhere from bad message framing to potential request smuggling or response splitting, depending on the circumstances.
A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.
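The syntax and the unconditional prohibitions above can be sketched as two small checks. The function names are hypothetical, and the sketch deliberately does not cover the HEAD and 304 cases, where sending the field is permitted only if its value matches the corresponding GET or 200 (OK) content length.

```python
import re

def parse_content_length(value: str) -> int:
    """Parse a Content-Length field value (1*DIGIT) or raise ValueError."""
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError(f"invalid Content-Length: {value!r}")
    return int(value)

def server_may_send_content_length(status: int, request_method: str,
                                   has_transfer_encoding: bool) -> bool:
    """Apply the MUST NOT rules for responses; other cases are allowed."""
    if has_transfer_encoding:
        return False                      # never alongside Transfer-Encoding
    if 100 <= status < 200 or status == 204:
        return False                      # 1xx (Informational), 204 (No Content)
    if request_method == "CONNECT" and 200 <= status < 300:
        return False                      # 2xx (Successful) response to CONNECT
    return True
```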
Note: HTTP's use of Content-Length for message framing differs significantly from the same field's use in MIME, where it is an optional field used only within the "message/external-body" media type.
8.7. Content-Location
The "Content-Location" header field references a URI that can be used as an identifier for a specific resource corresponding to the representation in this message's content. In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as content in this message.
Content-Location = absolute-URI / partial-URI
The Content-Location value is not a replacement for the target URI (Section 7.1). It is representation metadata. It has the same syntax and semantics as the header field of the same name defined for MIME body parts in Section 4 of [RFC2557]. However, its appearance in an HTTP message has some special implications for HTTP recipients.
If Content-Location is included in a 2xx (Successful) response message and its value refers to a URI with the same scheme, authority, and path as the target URI, then the recipient MAY consider the content to be a current representation of that target resource. For a GET (Section 9.3.1) or HEAD (Section 9.3.2) request, this is the same as the default semantics when no Content-Location is provided by the server. For a state-changing request like PUT (Section 9.3.4) or POST (Section 9.3.3), it implies that the server's response content contains a current representation of that target resource, thereby distinguishing it from representations that might only report about the action (e.g., "It worked!"). This allows authoring applications to update their local copies without the need for a subsequent GET request.
If Content-Location is included in a 2xx (Successful) response message and its field value refers to a URI that differs from the target URI, then the origin server claims that the URI is an identifier for a different resource corresponding to the enclosed representation. Such a claim can only be trusted if both identifiers share the same resource owner, which cannot be programmatically determined via HTTP.
- For a response to a GET or HEAD request, this is an indication that the target URI refers to a resource that is subject to content negotiation and the Content-Location field value is a more specific identifier for the selected representation.
- For a 201 (Created) response to a POST request, the Content-Location field value is a reference to the resource that contains the current representation corresponding to the new resource.
Otherwise, such a Content-Location indicates that this content is a representation reporting on the requested action's status and that the same report is available (for future access with GET) at the given URI. For example, a purchase transaction made via a POST request might include a receipt document as the content of the 200 (OK) response; the Content-Location field value provides an identifier for retrieving a copy of that same receipt in the future.
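The cases above can be summarized in a rough sketch. The function name and the returned labels are hypothetical, and the comparison follows the wording of this section (scheme, authority, and path) after resolving a partial-URI against the target URI. It assumes the caller passes the request method and the response status code.

```python
from urllib.parse import urljoin, urlsplit

def classify_content_location(target_uri: str, content_location: str,
                              method: str, status: int) -> str:
    """Interpret Content-Location in a response, per the cases described above."""
    if not 200 <= status < 300:
        return "unspecified"              # the text only covers 2xx responses
    resolved = urlsplit(urljoin(target_uri, content_location))
    target = urlsplit(target_uri)
    same = ((resolved.scheme, resolved.netloc.lower(), resolved.path) ==
            (target.scheme, target.netloc.lower(), target.path))
    if same:
        return "current representation of the target resource"
    if method in ("GET", "HEAD"):
        return "more specific identifier for the selected representation"
    if method == "POST" and status == 201:
        return "representation corresponding to the new resource"
    return "status report retrievable at the given URI"
```

As noted above, a claim about a different URI can only be trusted when both identifiers share the same resource owner, which this sketch cannot verify.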
A user agent that sends Content-Location in a request message is stating that its value refers to where the user agent originally obtained the content of the enclosed representation (prior to any modifications made by that user agent). In other words, the user agent is providing a back link to the source of the original representation.
An origin server that receives a Content-Location field in a request message MUST treat the information as transitory request context rather than as metadata to be saved with the representation. An origin server MAY use that context to guide in processing the request or to save it for other uses, such as within source links or versioning metadata. However, an origin server MUST NOT use such context information to alter the request semantics.
For example, if a client makes a PUT request on a negotiated resource and the origin server accepts that PUT (without redirection), then the new state of that resource is expected to be consistent with the one representation supplied in that PUT; the Content-Location cannot be used as a form of reverse content selection identifier to update only one of the negotiated representations. If the user agent had wanted the latter semantics, it would have applied the PUT directly to the Content-Location URI.
8.8. Validator Fields
Resource metadata is referred to as a "validator" if it can be used within a precondition (Section 13.1) to make a conditional request (Section 13).
Validator fields convey a current validator for the selected representation (Section 3.2).
In responses to safe requests, validator fields describe the selected representation (Section 3.2). Note that, depending on the status code semantics, the selected representation for a given response is not necessarily the same as the representation enclosed as response content.
In a successful response to a state-changing request, validator fields describe the new representation that has replaced the prior selected representation as a result of processing the request.
For example, an ETag header field in a 201 (Created) response communicates a validator for the resource created by the request, and an ETag header field in a 200 (OK) response to PUT communicates a validator for the new representation that has replaced the prior selected representation as a result of the PUT request.
This specification defines two forms of metadata that are commonly used to observe resource state and test for preconditions on requests: modification dates (Section 8.8.2) and opaque entity tags (Section 8.8.3). Additional information about various design considerations can be found in Section 13.2.
8.8.1. Weak versus Strong
Validators come in two flavors: strong or weak. Weak validators are easy to generate but are far less useful for comparisons. Strong validators are ideal for comparisons but can be very difficult (and occasionally impossible) to generate efficiently. Rather than impose that all forms of resource adhere to the same strength of validator, HTTP exposes the type of validator in use and imposes restrictions on when weak validators can be used as preconditions.
A "strong validator" is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the content of a 200 (OK) response to GET.
A strong validator might change for reasons other than a change to the representation data, such as when a semantically significant part of the representation metadata is changed (e.g., Content-Type), but it is in the best interests of the origin server to only change the value when it is necessary to invalidate the stored responses held by remote caches and authoring tools.
Cache entries might persist for arbitrarily long periods, regardless of expiration times. Thus, a cache might attempt to validate an entry using a validator that it obtained in the distant past. A strong validator is unique across all versions of all representations associated with a particular resource over time. However, there is no implication of uniqueness across representations of different resources (i.e., the same strong validator might be in use for representations of multiple resources at the same time and does not imply that those representations are equivalent).
There are a variety of strong validators used in practice. The best are based on strict revision control, wherein each change to a representation always results in a unique node name and revision identifier being assigned before the representation is made accessible to GET. A collision-resistant hash function applied to the representation data is also sufficient if the data is available prior to the response header section being sent and the digest does not need to be recalculated every time a validation request is received. However, if a resource has distinct representations that differ only in their metadata, such as might occur with content negotiation over media types that happen to share the same data format, then the origin server needs to incorporate additional information in the validator to distinguish those representations.
In contrast, a "weak validator" is representation metadata that might not change for every change to the representation data. This weakness might be due to limitations in how the value is calculated (e.g., clock resolution), an inability to ensure uniqueness for all possible representations of the resource, or a desire to group representations by some self-determined set of equivalency rather than unique sequences of data.
An origin server SHOULD change a weak entity tag whenever it considers prior representations to be unacceptable as a substitute for the current representation. In other words, a weak entity tag ought to change whenever the origin server wants caches to invalidate old responses.
For example, the representation of a weather report that changes in content every second, based on dynamic measurements, might be grouped into sets of equivalent representations (from the origin server's perspective) with the same weak validator in order to allow cached representations to be valid for a reasonable period of time (perhaps adjusted dynamically based on server load or weather quality). Likewise, a representation's modification time, if defined with only one-second resolution, might be a weak validator if it is possible for the representation to be modified twice during a single second and retrieved between those modifications.
Likewise, a validator is weak if it is shared by two or more representations of a given resource at the same time, unless those representations have identical representation data. For example, if the origin server sends the same validator for a representation with a gzip content coding applied as it does for a representation with no content coding, then that validator is weak. However, two simultaneous representations might share the same strong validator if they differ only in the representation metadata, such as when two different media types are available for the same representation data.
Strong validators are usable for all conditional requests, including cache validation, partial content ranges, and "lost update" avoidance. Weak validators are only usable when the client does not require exact equality with previously obtained representation data, such as when validating a cache entry or limiting a web traversal to recent changes.
8.8.2. Last-Modified
The "Last-Modified" header field in a response provides a timestamp indicating the date and time at which the origin server believes the selected representation was last modified, as determined at the conclusion of handling the request.
Last-Modified = HTTP-date
An example of its use is
Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
8.8.2.1. Generation
An origin server SHOULD send Last-Modified for any selected representation for which a last modification date can be reasonably and consistently determined, since its use in conditional requests and evaluating cache freshness ([CACHING]) can substantially reduce unnecessary transfers and significantly improve service availability and scalability.
A representation is typically the sum of many parts behind the resource interface. The last-modified time would usually be the most recent time that any of those parts were changed. How that value is determined for any given resource is an implementation detail beyond the scope of this specification. What matters to HTTP is how recipients of the Last-Modified header field can use its value to make conditional requests and test the validity of locally cached responses.
An origin server SHOULD obtain the Last-Modified value of the representation as close as possible to the time that it generates the Date field value for its response. This allows a recipient to make an accurate assessment of the representation's modification time, especially if the representation changes near the time that the response is generated.
An origin server with a clock (as defined in Section 5.6.7) MUST NOT send a Last-Modified date that is later than the server's time of message origination (Date). If the last modification time is derived from implementation-specific metadata that evaluates to some time in the future, according to the origin server's clock, then the origin server MUST replace that value with the message origination date. This prevents a future modification date from having an adverse impact on cache validation.
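A minimal sketch of this replacement rule follows, assuming the server holds both timestamps as timezone-aware UTC datetime values. The function name last_modified_field is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def last_modified_field(mtime: datetime, origination: datetime) -> str:
    """Format Last-Modified, never later than the message origination date."""
    effective = min(mtime, origination)   # replace future values with Date
    return format_datetime(effective.replace(microsecond=0), usegmt=True)

now = datetime(1994, 11, 15, 12, 45, 26, tzinfo=timezone.utc)
future = datetime(1994, 11, 16, 0, 0, 0, tzinfo=timezone.utc)
print(last_modified_field(future, now))   # Tue, 15 Nov 1994 12:45:26 GMT
```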
An origin server without a clock MUST NOT assign Last-Modified values to a response unless those values are associated with the resource by some other system or user with a reliable clock.
8.8.2.2. Comparison
A Last-Modified time, when used as a validator in a request, is implicitly weak unless it is possible to deduce that it is strong, using the following rules:
- The validator is being compared by an origin server to the actual current validator for the representation and,
- That origin server reliably knows that the associated representation did not change twice during the second covered by the presented validator.
or
- The validator is about to be used by a client in an If-Modified-Since, If-Unmodified-Since, or If-Range header field, because the client has a cache entry for the associated representation, and
- That cache entry includes a Date value, which gives the time when the origin server sent the original response, and
- The presented Last-Modified time is at least 60 seconds before the Date value.
or
- The validator is being compared by an intermediate cache to the validator stored in its cache entry for the representation, and
- That cache entry includes a Date value, which gives the time when the origin server sent the original response, and
- The presented Last-Modified time is at least 60 seconds before the Date value.
This method relies on the fact that if two different responses were sent by the origin server during the same second, but both had the same Last-Modified time, then at least one of those responses would have a Date value equal to its Last-Modified time. The arbitrary 60-second limit guards against the possibility that the Date and Last-Modified values are generated from different clocks or at somewhat different times during the preparation of the response. An implementation MAY use a value larger than 60 seconds, if it is believed that 60 seconds is too short.
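The client and intermediate cache cases share the same test on the stored Date value, which might be sketched as follows. The function name and the margin_seconds parameter are illustrative assumptions; the default of 60 reflects the limit discussed above, and a larger value is permitted.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Optional

def last_modified_is_strong(last_modified: str, date: Optional[str],
                            margin_seconds: int = 60) -> bool:
    """True if a cached Last-Modified can be treated as a strong validator."""
    if date is None:
        return False                      # no Date value in the cache entry
    modified_at = parsedate_to_datetime(last_modified)
    sent_at = parsedate_to_datetime(date)
    return sent_at - modified_at >= timedelta(seconds=margin_seconds)
```

The origin server case is not covered, since it depends on what the server reliably knows about its own representation history.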
8.8.3. ETag
The "ETag" header field in a response provides the current entity tag for the selected representation, as determined at the conclusion of handling the request. An entity tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both. An entity tag consists of an opaque quoted string, possibly prefixed by a weakness indicator.
ETag = entity-tag
entity-tag = [ weak ] opaque-tag
weak = %s"W/"
opaque-tag = DQUOTE *etagc DQUOTE
etagc = %x21 / %x23-7E / obs-text
; VCHAR except double quotes, plus obs-text
Note: Previously, opaque-tag was defined to be a quoted-string ([RFC2616], Section 3.11); thus, some recipients might perform backslash unescaping. Servers therefore ought to avoid backslash characters in entity tags.
An entity tag can be more reliable than a modification date for several reasons: there might not be a clock available to the server; a resource's modification time might be set into the future for access control purposes; the resolution of a modification time is limited by how it is stored; distributed systems might have difficulty synchronizing clocks; an entity tag can incorporate additional metadata, such as the value of a Content-Encoding header field, to better distinguish representations; and so on.
Whether two entity tags are considered equivalent depends on the comparison function in use: strong comparison requires that neither tag be weak, whereas weak comparison ignores the weakness indicator (Section 8.8.3.2).
An entity tag can be either a strong validator or a weak validator, with strong being the default. If an origin server provides an entity tag for a representation and the generation of that entity tag does not satisfy all of the characteristics of a strong validator (Section 8.8.1), then the origin server MUST mark the entity tag as weak by prefixing its opaque value with "W/" (case-sensitive).
Examples:
ETag: W/"xyzzy"
ETag: ""
8.8.3.1. Generation
The principle behind entity tags is that only the service author knows the implementation of a resource well enough to select the most accurate and efficient validation mechanism for that resource, and that any such mechanism can be mapped to a simple sequence of octets for easy comparison. Since the value is opaque, there is no need for the client to be aware of how each entity tag is constructed.
For example, a resource that has implementation-specific versioning applied to all changes might use an internal revision number, perhaps combined with a variance identifier for content negotiation, to accurately differentiate between representations. Other implementations might use a collision-resistant hash of representation content, a combination of various file attributes, or a modification timestamp that has sub-second resolution.
An origin server SHOULD send an ETag for any selected representation for which detection of changes can be reasonably and consistently determined, since the entity tag's use in conditional requests and evaluating cache freshness ([CACHING]) can result in a substantial reduction of HTTP network traffic and can be a significant contributor to service scalability and reliability.
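One of the strategies mentioned above, a collision-resistant hash of the representation content optionally combined with a variance identifier, might look like the following sketch. The function name, the choice of SHA-256, and the `variant` parameter are assumptions made for illustration; `variant` is assumed to contain only characters permitted by etagc (for example, "gzip").

```python
import hashlib


def make_strong_etag(content: bytes, variant: str = "") -> str:
    """Build a quoted entity tag from a hash of the representation content.

    `variant` is a hypothetical variance identifier (e.g. a content coding
    name); the caller must ensure it contains no double quotes.
    """
    digest = hashlib.sha256(content).hexdigest()
    opaque = f"{digest}-{variant}" if variant else digest
    return f'"{opaque}"'
```

Since the value is opaque to clients, the server is free to replace this scheme with a revision number or file attributes without affecting recipients.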
8.8.3.2. Comparison
There are two entity tag comparison functions, depending on whether the comparison context allows the use of weak validators or not:
- Strong comparison: two entity tags are equivalent if both are not weak and their opaque-tags match character by character.
- Weak comparison: two entity tags are equivalent if their opaque-tags match character by character, regardless of either or both being tagged as weak.
The table below shows the results of both comparison functions for a set of entity tag pairs:
| ETag 1 | ETag 2 | Strong Comparison | Weak Comparison |
|---|---|---|---|
| W/"1" | W/"1" | no match | match |
| W/"1" | W/"2" | no match | no match |
| W/"1" | "1" | no match | match |
| "1" | "1" | match | match |
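The two comparison functions can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be syntactically valid entity-tag strings, including their double quotes.

```python
from typing import Tuple


def _split(tag: str) -> Tuple[bool, str]:
    """Separate the case-sensitive "W/" indicator from the opaque-tag."""
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_compare(tag1: str, tag2: str) -> bool:
    """Match only if neither tag is weak and the opaque-tags are identical."""
    weak1, opaque1 = _split(tag1)
    weak2, opaque2 = _split(tag2)
    return not weak1 and not weak2 and opaque1 == opaque2


def weak_compare(tag1: str, tag2: str) -> bool:
    """Match if the opaque-tags are identical, ignoring weakness."""
    return _split(tag1)[1] == _split(tag2)[1]
```

Applied to the pairs in the table, `strong_compare` matches only the final row, while `weak_compare` matches every row except the second.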
8.8.3.3. Example: Entity Tags Varying on Content-Negotiated Resources
Consider a resource that is subject to content negotiation (Section 12.1), and where the representations sent in response to a GET request vary based on the Accept-Encoding request header field (Section 12.5.3):
>> Request:
GET /index HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip
>> Response:
HTTP/1.1 200 OK
Date: Fri, 26 Mar 2010 00:05:00 GMT
ETag: "123-a"
Content-Length: 70
Vary: Accept-Encoding
Content-Type: text/plain
Content-Encoding: gzip
[...]
For a different set of request header fields, a different representation might be sent by the server:
>> Request:
GET /index HTTP/1.1
Host: www.example.com
>> Response:
HTTP/1.1 200 OK
Date: Fri, 26 Mar 2010 00:05:00 GMT
ETag: "123-b"
Content-Length: 400
Vary: Accept-Encoding
Content-Type: text/plain
[...]
The difference in entity tags in these two responses indicates that the representations sent contain different data. The strong entity tags used in this instance permit the use of conditional requests for both cache validation and range requests.
In contrast, an origin server that generates a gzip-encoded representation on the fly, without checking for any prior version of the representation that might have already been generated, might use weak entity tags to indicate equivalent content:
>> Response:
HTTP/1.1 200 OK
Date: Fri, 26 Mar 2010 00:05:00 GMT
ETag: W/"123"
Content-Length: 70
Vary: Accept-Encoding
Content-Type: text/plain
Content-Encoding: gzip
[...]
In this case, the server assumes that the compressed version of the representation is equivalent to the uncompressed version, while at the same time not tracking versions well enough to generate a strong validator.
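The two server strategies in this example can be contrasted in a short sketch. Everything here is illustrative: `build_response`, its parameters, and the "-a"/"-b" suffixes (borrowed from the example responses above) are assumptions, not a prescribed design.

```python
import gzip
from typing import Dict, Tuple


def build_response(body: bytes, revision: str, accept_gzip: bool,
                   per_variant_tags: bool) -> Tuple[Dict[str, str], bytes]:
    """Return (header fields, content) for a negotiated text/plain resource."""
    headers = {"Vary": "Accept-Encoding", "Content-Type": "text/plain"}
    content = body
    suffix = "b"  # hypothetical suffix for the unencoded variant
    if accept_gzip:
        content = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        suffix = "a"  # hypothetical suffix for the gzip variant
    if per_variant_tags:
        # Strong tag: each variant gets its own distinct value.
        headers["ETag"] = f'"{revision}-{suffix}"'
    else:
        # Weak tag: variants are treated as equivalent content.
        headers["ETag"] = f'W/"{revision}"'
    headers["Content-Length"] = str(len(content))
    return headers, content
```

With `per_variant_tags` set, the gzip and unencoded responses carry different strong tags; without it, both carry the same weak tag.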