9. Security Considerations
This section is meant to inform developers, information providers, and users of known security concerns relevant to HTTP semantics and its use for transferring information over the Internet. Considerations related to message syntax, parsing, and routing are discussed in [RFC7230], and considerations related to HTTP caching are discussed in [RFC7234].
The list of considerations below is not exhaustive. Most security concerns related to HTTP semantics are about securing server-side applications (code behind the HTTP interface), securing user agent processing of content received via HTTP, or secure use of information obtained from HTTP header fields. Many of these concerns are interdependent; a weakness in client software can be used to exploit vulnerabilities in server software, and vice versa.
9.1. Attacks Based on File and Path Names
Origin servers frequently make use of their local file system to manage the mapping from target URI to resource representations. Most file systems are not designed to protect against malicious file or path names. Therefore, an origin server needs to avoid accessing names that have a special significance to the system when mapping the target resource to files, folders, or directories.
For example, UNIX, Microsoft Windows, and other operating systems use ".." as a path component to indicate a directory level above the current one, and they use specially named paths or file names to send data to system devices. Similar naming conventions might exist within other types of storage systems. Likewise, local storage systems have an annoying tendency to prefer user-friendliness over security when handling invalid or unexpected characters, for example by recomposing decomposed characters or case-normalizing case-insensitive names, so that a name that passed a check might not be the name the storage system actually opens.
Implementations need to be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string can produce unsafe characters (e.g., "%252e%252e" becomes "%2e%2e" after one pass and ".." after a second). In particular, it is not safe to simply unescape the request-target and use it as a file system path without additional checks.
Note that the restrictions on target URIs in this specification are not sufficient to prevent these kinds of attacks; they must be supplemented by implementation-specific checks.
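One possible shape for such an implementation-specific check is sketched below in Python. It assumes a POSIX-style file system and a single document root; the function name resolve_under_root and the root path used in the usage note are hypothetical. The sketch unescapes the path exactly once, normalizes it, and rejects anything that lands outside the root. It does not cover device names, case folding, or other platform-specific hazards.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote


def resolve_under_root(root: Path, raw_path: str) -> Optional[Path]:
    """Map a still-escaped URI path to a location under root.

    Returns None when the path would escape the root (hypothetical helper).
    """
    decoded = unquote(raw_path)  # unescape exactly once, never again
    if "\x00" in decoded:
        return None  # NUL bytes are never valid in a file name
    root = root.resolve()
    # Strip leading slashes so the path is joined relative to the root,
    # then let resolve() collapse "." and ".." and follow symlinks.
    candidate = (root / decoded.lstrip("/")).resolve()
    if candidate != root and root not in candidate.parents:
        return None  # the normalized path left the document root
    return candidate
```

With a root of /srv/www-example, "/index.html" maps to a file under the root, while both "/a/../../etc/passwd" and "/%2e%2e/%2e%2e/etc/passwd" are rejected. A doubly escaped "%252e%252e" is treated as a literal directory name rather than being decoded a second time.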
9.2. Attacks Based on Command, Code, or Query Injection
Origin servers often use parameters within the URI as a means of identifying system services, selecting database entries, or choosing a data source. However, data received in a request cannot be trusted. An attacker could construct any of the request data elements (method, request-target, header fields, or body) to contain data that might be misinterpreted as a command, code, or query when passed through a command invocation, language interpreter, or database interface.
For example, SQL injection is a common attack wherein additional query language is inserted within some part of the request-target or header fields (e.g., Host, Referer, etc.). If the received data is used directly within a SQL SELECT statement, the query language might be interpreted as a database command instead of a simple string value. This type of implementation vulnerability is extremely common, in spite of being easy to prevent.
In general, resource implementations ought to avoid use of request data in the assembly of shell commands or queries. When such use is necessary, the data needs to be properly escaped (as appropriate for the system receiving the data) before being used.
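As an illustration, the following Python sketch uses the standard sqlite3 module with an in-memory database. Note that it uses parameter binding rather than manual escaping: the driver passes the value separately from the query text, so it is never parsed as SQL. The table layout and the function name find_user are assumptions made for the example.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up a user by name; the name is bound as data, never as SQL."""
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?",  # "?" is a placeholder
        (name,),
    )
    return cursor.fetchall()


# Hypothetical schema and rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

A request parameter such as x' OR '1'='1 is then compared as an ordinary string and matches no rows, whereas concatenating it into the query text would have matched every row.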
9.3. Disclosure of Personal Information
Clients are often privy to large amounts of personal information, including both information provided by the user to interact with resources (e.g., the user's name, location, mail address, passwords, encryption keys, etc.) and information about the user's browsing activity over time (e.g., history, bookmarks, etc.). Implementations need to prevent unintentional leakage of such information.
9.3.1. Disclosure via Application Data
Applications ought to restrict their disclosure of information to only that which is required to complete the request and avoid the disclosure of information specific to the user or the application's internal structure (Section 5.5.3 and Section 7.4.2).
9.3.2. Disclosure via Referer
The Referer header field allows a client to advertise to the server where the client obtained the request-target, which can reveal information about the user's context or browsing history. In the case where the request-target was provided by a third-party source, the user might desire to keep that information confidential (e.g., a link from medical information). The user agent therefore ought to give the user the option to not send the Referer field, or to send a less revealing (e.g., only the origin) version of the field (Section 5.5.2).
Clients ought not to include a Referer header field in a (non-secure) HTTP request if the referring page was received with a secure protocol.
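The two behaviors above (sending a less revealing value, and omitting the field when moving from a secure to a non-secure request) might be combined in a user agent roughly as follows. This is a sketch only: the function name referer_for and the origin_only flag are hypothetical, and "secure" is approximated here as the "https" scheme.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_for(referring_url: str, target_url: str,
                origin_only: bool = False) -> Optional[str]:
    """Return a Referer field value, or None when the field is to be omitted."""
    ref = urlsplit(referring_url)
    dst = urlsplit(target_url)
    # Omit the field when a securely received page refers to a non-secure target.
    if ref.scheme == "https" and dst.scheme != "https":
        return None
    netloc = ref.netloc.rpartition("@")[2]  # drop any userinfo
    if origin_only:
        return f"{ref.scheme}://{netloc}/"  # less revealing: origin only
    query = f"?{ref.query}" if ref.query else ""
    # The fragment is never included in the value.
    return f"{ref.scheme}://{netloc}{ref.path or '/'}{query}"
```

For example, a link followed from https://health.example/conditions/rare?id=7 to an "http" target yields no Referer at all, and with origin_only set the same page yields only https://health.example/ for an "https" target.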
Authors of services that use the HTTP protocol ought not to use GET-based forms to transmit sensitive information such as personally identifying information, account numbers, passwords, etc., since that causes the data to be encoded in the request-target. Service designers ought to use the POST method with the sensitive data carried in the message body, taking care to not include such information in the request-target, since that might be exposed in logs, bookmarks, Referer header fields, etc.
9.3.3. Disclosure via User-Agent
The User-Agent header field often contains enough information to uniquely identify a specific device, usually when combined with other characteristics, particularly if the user agent sends excessive details about the user's system or extensions. Implementations ought to limit such information (Section 5.5.3).
9.4. Privacy of Server Log Information
A server is in the position to save personal data about a user's requests over time, which might identify their reading patterns or subjects of interest. In particular, log information gathered at an intermediary often contains a history of user agent interaction, across a multitude of sites, that can be traced to individual users.
HTTP log information is confidential in nature; its handling is often constrained by laws and regulations. Log information needs to be securely stored and appropriate guidelines followed for its analysis. Anonymization of personal information within individual entries helps, but is generally not sufficient to prevent real log traces from being re-identified based on correlation with other access characteristics. As such, access traces that are keyed to a specific client ought to be either anonymized or considered confidential.
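To minimize the risk of theft or accidental publication, log information ought to be purged of personally identifiable information, such as user identifiers, IP addresses, and user-provided query parameters, as soon as that information is no longer necessary for operational needs.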
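A minimal sketch of scrubbing a single log entry is shown below, using Python's standard ipaddress and urllib.parse modules. The function name scrub_entry and the truncation widths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration, not recommended values. As noted above, per-entry anonymization of this kind helps but is generally not sufficient on its own to prevent re-identification.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlsplit


def scrub_entry(client_ip: str, request_target: str) -> Tuple[str, str]:
    """Return a (truncated address, path without query) pair for logging."""
    addr = ipaddress.ip_address(client_ip)
    prefix = 24 if addr.version == 4 else 48  # assumed truncation widths
    # strict=False lets ip_network() zero the host bits of the address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    # Keep only the path; the query might hold user-provided parameters.
    path = urlsplit(request_target).path
    return str(network.network_address), path
```

For instance, a request for /search?q=rare+disease from 203.0.113.77 would be recorded as ("203.0.113.0", "/search").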
9.5. Disclosure of Fragment after Redirects
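Although fragment identifiers used within URI references are not sent in requests, implementers ought to be aware that they will be visible to the user agent and any extensions or scripts running as a result of the response. In particular, when a redirect occurs and the original request's fragment identifier is inherited by the new reference in Location (Section 7.1.2), this might have the effect of disclosing one site's fragment to another site. If the first site uses personal information in fragments, it ought to ensure that redirects to other sites include a (possibly empty) fragment component in order to block that inheritance.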
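The inheritance behavior can be sketched from the user agent's point of view as follows. The function name redirect_target is hypothetical, and the sketch models only the fragment rule: if the Location value has no fragment component of its own, the original reference's fragment is carried over; any fragment component in Location, even an empty one, prevents that.

```python
from urllib.parse import urljoin, urlsplit


def redirect_target(original_url: str, location: str) -> str:
    """Compute the URI reference a user agent ends up with after a redirect."""
    target = urljoin(original_url, location)  # resolve relative Location values
    original_fragment = urlsplit(original_url).fragment
    # A "#" in Location, even with nothing after it, blocks inheritance.
    if original_fragment and "#" not in location:
        target = f"{target}#{original_fragment}"
    return target
```

A redirect from https://a.example/page#token=abc to https://b.example/landing therefore exposes "token=abc" to scripts on the second site, whereas a Location of https://b.example/landing# does not.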
9.6. Disclosure of Product Information
The Server and User-Agent header fields often reveal information about the respective sender's software systems. In theory, this can make it easier for an attacker to exploit known security holes; in practice, attackers tend to try all potential holes regardless of the apparent software versions in use.
Proxies that serve as a portal through a network firewall ought to take special precautions regarding the transfer of header information that might identify hosts behind the firewall. The Via header field allows intermediaries to replace sensitive machine names with pseudonyms.
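As a small illustration, a proxy might append its own Via entry using a configured pseudonym instead of its real host name. The function name via_with_pseudonym and the pseudonym shown in the usage note are hypothetical; the sketch assumes the received protocol is HTTP, so only the version number precedes the pseudonym.

```python
from typing import Optional


def via_with_pseudonym(existing_via: Optional[str], http_version: str,
                       pseudonym: str) -> str:
    """Append this proxy's Via entry, identified only by a pseudonym."""
    entry = f"{http_version} {pseudonym}"
    if existing_via:
        # Via is a comma-separated list; entries are appended in order.
        return f"{existing_via}, {entry}"
    return entry
```

A proxy whose internal name is sensitive could thus forward "Via: 1.1 fw-portal" rather than revealing the machine name and port it actually runs on.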
9.7. Browser Fingerprinting
Browser fingerprinting is a set of techniques for identifying a specific user agent over time through its unique set of characteristics. These characteristics might include information related to how it uses the underlying transport protocol, feature capabilities, and scripting environment, though of particular interest here is the set of unique characteristics that might be communicated via HTTP. Fingerprinting is considered a privacy concern because it enables tracking of a user agent's behavior over time (Section 9.4) without the corresponding controls that the user might have over other forms of data, such as cookies.
There are a number of request header fields that might reveal information to servers that is sufficiently unique to enable fingerprinting. The From header field is the most obvious, though it is expected that From will only be sent when self-identification is desired by the user. Likewise, Cookie header fields are deliberately designed to enable re-identification, so fingerprinting concerns only apply when cookies are disabled or restricted by the user agent.
The User-Agent header field might contain enough information to uniquely identify a specific device, usually when combined with other characteristics, particularly if the user agent sends excessive details about the user's system or extensions. However, the source of unique information that is least expected by users is proactive negotiation (Section 3.4.1), including the Accept, Accept-Charset, Accept-Encoding, and Accept-Language header fields.
In addition to the fingerprinting concern, detailed use of the Accept-Language header field can reveal information the user might consider to be of a private nature. For example, understanding a given language set might be strongly correlated to membership in a particular ethnic group. An approach that limits such loss of privacy would be for a user agent to omit the sending of Accept-Language except for sites that have been whitelisted, perhaps via interaction after detecting a Vary header field that indicates language negotiation might be useful.
In environments where proxies are used to enhance privacy, user agents ought to be conservative in sending proactive negotiation header fields. General-purpose user agents that provide a high degree of header field configurability ought to inform users about the loss of privacy that might result if too much detail is provided. As an extreme privacy measure, proxies could filter the proactive negotiation header fields in relayed requests.
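The filtering measure mentioned last could look roughly like this in a proxy that represents header fields as a list of (name, value) pairs. That representation and the function name strip_negotiation_fields are assumptions of the sketch; the set of fields is the one listed above for proactive negotiation.

```python
from typing import List, Tuple

# Proactive negotiation fields named in this section (compared in lowercase).
PROACTIVE_NEGOTIATION_FIELDS = frozenset(
    {"accept", "accept-charset", "accept-encoding", "accept-language"}
)


def strip_negotiation_fields(
    headers: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the header fields with proactive negotiation fields removed."""
    return [
        (name, value)
        for name, value in headers
        # Field names are case-insensitive, so compare in lowercase.
        if name.lower() not in PROACTIVE_NEGOTIATION_FIELDS
    ]
```

Removing these fields means the origin server falls back to its default representation, which is the trade-off that makes this an extreme measure.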
9.8. Validator Retention
The validators contained within response metadata (Section 7.2) ought to only be retained by a cache as long as needed for normal processing and expiration of the cached response. Keeping validators around for extended periods can lead to privacy issues because old validators might be used to correlate the activity of a single user across multiple requests, even if the server doesn't explicitly associate validator values with individual users. User agents that are not acting as a cache ought not to retain validators for extended periods.