6. Connection Handling
6.1. Current Practices
Section 4.2.2 of [RFC1035] says:
   The server should assume that the client will initiate connection closing, and should delay closing its end of the connection until all outstanding client requests have been satisfied.
   If the server needs to close a dormant connection to reclaim resources, it should wait until the connection has been idle for a period on the order of two minutes. In particular, the server should allow the SOA and AXFR request sequence (which begins a refresh operation) to be made on a single connection. Since the server would be unable to answer queries anyway, a unilateral close or reset may be used instead of graceful close.
Other, more modern protocols (e.g., HTTP/1.1 [RFC7230], HTTP/2 [RFC7540]) support persistent TCP connections by default for all requests. Connections are then normally closed via a 'connection close' signal from one party.
The description in [RFC1035] is clear that servers should view connections as persistent (particularly after receiving an SOA), but unfortunately it does not provide enough detail for an unambiguous interpretation of client behaviour for queries other than an SOA. Additionally, DNS does not yet have a signalling mechanism for connection timeout or close, although some have been proposed.
6.1.1. Clients
There is no clear guidance today in any RFC as to when a DNS client should close a TCP connection, and there are no specific recommendations with regard to DNS client idle timeouts. However, at the time of writing, it is common practice for clients to close the TCP connection after sending a single request (apart from the SOA/AXFR case).
6.1.2. Servers
Many DNS server implementations use a long fixed idle timeout and default to a small number of TCP connections. They also offer little in the way of TCP connection management options. The disadvantages of this include:
- Operational experience has shown that long server timeouts can easily cause resource exhaustion and poor response under heavy load.
- Intentionally opening many connections and leaving them idle can trivially create a TCP denial-of-service (DoS) attack, as many DNS servers are poorly equipped to defend against this by modifying their idle timeouts or other connection management policies.
- A modest number of clients that all concurrently attempt to use persistent connections with non-zero idle timeouts to such a server could unintentionally cause the same DoS problem.
Note that this DoS is limited to the TCP service. However, it affects not only clients that wish to use TCP for their queries for operational reasons, but also all clients that fall back from UDP to TCP after receiving a response with TC=1.
6.2. Recommendations
The following sections include recommendations that are intended to result in more consistent and scalable implementations of DNS-over-TCP.
6.2.1. Connection Reuse
One perceived disadvantage to DNS over TCP is the added connection setup latency, generally equal to one RTT. To amortise connection setup costs, both clients and servers SHOULD support connection reuse by sending multiple queries and responses over a single persistent TCP connection.
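Reusing one connection for several messages relies on the framing that [RFC1035] (Section 4.2.2) already defines for TCP: each DNS message is preceded by a two-byte, big-endian length field. The Python sketch below frames outgoing messages and splits an incoming byte stream back into whole messages, however TCP happened to segment it. The class name `StreamReassembler` is illustrative, and the payloads in the usage lines are placeholder bytes rather than real DNS messages.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length for TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


class StreamReassembler:
    """Accumulate bytes read from one persistent TCP connection and return
    every complete DNS message, regardless of how the stream was segmented."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        self._buf += data
        messages = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf)
            if len(self._buf) < 2 + length:
                break  # wait for the rest of this message
            messages.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return messages


# Two messages on one connection, split at an arbitrary point by TCP.
stream = frame(b"query-1") + frame(b"query-2")
reassembler = StreamReassembler()
received = reassembler.feed(stream[:5]) + reassembler.feed(stream[5:])
# received == [b"query-1", b"query-2"]
```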
When sending multiple queries over a TCP connection, clients MUST NOT reuse the DNS Message ID of an in-flight query on that connection in order to avoid Message ID collisions. This is especially important if the server could be performing out-of-order processing (see Section 7).
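One way for a client to honour this rule is to track the IDs currently in flight on each connection and draw new ones only from the unused space. In this hypothetical `MessageIdAllocator`, IDs are picked at random (an assumption carried over from common UDP practice; any policy works as long as an in-flight ID is never repeated), and an ID becomes reusable only after `release()` is called for it.

```python
import secrets


class MessageIdAllocator:
    """Hand out 16-bit DNS Message IDs that are unique among the queries
    currently in flight on a single TCP connection."""

    def __init__(self) -> None:
        self._in_flight: set[int] = set()

    def allocate(self) -> int:
        if len(self._in_flight) >= 0x10000:
            raise RuntimeError("all 65536 Message IDs are in flight")
        while True:
            msg_id = secrets.randbelow(0x10000)  # random 16-bit ID
            if msg_id not in self._in_flight:
                self._in_flight.add(msg_id)
                return msg_id

    def release(self, msg_id: int) -> None:
        """Call once the response for msg_id has arrived or the query is abandoned."""
        self._in_flight.discard(msg_id)
```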
6.2.1.1. Query Pipelining
Due to the historical use of TCP primarily for zone transfer and truncated responses, no existing RFC discusses the idea of pipelining DNS queries over a TCP connection.
In order to achieve performance on par with UDP, DNS clients SHOULD pipeline their queries. When a DNS client sends multiple queries to a server, it SHOULD NOT wait for an outstanding reply before sending the next query. Clients SHOULD treat TCP and UDP equivalently when considering the time at which to send a particular query.
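A minimal client-side sketch of pipelining, assuming a hypothetical `send` callable that writes bytes to an already-open connection: every query is written immediately, and responses, which the server may return in any order, are matched to their queries by the Message ID in the first two bytes of the DNS header. The payloads are placeholders; a real client would typically also compare the question section of each response with that of the query.

```python
import struct
from typing import Callable


class PipelinedClient:
    """Client-side bookkeeping for pipelined DNS-over-TCP queries."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self.pending: dict[int, bytes] = {}  # Message ID -> query
        self.answers: dict[int, bytes] = {}  # Message ID -> response

    def query(self, message: bytes) -> int:
        (msg_id,) = struct.unpack_from("!H", message)
        if msg_id in self.pending:
            raise ValueError(f"Message ID {msg_id} already in flight")
        self.pending[msg_id] = message
        # Write right away; do not wait for earlier responses.
        self._send(struct.pack("!H", len(message)) + message)
        return msg_id

    def on_response(self, message: bytes) -> None:
        (msg_id,) = struct.unpack_from("!H", message)
        if self.pending.pop(msg_id, None) is None:
            return  # unsolicited or duplicate response: ignore it
        self.answers[msg_id] = message


wire: list[bytes] = []
client = PipelinedClient(wire.append)
for msg_id in (1, 2, 3):
    client.query(struct.pack("!H", msg_id) + b"<rest of query>")
# All three queries are on the wire before any response has been read.
for msg_id in (3, 1, 2):  # the server answers out of order
    client.on_response(struct.pack("!H", msg_id) + b"<rest of response>")
```

After the loop, `client.pending` is empty and `client.answers` holds IDs 1, 2 and 3.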
It is likely that DNS servers need to process pipelined queries concurrently and also send out-of-order responses over TCP in order to provide the level of performance possible with UDP transport. If TCP performance is of importance, clients might find it useful to use server processing times as input to server and transport selection algorithms.
DNS servers (especially recursive) MUST expect to receive pipelined queries. The server SHOULD process TCP queries concurrently, just as it would for UDP. The server SHOULD answer all pipelined queries, even if they are received in quick succession. The handling of responses to pipelined queries is covered in Section 7.
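The server side can be sketched with Python's `asyncio` streams: each query read from the connection is answered in its own task, so a slow lookup does not delay the responses to queries behind it, and responses go out in completion order. `resolve` stands for whatever produces a response and is a hypothetical hook; a lock serialises writing and draining across tasks. When the client closes the connection, unfinished answers are cancelled rather than sent, in line with Section 6.2.4.

```python
import asyncio
import struct
from typing import Awaitable, Callable

Resolver = Callable[[bytes], Awaitable[bytes]]  # hypothetical lookup hook


async def serve_connection(reader: asyncio.StreamReader,
                           writer: asyncio.StreamWriter,
                           resolve: Resolver) -> None:
    """Answer pipelined queries on one TCP connection concurrently."""
    write_lock = asyncio.Lock()
    tasks: set = set()

    async def answer(query: bytes) -> None:
        response = await resolve(query)
        async with write_lock:  # one response at a time: write, then drain
            writer.write(struct.pack("!H", len(response)) + response)
            await writer.drain()

    try:
        while True:
            (length,) = struct.unpack("!H", await reader.readexactly(2))
            query = await reader.readexactly(length)
            task = asyncio.create_task(answer(query))  # do not wait for it
            tasks.add(task)
            task.add_done_callback(tasks.discard)
    except (asyncio.IncompleteReadError, ConnectionError):
        pass  # the client closed (or reset) the connection
    finally:
        for task in list(tasks):
            task.cancel()  # never send responses the client can no longer read
        writer.close()
```

Usage: pass it to `asyncio.start_server`, for example `asyncio.start_server(lambda r, w: serve_connection(r, w, my_resolver), host, port)`, where `my_resolver` is the hypothetical lookup coroutine.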
6.2.2. Concurrent Connections
To mitigate the risk of unintentional server overload, DNS clients MUST take care to minimise the number of concurrent TCP connections made to any individual server. It is RECOMMENDED that for any given client/server interaction there be no more than one connection for regular queries, one for zone transfers, and one for each protocol that is being used on top of TCP (for example, if the resolver were using TLS). However, certain primary/secondary configurations with many busy zones might need to use more than one TCP connection for zone transfers for operational reasons (for example, to support concurrent transfers of multiple zones).
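A client can enforce this by keying its connections on the server and on the purpose of the connection. The `ConnectionPool` below is a hypothetical sketch: `connect` stands for whatever opens a new connection, and the purpose labels ("query", "xfr", "tls") are illustrative. A secondary that needs concurrent zone transfers could extend it to allow a small, fixed number of connections per purpose.

```python
from typing import Callable


class ConnectionPool:
    """Keep at most one TCP connection per (server, purpose) pair."""

    def __init__(self, connect: Callable[[str, str], object]) -> None:
        self._connect = connect
        self._conns: dict[tuple[str, str], object] = {}

    def get(self, server: str, purpose: str) -> object:
        """Return the existing connection for this pair, opening one only if needed."""
        key = (server, purpose)
        if key not in self._conns:
            self._conns[key] = self._connect(server, purpose)
        return self._conns[key]

    def closed(self, server: str, purpose: str) -> None:
        """Forget a connection once it has been closed by either end."""
        self._conns.pop((server, purpose), None)
```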
Similarly, servers MAY impose limits on the number of concurrent TCP connections being handled for any particular client IP address or subnet. These limits SHOULD be much looser than the client guidelines above, because the server does not know, for example, whether a client IP address belongs to a single client, to multiple resolvers on a single machine, or to multiple clients behind a device performing Network Address Translation (NAT).
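A server-side sketch of such a limit, counting active connections per client subnet. The /24 (IPv4) and /64 (IPv6) aggregation and the default cap of 50 connections are illustrative assumptions, chosen only to show a limit that sits well above the handful of connections the client guidelines permit.

```python
import ipaddress
from collections import Counter


class PerSubnetLimiter:
    """Cap concurrent TCP connections per client subnet (illustrative values)."""

    def __init__(self, max_per_subnet: int = 50,
                 v4_prefix: int = 24, v6_prefix: int = 64) -> None:
        self.max_per_subnet = max_per_subnet
        self._prefix = {4: v4_prefix, 6: v6_prefix}
        self._active: Counter = Counter()

    def _key(self, client_ip: str):
        addr = ipaddress.ip_address(client_ip)
        prefix = self._prefix[addr.version]
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    def try_accept(self, client_ip: str) -> bool:
        key = self._key(client_ip)
        if self._active[key] >= self.max_per_subnet:
            return False  # refuse (or immediately close) the new connection
        self._active[key] += 1
        return True

    def release(self, client_ip: str) -> None:
        key = self._key(client_ip)
        self._active[key] -= 1
        if self._active[key] <= 0:
            del self._active[key]
```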
6.2.3. Idle Timeouts
To mitigate the risk of unintentional server overload, DNS clients MUST take care to minimise the idle time of established DNS-over-TCP sessions made to any individual server. DNS clients SHOULD close the TCP connection of an idle session, unless an idle timeout has been established using some other signalling mechanism, for example, [edns-tcp-keepalive].
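A minimal sketch of this client-side rule, assuming an injectable clock for testability. With no signalled timeout, the client closes as soon as nothing is outstanding; `signalled_timeout` stands for a value learned through a mechanism such as [edns-tcp-keepalive], and how that value is obtained is outside the sketch. The class and attribute names are hypothetical.

```python
import time
from typing import Callable, Optional


class ClientIdlePolicy:
    """Decide when a client should close an idle DNS-over-TCP connection."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.outstanding = 0  # queries still awaiting a response
        self.signalled_timeout: Optional[float] = None  # seconds, if signalled
        self._last_activity = clock()

    def activity(self) -> None:
        """Record that a query was sent or a response received."""
        self._last_activity = self._clock()

    def should_close(self) -> bool:
        if self.outstanding > 0:
            return False  # still waiting for responses
        # Without a signalled timeout, allow no idle time at all.
        allowed = self.signalled_timeout or 0.0
        return self._clock() - self._last_activity >= allowed
```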
To mitigate the risk of unintentional server overload, it is RECOMMENDED that the default server application-level idle period be on the order of seconds, but no particular value is specified. In practice, the idle period can vary dynamically, and servers MAY allow idle connections to remain open for longer periods as resources permit. A timeout of at least a few seconds is advisable for normal operations to support those clients that expect the SOA and AXFR request sequence to be made on a single connection as originally specified in [RFC1035]. Servers MAY use zero timeouts when they are experiencing heavy load or are under attack.
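No particular value is specified, but one hypothetical policy could look like the function below: a base timeout when lightly loaded, shrinking with load but never below a floor of a few seconds, then dropping to zero under heavy load. All three parameters (10 s base, 3 s floor, 90% load threshold) are assumptions for illustration only.

```python
def server_idle_timeout(active: int, capacity: int,
                        base: float = 10.0, floor: float = 3.0,
                        heavy_load: float = 0.9) -> float:
    """Return an application-level idle timeout in seconds for the current load.

    Illustrative policy only; the parameter defaults are assumptions."""
    load = active / capacity
    if load >= heavy_load:
        return 0.0  # heavy load or attack: close idle connections at once
    # Shrink linearly from `base` as load rises, but keep at least `floor`
    # seconds so a client can still send SOA then AXFR on one connection.
    return max(floor, base * (1.0 - load / heavy_load))
```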
DNS messages delivered over TCP might arrive in multiple segments. A DNS server that resets its idle timeout after receiving a single segment might be vulnerable to a "slow-read attack". For this reason, servers SHOULD reset the idle timeout on the receipt of a full DNS message, rather than on receipt of any part of a DNS message.
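A sketch of that rule, assuming a hypothetical per-connection object that is fed raw segments and uses an injectable clock: partial messages are buffered without touching the deadline, and the deadline moves only when at least one complete length-prefixed message has been reassembled.

```python
import struct
import time
from typing import Callable


class ServerConnectionTimer:
    """Idle-timeout bookkeeping for one server-side TCP connection."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._timeout = idle_timeout
        self._buf = bytearray()
        self.deadline = clock() + idle_timeout

    def on_segment(self, data: bytes) -> list[bytes]:
        """Buffer a received segment and return any complete DNS messages."""
        self._buf += data
        messages = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf)
            if len(self._buf) < 2 + length:
                break  # partial message: deliberately no timer reset
            messages.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        if messages:  # reset only on receipt of a full DNS message
            self.deadline = self._clock() + self._timeout
        return messages

    def expired(self) -> bool:
        return self._clock() >= self.deadline
```

A client trickling in one byte at a time therefore cannot push the deadline back until it has actually completed a message.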
6.2.4. Teardown
Under normal operation, DNS clients typically initiate connection closing on idle connections; however, DNS servers can close the connection if the idle timeout set by local policy is exceeded. Connections can also be closed by either end under unusual conditions, such as when defending against an attack or during a system failure or reboot.
DNS clients SHOULD retry unanswered queries if the connection closes before receiving all outstanding responses. No specific retry algorithm is specified in this document.
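Because no retry algorithm is specified, the following is only one possibility: a hypothetical tracker that remembers the queries sent on a connection, forgets them as responses arrive, and on an early close passes each unanswered query to a `resend` hook (which might open a new connection or try another server). The limit of two sends per query is an arbitrary assumption.

```python
from typing import Callable


class RetryingQueryTracker:
    """Retry queries left unanswered when a TCP connection closes early."""

    def __init__(self, resend: Callable[[bytes], None],
                 max_attempts: int = 2) -> None:
        self._resend = resend
        self._max_attempts = max_attempts
        # Message ID -> (query, number of times it has been sent)
        self._outstanding: dict[int, tuple[bytes, int]] = {}

    def sent(self, msg_id: int, query: bytes) -> None:
        self._outstanding[msg_id] = (query, 1)

    def answered(self, msg_id: int) -> None:
        self._outstanding.pop(msg_id, None)

    def connection_closed(self) -> list[int]:
        """Resend what may still be retried; return the IDs given up on."""
        failed = []
        pending, self._outstanding = self._outstanding, {}
        for msg_id, (query, sends) in pending.items():
            if sends < self._max_attempts:
                self._outstanding[msg_id] = (query, sends + 1)
                self._resend(query)
            else:
                failed.append(msg_id)
        return failed
```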
If a DNS server finds that a DNS client has closed a TCP session (or if the session has been otherwise interrupted) before all pending responses have been sent, then the server MUST NOT attempt to send those responses. Of course, the DNS server MAY cache those responses.
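The server-side rule can be sketched as a small guard on the response path. `write` is a hypothetical callable that sends on the connection, and a plain dictionary stands in for a real cache; caching is shown as independent of delivery, since responses that can no longer be sent may still be cached.

```python
from collections.abc import Hashable
from typing import Callable


class ResponseGuard:
    """Suppress writes for responses that complete after the client has closed."""

    def __init__(self, write: Callable[[bytes], None],
                 cache: dict) -> None:
        self._write = write
        self._cache = cache
        self.client_closed = False

    def on_client_closed(self) -> None:
        self.client_closed = True

    def deliver(self, key: Hashable, response: bytes) -> bool:
        self._cache[key] = response  # optional: responses MAY be cached
        if self.client_closed:
            return False  # MUST NOT attempt to send on a closed session
        self._write(response)
        return True
```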