9. Additional Considerations
9.1. Service Instances
We use the term "service instance" to refer to software running on a host that can receive connections on some set of { IP address, port } tuples. What makes the software an instance is that regardless of which of these tuples the client uses to connect to it, the client is connected to the same software, running on the same logical node (see Section 9.2), and will receive the same answers and the same keying information.
Service instances are identified from the perspective of the client. If the client is configured with { IP address, port } tuples, it has no way to tell if the service offered at one tuple is the same server that is listening on a different tuple. So in this case, the client treats each different tuple as if it references a different service instance.
In some cases, a client is configured with a hostname and a port number. The port number may be given explicitly, along with the hostname. The port number may be omitted, and assumed to have some default value. The hostname and a port number may be learned from the network, as in the case of DNS SRV records. In these cases, the { hostname, port } tuple uniquely identifies the service instance, subject to the usual case-insensitive DNS comparison of names [RFC1034].
It is possible that two hostnames might point to some common IP addresses; this is a configuration anomaly that the client is not obliged to detect. The effect of this could be that after being told to disconnect, the client might reconnect to the same server because it is represented as a different service instance.
Implementations SHOULD NOT resolve hostnames and then perform the process of matching IP address(es) in order to evaluate whether two entities should be determined to be the "same service instance".
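As an illustration of the { hostname, port } case above, the following is a minimal sketch of how a client might derive an identity for a service instance. The names instance_key, same_instance, and DEFAULT_PORT are hypothetical and are not defined by this specification; the default port value is an assumption made purely for the example. Consistent with the recommendation above, the sketch compares names as configured and performs no DNS resolution.

```python
import string
from typing import Optional, Tuple

# Hypothetical default port, assumed only for this example.
DEFAULT_PORT = 853

# DNS name comparison is case-insensitive for ASCII letters only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def instance_key(hostname: str, port: Optional[int] = None) -> Tuple[str, int]:
    """Return a comparable identity for a { hostname, port } tuple."""
    # An omitted port is assumed to have the default value.
    effective_port = DEFAULT_PORT if port is None else port
    return (hostname.translate(_ASCII_LOWER), effective_port)


def same_instance(a: Tuple[str, int], b: Tuple[str, int]) -> bool:
    """Compare two identities without resolving either hostname."""
    return a == b
```

Under these assumptions, "DNS.Example.COM" with an explicit port of 853 and "dns.example.com" with no port given identify the same service instance, whereas two different hostnames are treated as different instances even if they happen to resolve to a common IP address.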
9.2. Anycast Considerations
When an anycast service is configured on a particular IP address and port, it must be the case that although there is more than one physical server responding on that IP address, each such server can be treated as equivalent. What we mean by "equivalent" here is that all such servers can provide the same service and, where appropriate, the same authentication information, such as PKI certificates, when establishing connections.
If a change in network topology causes packets in a particular TCP connection to be sent to an anycast server instance that does not know about the connection, the new server will automatically terminate the connection with a TCP reset, since it will have no record of the connection, and then the client can reconnect or stop using the connection as appropriate.
If, after the connection is re-established, the client's assumption that it is connected to the same instance is violated in some way, that would be considered incorrect behavior in this context. It is, however, outside the scope of this specification to make specific recommendations in this regard; that would be up to follow-on documents that describe specific uses of DNS Stateful Operations.
9.3. Connection Sharing
As previously specified for DNS-over-TCP [RFC7766]:
To mitigate the risk of unintentional server overload, DNS clients MUST take care to minimize the number of concurrent TCP connections made to any individual server. It is RECOMMENDED that for any given client/server interaction there SHOULD be no more than one connection for regular queries, one for zone transfers, and one for each protocol that is being used on top of TCP (for example, if the resolver was using TLS). However, it is noted that certain primary/secondary configurations with many busy zones might need to use more than one TCP connection for zone transfers for operational reasons (for example, to support concurrent transfers of multiple zones).
A single server may support multiple services, including DNS Updates [RFC2136], DNS Push Notifications [Push], and other services, for one or more DNS zones. When a client discovers that the target server for several different operations is the same service instance (see Section 9.1), the client SHOULD use a single shared DSO Session for all those operations.
This requirement has two benefits. First, it reduces unnecessary connection load on the DNS server. Second, it avoids the connection startup time that would be spent establishing each new additional connection to the same DNS server.
However, server implementers and operators should be aware that connection sharing may not be possible in all cases. A single host device may be home to multiple independent client software instances that don't coordinate with each other. Similarly, multiple independent client devices behind the same NAT gateway will also typically appear to the DNS server as different source ports on the same client IP address. Because of these constraints, a DNS server MUST be prepared to accept multiple connections from different source ports on the same client IP address.
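The client-side session sharing described in this section could be sketched as follows. This is an illustration only: the SessionPool class and the connect callback are hypothetical names, the session object is left opaque, and the sketch is not thread-safe.

```python
from typing import Callable, Dict, Tuple

InstanceKey = Tuple[str, int]  # e.g., (hostname, port); an assumed representation


class SessionPool:
    """Hands out one shared session object per service instance."""

    def __init__(self, connect: Callable[[InstanceKey], object]) -> None:
        self._connect = connect  # hypothetical factory that opens a session
        self._sessions: Dict[InstanceKey, object] = {}

    def session_for(self, key: InstanceKey) -> object:
        # Open a session only if none exists yet for this service instance.
        if key not in self._sessions:
            self._sessions[key] = self._connect(key)
        return self._sessions[key]

    def discard(self, key: InstanceKey) -> None:
        # Forget a closed session so that the next caller reconnects.
        self._sessions.pop(key, None)
```

With this arrangement, several operations within one client that target the same service instance all receive the same session, and the connection setup cost is paid once. Independent client software instances on the same host would each hold their own pool, which is one reason a server still sees multiple connections from one client IP address.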
9.4. Operational Considerations for Middleboxes
Where an application-layer middlebox (e.g., a DNS proxy, forwarder, or session multiplexer) is in the path, care must be taken to avoid a configuration in which DSO traffic is mishandled. The simplest way to avoid such problems is to avoid using middleboxes. When this is not possible, middleboxes should be evaluated to make sure that they behave correctly.
Correct behavior for middleboxes consists of one of the following:
- The middlebox does not forward DSO messages and responds to DSO messages with a response code other than NOERROR or DSOTYPENI.
- The middlebox acts as a DSO server and follows this specification in establishing connections.
- There is a 1:1 correspondence between incoming and outgoing connections such that when a connection is established to the middlebox, it is guaranteed that exactly one corresponding connection will be established from the middlebox to some DNS resolver, and all incoming messages will be forwarded without modification or reordering. An example of this would be a NAT forwarder or TCP connection optimizer (e.g., for a high-latency connection such as a geosynchronous satellite link).
Middleboxes that do not meet one of the above criteria are very likely to fail in unexpected and difficult-to-diagnose ways. For example, a DNS load balancer might unbundle DNS messages from the incoming TCP stream and forward each message from the stream to a different DNS server. If such a load balancer is in use, and the DNS servers it points to implement DSO and are configured to enable DSO, DSO Session establishment will succeed, but no coherent session will exist between the client and the server. If such a load balancer is pointed at a DNS server that does not implement DSO or is configured not to allow DSO, no such problem will exist, but such a configuration risks unexpected failure if new server software is installed that does implement DSO.
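The load balancer failure described above can be illustrated with a short sketch. It relies on the two-byte length prefix that frames each DNS message on a TCP stream [RFC7766]; the function names, the round-robin policy, and the placeholder payloads are assumptions made for the example and do not describe any particular product.

```python
import struct
from typing import List


def frame(message: bytes) -> bytes:
    """Prefix a message with its two-byte length, as DNS over TCP does."""
    return struct.pack("!H", len(message)) + message


def split_messages(stream: bytes) -> List[bytes]:
    """Unbundle the complete messages found in a TCP byte stream."""
    messages = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        end = offset + 2 + length
        if end > len(stream):
            break  # incomplete message; a real proxy would wait for more bytes
        messages.append(stream[offset + 2:end])
        offset = end
    return messages


def round_robin(messages: List[bytes], backends: int) -> List[int]:
    """Return the index of the backend that would receive each message."""
    return [i % backends for i in range(len(messages))]
```

If three messages from one client connection are distributed across two backends in this way, the first and third arrive at one server and the second at another, so neither server holds the complete state of what the client believes is a single session.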
It is of course possible to implement a middlebox that properly supports DSO. It is even possible to implement one that implements DSO with long-lived operations. This can be done either by maintaining a 1:1 correspondence between incoming and outgoing connections, as mentioned above, or by terminating incoming sessions at the middlebox but maintaining state in the middlebox about any long-lived operations that are requested. Specifying this in detail is beyond the scope of this document.
9.5. TCP Delayed Acknowledgement Considerations
Most modern implementations of the Transmission Control Protocol (TCP) include a feature called "Delayed Acknowledgement" [RFC1122].
Without this feature, TCP can be very wasteful on the network. For illustration, consider a simple example like remote login using a very simple TCP implementation that lacks delayed acks. When the user types a keystroke, a data packet is sent. When the data packet arrives at the server, the simple TCP implementation sends an immediate acknowledgement. Mere milliseconds later, the server process reads the one byte of keystroke data, and consequently the simple TCP implementation sends an immediate window update. Mere milliseconds later, the server process generates the character echo and writes this data to its outgoing TCP stream, and the simple TCP implementation then sends this data packet immediately too. In this case, this simple TCP implementation sends a burst of three packets almost instantaneously (ack, window update, data).
Clearly it would be more efficient if the TCP implementation were to combine the three separate packets into one, and this is what the delayed ack feature enables.
With delayed ack, the TCP implementation waits after receiving a data packet, typically for up to 200 ms, and sends its ack as soon as (a) more data packet(s) arrive, (b) the receiving process generates some reply data, or (c) 200 ms elapses without either of the above occurring.
With delayed ack, remote login becomes much more efficient, generating just one packet instead of three for each character echo.
The logic of delayed ack is that the 200 ms delay cannot do any significant harm. If something at the other end were waiting for something, then the receiving process should generate the reply that the thing at the other end is waiting for, and TCP will then immediately send that reply (combined with the ack and window update). And if the receiving process does not in fact generate any reply for this particular message, then by definition the thing at the other end cannot be waiting for anything. Therefore, the 200 ms delay is harmless.
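The three conditions under which a delayed ack is sent can be expressed as a small timing model. This is a simplified sketch rather than a description of any real TCP implementation; the function name ack_time and its parameters are hypothetical, and all times are in seconds.

```python
from typing import Optional

DELAYED_ACK_TIMEOUT = 0.200  # the typical value cited in the text


def ack_time(received_at: float,
             next_data_at: Optional[float] = None,
             reply_at: Optional[float] = None,
             timeout: float = DELAYED_ACK_TIMEOUT) -> float:
    """Return the time at which the receiver sends its ack."""
    # Case (c): nothing else happens, so the timer expires.
    candidates = [received_at + timeout]
    # Case (a): another data packet arrives first.
    if next_data_at is not None:
        candidates.append(max(next_data_at, received_at))
    # Case (b): the receiving process generates reply data first.
    if reply_at is not None:
        candidates.append(max(reply_at, received_at))
    # The ack goes out at whichever event occurs earliest.
    return min(candidates)
```

In this model, a receiver that replies 3 ms after the data arrives sends its ack at 3 ms, combined with the reply, while a receiver that never replies sends its ack only when the 200 ms timer expires.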
This assumption may be true unless the sender is using Nagle's algorithm, a similar efficiency feature, created to protect the network from poorly written client software that performs many rapid small writes in succession. Nagle's algorithm allows these small writes to be coalesced into larger, less wasteful packets.
Unfortunately, Nagle's algorithm and delayed ack, two valuable efficiency features, can interact badly with each other when used together [NagleDA].
DSO request messages elicit responses; DSO unidirectional messages and DSO response messages do not.
For DSO request messages, which do elicit responses, Nagle's algorithm and delayed ack work as intended.
For DSO messages that do not elicit responses, the delayed ack mechanism causes the ack to be delayed by 200 ms. The 200 ms delay on the ack can in turn cause Nagle's algorithm to prevent the sender from sending any more data for 200 ms until the awaited ack arrives. On an enterprise Gigabit Ethernet (GigE) backbone with sub-millisecond round-trip times, a 200 ms delay is enormous in comparison.
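To give a sense of scale, the following rough calculation compares how often a sender could transmit one small packet at a time when each transmission has to wait for the previous ack. It is a deliberately simplified model that ignores the coalescing of queued writes; the function name is hypothetical, and the 0.5 ms round-trip time is an assumed figure standing in for "sub-millisecond".

```python
def small_sends_per_second(ack_wait_s: float) -> float:
    """Upper bound on one-at-a-time small sends when each waits for an ack."""
    return 1.0 / ack_wait_s


# Ack held back by the delayed ack timer (200 ms).
stalled_rate = small_sends_per_second(0.200)

# Ack returned after one round trip on a fast local network (assumed 0.5 ms).
prompt_rate = small_sends_per_second(0.0005)

# How many times slower the stalled case is.
slowdown = prompt_rate / stalled_rate
```

Under these assumptions, the stalled sender manages about 5 small sends per second, compared with about 2000 when acks return promptly, a difference of roughly a factor of 400.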
When this issue is raised, there are two solutions that are often offered, neither of them ideal:
- Disable delayed ack. For DSO messages that elicit no response, removing delayed ack avoids the needless 200 ms delay and sends back an immediate ack that tells Nagle's algorithm that it should immediately grant the sender permission to send its next packet. Unfortunately, for DSO messages that do elicit a response, removing delayed ack removes the efficiency gains of combining acks with data, and the responder will now send two or three packets instead of one.
- Disable Nagle's algorithm. When acks are delayed by the delayed ack algorithm, removing Nagle's algorithm prevents the sender from being blocked from sending its next small packet immediately. Unfortunately, on a network with a higher round-trip time, removing Nagle's algorithm removes the efficiency gains of combining multiple small packets into fewer larger ones, with the goal of limiting the number of small packets in flight at any one time.
The problem here is that with DSO messages that elicit no response, the TCP implementation is stuck waiting, unsure if a response is about to be generated or whether the TCP implementation should go ahead and send an ack and window update.
The solution is networking APIs that allow the receiver to inform the TCP implementation that a received message has been read, processed, and no response for this message will be generated. TCP can then stop waiting for a response that will never come, and immediately go ahead and send an ack and window update.
For implementations of DSO, disabling delayed ack is NOT RECOMMENDED because of the harm this can do to the network.
For implementations of DSO, disabling Nagle's algorithm is NOT RECOMMENDED because of the harm this can do to the network.
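An implementation that wishes to confirm it has left Nagle's algorithm enabled could check the standard TCP_NODELAY socket option, as in the sketch below. The helper name nagle_enabled is hypothetical. The sketch covers Nagle's algorithm only, since the controls for delayed ack vary between operating systems.

```python
import socket


def nagle_enabled(sock: socket.socket) -> bool:
    """Return True if Nagle's algorithm is still active on a TCP socket."""
    # TCP_NODELAY is zero unless the application has disabled Nagle.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0


def new_tcp_socket_keeps_nagle() -> bool:
    """Create an unconnected TCP socket and report its Nagle setting."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return nagle_enabled(sock)
```

A check of this kind could be placed in a test suite to catch a library or framework that sets TCP_NODELAY on the application's behalf.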
At the time that this document is being prepared for publication, it is known that at least one TCP implementation provides the ability for the recipient of a TCP message to signal that it is not going to send a response, and hence the delayed ack mechanism can stop waiting. Implementations on operating systems where this feature is available SHOULD make use of it.