10. Server Scheduling

It is generally beneficial for HTTP servers to send all responses as early as possible. However, when serving multiple requests on a single connection, there can be competition between requests for resources such as connection bandwidth. This section describes considerations for how servers schedule the sending order of competing responses when such competition exists.

Server scheduling is a prioritization process based on many inputs; priority signals are only one form of input. Factors such as implementation choices or deployment environments also play a role. The mix of competing requests on any given connection is likely to change dynamically. For these reasons, it is not possible to describe a universal scheduling algorithm. This document provides some basic, non-exhaustive recommendations on how servers might act on priority parameters. It does not describe in detail how servers might combine priority signals with other factors. Endpoints cannot depend on specific treatment based on priority signals. Expressing priority is only a suggestion.

It is RECOMMENDED that servers respect the urgency parameter (Section 4.1) where possible, sending higher-urgency responses before lower-urgency responses; note that a numerically lower urgency value indicates a higher urgency.
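As a purely illustrative sketch, and not a requirement of this document, the selection step might look like the following Python fragment; the Response type and the set of ready responses are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class Response:
        stream_id: int
        urgency: int        # 0 (highest) through 7 (lowest); the default is 3
        incremental: bool

    def most_urgent_group(ready: list[Response]) -> list[Response]:
        # Lower urgency values are served first; ties within a group are
        # broken by the incremental rules described below.
        top = min(r.urgency for r in ready)
        return [r for r in ready if r.urgency == top]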

The incremental parameter indicates whether a client can process response bytes as they arrive. It is RECOMMENDED that servers respect the incremental parameter (Section 4.2) where possible.

Non-incremental responses of the same urgency SHOULD be served one at a time, in ascending order of stream ID, which corresponds to the order in which clients made the requests. Doing so ensures that clients can use request ordering to influence response ordering.
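Reusing the hypothetical Response objects from the sketch above, selection within a group of same-urgency, non-incremental responses might reduce to:

    def next_non_incremental(group):
        # group holds non-incremental responses of equal urgency; ascending
        # stream ID reproduces the order in which the requests were made.
        return min(group, key=lambda r: r.stream_id)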

Incremental responses of the same urgency SHOULD be served by sharing bandwidth among them. The message content of incremental responses is used as it is received, in parts or chunks. Clients might benefit more from receiving portions of all such resources rather than the entirety of a single resource. The size of the resource portion needed to improve performance varies: some resource types place critical elements early, while others can make progressive use of information as it arrives. This scheme does not specify how servers ought to use size, type, or any other input to decide how to prioritize.
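One common realization of bandwidth sharing is round-robin chunking, sketched below under the assumption of a hypothetical IncrementalBody type and a write callback; the chunk size is illustrative:

    from collections import deque
    from dataclasses import dataclass

    CHUNK = 16 * 1024  # bytes written per turn; the value is illustrative

    @dataclass
    class IncrementalBody:  # hypothetical stand-in for a response body
        stream_id: int
        data: bytes
        offset: int = 0

        def next_chunk(self) -> bytes:
            chunk = self.data[self.offset:self.offset + CHUNK]
            self.offset += len(chunk)
            return chunk

        def finished(self) -> bool:
            return self.offset >= len(self.data)

    def serve_round_robin(group: deque, write) -> None:
        # Give each same-urgency incremental response one chunk per cycle
        # so that all of them make progress over time.
        while group:
            body = group.popleft()
            write(body.stream_id, body.next_chunk())
            if not body.finished():
                group.append(body)

Round-robin is only one choice; a server might instead weight shares by resource type or by observed consumption.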

There might be scenarios where servers need to schedule multiple incremental and non-incremental responses at the same urgency level. Strict adherence to scheduling guidance based on urgency and request generation order might lead to suboptimal results for the client, as early non-incremental responses might block the serving of incremental responses issued later. The following are examples of such challenges:

  1. At the same urgency level, a non-incremental request for a large resource followed by an incremental request for a small resource.

  2. At the same urgency level, an incremental request of uncertain length followed by a non-incremental request for a large resource.

It is RECOMMENDED that servers avoid such starvation where possible. The means of doing so is an implementation decision. For example, a server might preemptively send responses of a certain incremental type based on other information such as content size.
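One possible heuristic, shown here only as an assumption about what an implementation might do, is to let a long non-incremental response proceed but periodically divert a turn to waiting incremental responses at the same urgency; the queues and the ratio are illustrative:

    YIELD_EVERY = 4  # illustrative: one turn in four goes to incremental work

    def pick_next(non_incremental, incremental, turn: int):
        # Mostly honor request order for non-incremental responses, but
        # periodically let incremental responses progress so they are not
        # starved behind a large response.
        if incremental and (not non_incremental or turn % YIELD_EVERY == 0):
            return incremental[0]
        return non_incremental[0]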

Optimal scheduling of server pushes is difficult, especially when pushed resources compete with active concurrent requests. There are many factors that servers can consider when scheduling, such as the type or size of resource being pushed, the priority of the request that triggered the push, the count of active concurrent responses, the priority of other active concurrent responses, and more. There is no general guidance on the best way to apply these. Overly simple servers might push at too high a priority and block client requests or push at too low a priority and delay responses, negating the intended goal of server push.

Priority signals are one factor in server push scheduling. The concept of parameter value defaults applies somewhat differently here, as there is no explicit client signal for the initial priority. Servers can apply priority signals provided in origin responses; see the merging guidance in Section 8. In the absence of origin signals, applying the default parameter values might be suboptimal. However servers decide to schedule pushed responses, they can signal the expected priority to clients by including a Priority field in the PUSH_PROMISE or HEADERS frame.
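For example, a server that settles on urgency 2 and incremental delivery could emit the field value "u=2, i" in the PUSH_PROMISE header block. The helper below is a hypothetical convenience; real deployments would typically use a Structured Fields library:

    def priority_field_value(urgency: int = 3, incremental: bool = False) -> str:
        # Parameters carrying their default values (u=3, non-incremental)
        # can be omitted from the serialization.
        items = []
        if urgency != 3:
            items.append(f"u={urgency}")
        if incremental:
            items.append("i")
        return ", ".join(items)

    # priority: u=2, i
    print(priority_field_value(urgency=2, incremental=True))  # -> "u=2, i"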

10.1. Intermediaries with Multiple Backend Connections

An intermediary serving an HTTP connection might spread requests across multiple backend connections. When it strictly applies priority ordering rules, lower-priority requests cannot make progress while requests with higher priority are in flight. This blocking can propagate to backend connections, which peers might interpret as the connection stalling. Endpoints commonly implement protection against stalls, such as abruptly closing connections after some period. To reduce the likelihood of this occurring, intermediaries can avoid strictly following priority ordering and instead allocate a small amount of bandwidth to all requests they forward so that each can make some progress over time.
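A minimal sketch of such a non-strict allocator follows, assuming hypothetical stream objects with stream_id and urgency attributes; the floor value is illustrative:

    FLOOR = 1024  # minimum bytes per stream per scheduling cycle

    def allocate(streams, budget: int) -> dict:
        # Guarantee every stream a small share so backend connections keep
        # observing progress, then give the remainder to the most urgent
        # stream.
        shares = {s.stream_id: FLOOR for s in streams}
        remainder = budget - FLOOR * len(streams)
        if streams and remainder > 0:
            best = min(streams, key=lambda s: s.urgency)
            shares[best.stream_id] += remainder
        return shares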

Similarly, servers SHOULD allocate some amount of bandwidth to streams acting as tunnels.