Skip to main content

4. Filtering and Selection Algorithms

A most important factor affecting the accuracy and reliability of time distribution is the complex of algorithms used to reduce the effect of statistical errors and falsetickers due to failure of various subnet components, reference sources or propagation media. The algorithms suggested in this section were developed and refined over several years of operation in the Internet under widely varying topologies, speeds and traffic regimes. While these algorithms are believed the best available at the present time, they are not an integral part of the NTP specification, since other algorithms with similar or superior performance may be devised in future.

However, it is important to observe that not all time servers or clients in an NTP synchronization subnet must implement these algorithms. For instance, simple workstations may dispense with one or both of them in the interests of simplicity if accuracy and reliability requirements justify. Nevertheless, it would be expected that an NTP server providing synchronization to a sizable community, such as a university campus or research laboratory, would be expected to implement these algorithms or others proved to have equivalent functionality. A comprehensive discussion of the design principles and performance is given in [MIL91a].

In order for the NTP filter and selection algorithms to operate effectively, it is useful to have a measure of recent sample variance recorded for each peer. The measure adopted is based on first-order differences, which are easy to compute and effective for the purposes intended. There are two measures, one called the filter dispersion ε_σ and the other the select dispersion ε_ξ. Both are computed as the weighted sum of the clock offsets in a temporary list sorted by synchronization distance. If θ_i (0 <= i < n) is the offset of the ith entry, then the sample difference ε_ij of the ith entry relative to the jth entry is defined ε_ij = |θ_i - θ_j|. The dispersion relative to the jth entry is defined ε_j and computed as the weighted sum:

ε_j = Σ(i=0 to n-1) ε_ij w^(i+1)

where w is a weighting factor chosen to control the influence of synchronization distance in the dispersion budget. In the NTP algorithms w is chosen less than 1/2: w = NTP.FILTER for filter dispersion and w = NTP.SELECT for select dispersion. The (absolute) dispersion ε_σ and ε_ξ as used in the NTP algorithms are defined relative to the 0th entry ε_0.

There are two procedures described in the following, the clock-filter procedure, which is used to select the best offset samples from a given clock, and the clock-selection procedure, which is used to select the best clock among a hierarchical set of clocks.

4.1. Clock-Filter Procedure

The clock-filter procedure is executed upon arrival of an NTP message or other event that results in new data samples. It takes arguments of the form (θ, δ, ε), where θ is a sample clock offset measurement and δ and ε are the associated roundtrip delay and dispersion. It determines the filtered clock offset (peer.offset), roundtrip delay (peer.delay) and dispersion (peer.dispersion). It also updates the dispersion of samples already recorded and saves the current time (peer.update).

The basis of the clock-filter procedure is the filter shift register (peer.filter), which consists of NTP.SHIFT stages, each stage containing a 3-tuple [θ_i, δ_i, ε_i], with indices numbered from zero on the left. The filter is initialized with the value [0, 0, NTP.MAXDISPERSE] in all stages by the clear procedure. New data samples are shifted into the filter at the left end, causing first NULLs then old samples to fall off the right end. The packet procedure provides samples of the form (θ, δ, ε) as new updates arrive, while the transmit procedure provides samples of the form [0, 0, NTP.MAXDISPERSE] when two poll intervals elapse without a fresh update. While the same symbols (θ, δ, ε) are used here for the arguments, clock-filter contents and peer variables, the meaning will be clear from context. The following pseudo-code describes this procedure.

begin clock-filter procedure (θ, δ, ε)

for (i from NTP.SIZE - 1 to 1) begin /* update dispersion */
[θ_i, δ_i, ε_i] &lt;- [θ_(i-1), δ_(i-1), ε_(i-1)];
ε_i = ε_i + φτ;
add [λ_i = ε_i + |δ_i|/2, i] to temporary list;
endfor;
[θ_0, δ_0, ε_0] &lt;- [θ, δ, ε]; /* insert new sample */
add [λ = ε + |δ|/2, 0] to temporary list;
peer.update &lt;- sys.clock; /* reset base time */
sort temporary list by increasing [distance || index];

ε_σ &lt;- 0;
for (i from NTP.SHIFT-1 to 0) /* compute filter dispersion */
if (peer.dispersion_index[i] >= NTP.MAXDISPERSE or
|θ_i - θ_0| > NTP.MAXDISPERSE)
ε_σ &lt;- (ε_σ + NTP.MAXDISPERSE) × NTP.FILTER;
else
ε_σ &lt;- (ε_σ + |θ_i - θ_0|) × NTP.FILTER;

peer.offset &lt;- θ_0; /* update peer variables */
peer.delay &lt;- δ_0;
peer.dispersion &lt;- min(ε_0 + ε_σ, NTP.MAXDISPERSE);
end clock-filter procedure

The peer.offset and peer.delay variables represent the clock offset and roundtrip delay of the local clock relative to the peer clock. Both of these are precision quantities and can usually be averaged over long intervals in order to improve accuracy and stability without bias accumulation (see Appendix H). The peer.dispersion variable represents the maximum error due to measurement error, skew-error accumulation and sample variance. All three variables are used in the clock-selection and clock-combining procedures to select the peer clock(s) used for synchronization and to maximize the accuracy and stability of the indications.

4.2. Clock-Selection Procedure

The clock-selection procedure uses the peer variables THETA, DELTA, EPSILON and τ and is called when these variables change or when the reachability status changes. It consists of two algorithms, the intersection algorithm and the clustering algorithm. The intersection algorithm constructs a list of candidate peers eligible to become the synchronization source, computes a confidence interval for each and casts out falsetickers using a technique adapted from Marzullo and Owicki [MAR85]. The clustering algorithm sorts the list of surviving candidates in order of stratum and synchronization distance and repeatedly casts out outliers on the basis of select dispersion until only the most accurate, precise and stable survivors are left. A bit is set for each survivor to indicate the outcome of the selection process. The system variable sys.peer is set as a pointer to the most likely survivor, if there is one, or to the NULL value if not.

4.2.1. Intersection Algorithm

begin clock-selection procedure

m &lt;- 0;
for (each peer) /* calling all peers */
if (peer.reach != 0 and peer.dispersion &lt; NTP.MAXDISPERSE and
not (peer.stratum > 1 and peer.refid = peer.hostaddr)) begin
LAMBDA &lt;- distance(peer); /* make list entry */
add [THETA - LAMBDA, -1] to endpoint list;
add [THETA, 0] to endpoint list;
add [THETA + LAMBDA, 1] to endpoint list;
m &lt;- m + 1;
endif
endfor
if (m = 0) begin /* skedaddle if no candidates */
sys.peer &lt;- NULL;
sys.stratum &lt;- 0 (undefined);
exit;
endif
sort endpoint list by increasing endpoint||type;

Each peer is examined in turn and added to an endpoint list only if it passes several sanity checks designed to avoid loops and use of exceptionally noisy data. If no peers survive the sanity checks, the procedure exits without finding a synchronization source. For each of m survivors three entries of the form [endpoint, type] are added to the endpoint list: [THETA - LAMBDA, -1], [THETA, 0] and [THETA + LAMBDA, 1]. There will be 3m entries on the list, which is then sorted by increasing endpoint.

The following algorithm is adapted from DTS [DEC89] and is designed to produce the largest single intersection containing only truechimers. The algorithm begins by initializing a value f and counters i and c to zero. Then, starting from the lowest endpoint of the sorted endpoint list, for each entry [endpoint, type] the value of type is subtracted from the counter i, which is the number of intersections. If type is zero, increment the value of c, which is the number of falsetickers (see Appendix H). If i >= m - f for some entry, endpoint of that entry becomes the lower endpoint of the intersection; otherwise, f is increased by one and the above procedure is repeated. Without resetting f or c, a similar procedure is used to find the upper endpoint, except that the value of type is added to the counter. If after both endpoints have been determined c <= f, the procedure continues having found m - f truechimers; otherwise, f is increased by one and the entire procedure is repeated.

    for (f from 0 to f >= m/2) begin        /* calling all truechimers */
c &lt;- 0;
i &lt;- 0;
for (each [endpoint, type] from lowest) begin /* find low endpoint */
i &lt;- i - type;
low &lt;- endpoint;
if (i >= m - f) break;
if (type = 0) c &lt;- c + 1;
endfor;
i &lt;- 0;

for (each [endpoint, type] from highest) begin /* find high endpoint */
i &lt;- i + type;
high &lt;- endpoint;
if (i >= m - f) break;
if (type = 0) c &lt;- c + 1;
endfor;
if (c &lt;= f) break; /* continue until all falsetickers found */
endfor;
if (low > high) begin /* quit if no intersection found */
sys.peer &lt;- NULL;
exit;
endif;

Note that processing continues past this point only if there are more than m/2 intersections. However, it is possible, but not highly likely, that there may be fewer than m/2 truechimers remaining in the intersection.

4.2.2. Clustering Algorithm

In the original DTS algorithm the clock-selection procedure exits at this point with the presumed correct time set midway in the computed intersection [low, high]. However, this can lead to a considerable loss in accuracy and stability, since the individual peer statistics are lost. Therefore, in NTP the candidates that survived the preceding steps are processed further. The candidate list is rebuilt with entries of the form [distance, index], where distance is computed from the (scaled) peer stratum and synchronization distance LAMBDA. The scaling factor provides a mechanism to weight the combination of stratum and distance. Ordinarily, the stratum will dominate, unless one or more of the survivors has an exceptionally high distance. The list is then sorted by increasing distance.

    m &lt;- 0;
for (each peer) begin /* calling all peers */
if (low &lt;= θ &lt;= high) begin
LAMBDA &lt;- distance(peer); /* make list entry */
dist &lt;- peer.stratum × NTP.MAXDISPERSE + LAMBDA
add [dist, peer] to candidate list;
m &lt;- m + 1;
endif;
endfor;
sort candidate list by increasing dist;

The next steps are designed to cast out outliers which exhibit significant dispersions relative to the other members of the candidate list while minimizing wander, especially on high-speed LANs with many time servers. Wander causes needless network overhead, since the poll interval is clamped at sys.poll as each new peer is selected for synchronization and only slowly increases when the peer is no longer selected. It has been the practical experience that the number of candidates surviving to this point can become quite large and can result in significant processor cycles without materially enhancing stability and accuracy. Accordingly, the candidate list is truncated at NTP.MAXCLOCK entries.

Note ε_ξi is the select (sample) dispersion relative to the ith peer represented on the candidate list, which can be calculated in a manner similar to the filter dispersion described previously. The EPSILON_j is the dispersion of the jth peer represented on the list and includes components due to measurement error, skew-error accumulation and filter dispersion. If the maximum ε_ξi is greater than the minimum EPSILON_j and the number of survivors is greater than NTP.MINCLOCK, the ith peer is discarded from the list and the procedure is repeated. If the current synchronization source is one of the survivors and there is no other survivor of lower stratum, then the procedure exits without doing anything further. Otherwise, the synchronization source is set to the first survivor on the candidate list. In the following i, j, k, l are peer indices, with k the index of the current synchronization source (NULL if none) and l the index of the first survivor on the candidate list.

    while begin
for (each survivor [distance, index]) begin /* compute dispersions */
find index i for max ε_ξi;
find index j for min EPSILON_j;
endfor
if (ε_ξi &lt;= EPSILON_j or m &lt;= NTP.MINCLOCK) break;
peer.survivor[i] &lt;- 0; /* discard ith peer */
if (i = k) sys.peer &lt;- NULL;
delete the ith peer from the candidate list;
m &lt;- m - 1;
endwhile
if (peer.survivor[k] = 0 or peer.stratum[k] > peer.stratum[l]) begin
sys.peer &lt;- l; /* new clock source */
call poll-update;
endif
end clock-select procedure;

The algorithm is designed to favor those peers near the head of the candidate list, which are at the lowest stratum and distance and presumably can provide the most accurate and stable time. With proper selection of weight factor v (also called NTP.SELECT), entries will be trimmed from the tail of the list, unless a few outliers disagree significantly with respect to the remaining entries, in which case the outliers are discarded first. The termination condition is designed to avoid needless switching between synchronization sources when not statistically justified, yet maintain a bias toward the low-stratum, low-distance peers.