2. General Requirements
Today, a commonly encountered randomness requirement is to pick a user password, usually a simple character string. Obviously, a password that can be guessed does not provide security. For re-usable passwords, it is desirable that users be able to remember the password. This may make it advisable to use pronounceable character strings or phrases composed of ordinary words. But this affects only the format of the password information, not the requirement that the password be very hard to guess.
Many other requirements come from the cryptographic arena. Cryptographic techniques can be used to provide a variety of services, including confidentiality and authentication. Such services are based on quantities, traditionally called "keys", that are unknown to and unguessable by an adversary.
There are even TCP/IP protocol uses for randomness in picking initial sequence numbers [RFC1948].
Generally speaking, the above examples also illustrate two different types of random quantities that may be wanted. In the case of human-usable passwords, the only important characteristic is that they be unguessable. It does not matter that they are composed of ASCII characters (so that, for example, the top bit of every byte is zero). On the other hand, for fixed length keys and the like, one normally wants quantities that appear to be truly random, that is, quantities whose bits will pass statistical randomness tests.
In some cases, such as the use of symmetric encryption with one-time pads or an algorithm like the US Advanced Encryption Standard [AES], the parties who wish to communicate confidentially and/or with authentication must all know the same secret key. In other cases, where asymmetric or "public key" cryptographic techniques are used, keys come in pairs. One key of the pair is private and must be kept secret by one party; the other is public and can be published to the world. It is computationally infeasible to determine the private key from the public key, and knowledge of the public key is of no help to an adversary [ASYMMETRIC]. See general references [SCHNEIER, FERGUSON, KAUFMAN].
The frequency and volume of the requirement for random quantities differs greatly for different cryptographic systems. With pure RSA, random quantities are required only when a new key pair is generated; thereafter, any number of messages can be signed without a further need for randomness. The public key Digital Signature Algorithm devised by the US National Institute of Standards and Technology (NIST) requires good random numbers for each signature [DSS]. And encrypting with a one-time pad (in principle the strongest possible encryption technique) requires randomness of equal volume to all the messages to be processed. See general references [SCHNEIER, FERGUSON, KAUFMAN].
In most of these cases, an adversary can try to determine the "secret" key by trial and error. This is possible as long as the key is enough smaller than the message that the correct key can be uniquely identified. The probability of an adversary succeeding at this must be made acceptably low, depending on the particular application. The size of the space the adversary must search is related to the amount of key "information" present, in an information-theoretic sense [SHANNON]. This depends on the number of different secret values possible and the probability of each value, as follows:
   Bits of information = Sum over i of ( - p_i * log_2( p_i ) )
where i counts from 1 to the number of possible secret values and p_i is the probability of the value numbered i. (Because p_i is less than one, the log will be negative, so each term in the sum will be non-negative.)
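The sum above is straightforward to compute directly. The following is a minimal sketch (the function name is illustrative, not from any standard library):

```python
import math

def shannon_entropy(probs):
    """Bits of information in a distribution, per the sum above.

    probs: iterable of probabilities p_i (should sum to 1).
    Values with p_i == 0 contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 256 equally likely values carry exactly 8 bits of information.
print(shannon_entropy([1/256] * 256))  # -> 8.0
```

A single certain value ([1.0]) yields 0 bits, matching the intuition that a quantity an adversary already knows carries no key information.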
If there are 2^n different values of equal probability, then n bits of information are present and an adversary would have to try, on the average, half of the values, or 2^(n-1), before guessing the secret quantity. If the probability of different values is unequal, then there is less information present, and fewer guesses will, on average, be required by an adversary. In particular, any values that an adversary can know to be impossible or of low probability can be initially ignored by the adversary, who will search through the more probable values first.
For example, consider a cryptographic system that uses 128-bit keys. If these keys are derived using a fixed pseudo-random number generator that is seeded with an 8-bit seed, then an adversary needs to search through only 256 keys (by running the pseudo-random number generator with every possible seed), not 2^128 keys as may at first appear to be the case. Only 8 bits of "information" are in these 128-bit keys.
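The attack in this example can be sketched concretely. Here SHA-256 stands in for the fixed pseudo-random number generator (the real system could use any deterministic expansion; derive_key is a hypothetical name):

```python
import hashlib

def derive_key(seed: int) -> bytes:
    """Hypothetical fixed generator: a 128-bit key from an 8-bit seed."""
    return hashlib.sha256(bytes([seed])).digest()[:16]

# The defender derives a "128-bit" key from an 8-bit seed.
victim_key = derive_key(200)

# The adversary need only enumerate all 256 seeds, not 2**128 keys.
recovered = next(derive_key(s) for s in range(256)
                 if derive_key(s) == victim_key)
assert recovered == victim_key
```

The search loop runs at most 256 iterations, illustrating that only 8 bits of "information" are present in the 128-bit key.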
While the above analysis is correct on average, it can be misleading in some cases for cryptographic analysis where what is really important is the work factor for an adversary. For example, assume that there is a pseudo-random number generator generating 128-bit keys, as in the previous paragraph, but that it generates zero half of the time and a random selection from the remaining 2^128 - 1 values the rest of the time. The Shannon equation above says that there are 64 bits of information in one of these key values, but an adversary, simply by trying the value zero, can break the security of half of the uses, albeit a random half. Thus, for cryptographic purposes, it is also useful to look at other measures, such as min-entropy, defined as
   Min-entropy = - log_2 ( max over i of ( p_i ) )
where i is as above. Using this equation, we get 1 bit of min-entropy for our new hypothetical distribution, as opposed to 64 bits of classical Shannon entropy.
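The gap between the two measures can be demonstrated on a scaled-down analogue of the skewed generator above (2^8 values rather than 2^128, so the distribution fits in memory; the exact figures are for this smaller analogue only):

```python
import math

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs):
    """- log_2 of the probability of the single most likely value."""
    return -math.log2(max(probs))

# Skewed generator: value 0 half the time, otherwise a uniform
# pick among the remaining 2**8 - 1 = 255 values.
probs = [0.5] + [0.5 / 255] * 255

print(shannon_entropy(probs))  # ~5.0 bits of Shannon entropy...
print(min_entropy(probs))      # ...but only 1.0 bit of min-entropy
```

As in the 128-bit case, an adversary who simply guesses the most probable value (zero) succeeds half the time, which is exactly what the 1 bit of min-entropy reflects.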
A continuous spectrum of entropies, sometimes called Renyi entropy, has been defined, specified by the parameter r. The case r = 1 (taken as a limit) gives Shannon entropy, and r = infinity gives min-entropy. When r = 0, the result is just log_2(n), where n is the number of values with non-zero probability. Renyi entropy is a non-increasing function of r, so min-entropy is always the most conservative measure of entropy and usually the best to use for cryptographic evaluation [LUBY].
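For a finite distribution, the Renyi entropy of order r is log_2( Sum over i of p_i^r ) / (1 - r) for r != 1. A small sketch, using an example distribution chosen here for illustration, shows the non-increasing behavior:

```python
import math

def renyi_entropy(probs, r):
    """Renyi entropy of order r (r >= 0, r != 1), in bits."""
    probs = [p for p in probs if p > 0]
    if r == float("inf"):
        return -math.log2(max(probs))          # min-entropy
    return math.log2(sum(p**r for p in probs)) / (1 - r)

probs = [0.5, 0.25, 0.125, 0.125]
for r in (0, 0.5, 2, 8, float("inf")):
    print(r, renyi_entropy(probs, r))  # values never increase with r
```

For this distribution the r = 0 value is log_2(4) = 2 bits and the r = infinity value is the min-entropy, 1 bit; every intermediate order falls between them.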
Statistically tested randomness in the traditional sense is NOT the same as the unpredictability required for security use.
For example, the use of a widely available constant sequence, such as the random table from the CRC Standard Mathematical Tables, is very weak against an adversary. An adversary who learns of or guesses it can easily break all security, future and past, based on the sequence [CRC]. As another example, using AES with a constant key to encrypt successive integers such as 1, 2, 3, ... will produce output that also has excellent statistical randomness properties but is predictable. On the other hand, taking successive rolls of a six-sided die and encoding the resulting values in ASCII would produce statistically poor output with a substantial unpredictable component. So note that passing or failing statistical tests, by itself, reveals nothing about whether a quantity is predictable or unpredictable.
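The "statistically random but predictable" point can be illustrated without a cipher library: a keyed hash of successive integers plays the role of the constant-key AES example (SHA-256 and the constant "key" here are stand-ins, not part of any real system):

```python
import hashlib

# Hash successive integers 0, 1, 2, ... under a constant key.
key = b"constant key"
stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                  for i in range(1000))

# Crude monobit test: roughly half the bits are ones...
ones = sum(bin(byte).count("1") for byte in stream)
print(ones / (len(stream) * 8))  # close to 0.5

# ...yet the *next* output is fully predictable to anyone who
# knows (or guesses) the constant key:
predicted = hashlib.sha256(key + (1000).to_bytes(8, "big")).digest()
```

The stream would pass simple statistical checks, but every value, past and future, is computable by an adversary who learns the key, which is precisely why statistical quality must not be confused with unpredictability.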