
6.1.2. The Fallacy of Selection from a Large Database

Another approach that can give a misleading appearance of unpredictability is to randomly select a quantity from a database and to assume that its strength is related to the total number of bits in the database. For example, typical USENET servers process many megabytes of information per day [USENET_1, USENET_2]. Assume that a random quantity was selected by fetching 32 bytes of data from a random starting point in this data. This does not yield 32*8 = 256 bits worth of unguessability. Even after allowing that much of the data is human language that contains no more than 2 or 3 bits of information per byte, it doesn't yield 32*2 = 64 bits of unguessability. For an adversary with access to the same USENET database, the unguessability rests only on the starting point of the selection. That is perhaps a little over a couple of dozen bits of unguessability.
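The arithmetic behind that last estimate can be sketched as follows. This is only an illustration, not part of the original text: the function name selection_entropy_bits and the database sizes are hypothetical, and the sketch assumes the starting point is chosen uniformly at random, which makes the result an upper bound on what the adversary must guess.

```python
import math


def selection_entropy_bits(database_bytes: int, sample_bytes: int = 32) -> float:
    """Upper bound, in bits, on the unguessability of a sample taken from
    a database the adversary also holds.

    Assumption (not from the source text): the starting offset is chosen
    uniformly at random among all offsets that leave room for the sample.
    """
    if sample_bytes <= 0 or database_bytes < sample_bytes:
        raise ValueError("sample must be positive and fit inside the database")
    # The adversary only has to guess the starting offset, so the entropy
    # is log2 of the number of possible offsets, not of the sample size.
    starting_points = database_bytes - sample_bytes + 1
    return math.log2(starting_points)


if __name__ == "__main__":
    naive_bits = 32 * 8  # the misleading "256 bits" estimate
    for size_mib in (16, 64, 256):  # illustrative sizes only
        bits = selection_entropy_bits(size_mib * 2**20)
        print(f"{size_mib:4d} MiB database: about {bits:.1f} bits "
              f"(naive estimate: {naive_bits})")
```

Under these assumptions, even a database of several hundred megabytes gives fewer than 30 bits, regardless of how many bytes are fetched from it.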

The same argument applies to selecting sequences from the data on a publicly available CD/DVD recording or any other large public database. If the adversary has access to the same database, this "selection from a large volume of data" step buys little. However, if a selection can be made from data to which the adversary has no access, such as system buffers on an active multi-user system, it may be of help.