Skip to main content

1. Introduction

1. Introduction

1.1. Purpose

The purpose of this specification is to define a lossless compressed data format that:

  • is independent of CPU type, operating system, file system, and character set; hence, it can be used for interchange.

  • can be produced or consumed, even for an arbitrarily long, sequentially presented input data stream, using only an a priori bounded amount of intermediate storage; hence, it can be used in data communications or similar structures, such as Unix filters.

  • compresses data with a compression ratio comparable to the best currently available general-purpose compression methods, in particular, considerably better than the gzip program.

  • decompresses much faster than current LZMA implementations.

The data format defined by this specification does not attempt to:

  • allow random access to compressed data.

  • compress specialized data (e.g., raster graphics) as densely as the best currently available specialized algorithms.

This document is the authoritative specification of the brotli compressed data format. It defines the set of valid brotli compressed data streams and a decoder algorithm that produces the uncompressed data stream from a valid brotli compressed data stream.

1.2. Intended Audience

This specification is intended for use by software implementers to compress data into and/or decompress data from the brotli format.

The text of the specification assumes a basic background in programming at the level of bits and other primitive data representations. Familiarity with the technique of Huffman coding is helpful but not required.

This specification uses (heavily) the notations and terminology introduced in the DEFLATE format specification [RFC1951]. For the sake of completeness, we always include the whole text of the relevant parts of RFC 1951; therefore, familiarity with the DEFLATE format is helpful but not required.

The compressed data format defined in this specification is an integral part of the WOFF File Format 2.0 [WOFF2]; therefore, this specification is also intended for implementers of WOFF 2.0 compressors and decompressors.

1.3. Scope

This document specifies a method for representing a sequence of bytes as a (usually shorter) sequence of bits and a method for packing the latter bit sequence into bytes.

1.4. Compliance

Unless otherwise indicated below, a compliant decompressor must be able to accept and decompress any data set that conforms to all the specifications presented here. A compliant compressor must produce data sets that conform to all the specifications presented here.

1.5. Definitions of Terms and Conventions Used

Byte: 8 bits stored or transmitted as a unit (same as an octet). For this specification, a byte is exactly 8 bits, even on machines that store a character on a number of bits different from eight. See below for the numbering of bits within a byte.

String: a sequence of arbitrary bytes.

Bytes stored within a computer do not have a "bit order", since they are always treated as a unit. However, a byte considered as an integer between 0 and 255 does have a most and least significant bit (lsb), and since we write numbers with the most significant digit on the left, we also write bytes with the most significant bit (msb) on the left. In the diagrams below, we number the bits of a byte so that bit 0 is the least significant bit, i.e., the bits are numbered:

+--------+
|76543210|
+--------+

Within a computer, a number may occupy multiple bytes. All multi-byte numbers in the format described here are stored with the least-significant byte first (at the lower memory address). For example, the decimal number 520 is stored as:

    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+
^ ^
| |
| + more significant byte = 2 x 256
+ less significant byte = 8

1.5.1. Packing into Bytes

This document does not address the issue of the order in which bits of a byte are transmitted on a bit-sequential medium, since the final data format described here is byte rather than bit oriented. However, we describe the compressed block format in below as a sequence of data elements of various bit lengths, not a sequence of bytes. We must therefore specify how to pack these data elements into bytes to form the final compressed byte sequence:

  • Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.

  • Data elements other than Huffman codes are packed starting with the least-significant bit of the data element.

  • Huffman codes are packed starting with the most-significant bit of the code.

In other words, if one were to print out the compressed data as a sequence of bytes, starting with the first byte at the right margin and proceeding to the left, with the most-significant bit of each byte on the left as usual, one would be able to parse the result from right to left, with fixed-width elements in the correct MSB-to-LSB order and Huffman codes in bit-reversed order (i.e., with the first bit of the code in the relative LSB position).

As an example, consider packing the following data elements into a sequence of 3 bytes:

  • 2-bit value 1 (binary 01)
  • 3-bit value 2 (binary 010)
  • 5-bit value 6 (binary 00110)
  • 4-bit code with bit pattern 1011 (with the first bit being 1, the second bit being 0, the third bit being 1, and the fourth bit being 1)
  • 2-bit value 0 (binary 00)
  • 4-bit code with bit pattern 0011
  • 5-bit value 31 (binary 11111)

The following diagram shows how these elements are packed:

       Byte 0           Byte 1           Byte 2
+---------------+----------------+----------------+
|0|1|1|0|0|0|1|0|1|1|0|1|1|0|0|0|0|1|1|1|1|1|1|1|
+---------------+----------------+----------------+
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | | | |
| | | | | | | | +-------------|---+ 5-bit value 31
| | | | | | | +-----------------|---|-+ 4-bit code 0011
| | | | | | +-----------------------+ |
| | | | | | | | 2-bit value 0
| | | | | +-----------------------------+-----+ 4-bit code 1011
| | | +---+ 5-bit value 6
| | +-----+ 3-bit value 2
| +---+ 2-bit value 1

Source: RFC 7932, Section 1 Official Text: https://www.rfc-editor.org/rfc/rfc7932.txt