9. Compressed Data Format

In this section, we describe the format of the compressed data set in terms of the format of the individual data items described in the previous sections.

9.1. Format of the Stream Header

The stream header consists of a single field:

1..7 bits: WBITS, a value in the range 10..24, encoded with the following variable-length code (as it appears in the compressed data, where the bits are parsed from right to left):

Value    Bit Pattern
-----    -----------
   10        0100001
   11        0110001
   12        1000001
   13        1010001
   14        1100001
   15        1110001
   16              0
   17        0000001
   18           0011
   19           0101
   20           0111
   21           1001
   22           1011
   23           1101
   24           1111

Note that bit pattern 0010001 is invalid and must not be used.

The size of the sliding window, which is the maximum value of any non-dictionary reference backward distance, is given by the following formula:

window size = (1 << WBITS) - 16
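
As an illustration of the table and formula above, the following Python sketch decodes WBITS and computes the window size. The names BitReader, read_wbits, and window_size are hypothetical and do not come from this specification. The sketch assumes that bits are consumed starting with the least significant bit of each byte, which is how "parsed from right to left" is read here.

```python
class BitReader:
    """Hypothetical helper: reads bits LSB-first within each byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_wbits(br: BitReader) -> int:
    """Decode the 1..7-bit WBITS field of the stream header."""
    if br.read(1) == 0:
        return 16                  # pattern 0
    n = br.read(3)
    if n != 0:
        return 17 + n              # patterns 0011 .. 1111 give 18..24
    n = br.read(3)
    if n == 0:
        return 17                  # pattern 0000001
    if n == 1:
        raise ValueError("invalid WBITS bit pattern 0010001")
    return 8 + n                   # patterns 0100001 .. 1110001 give 10..15


def window_size(wbits: int) -> int:
    """Maximum non-dictionary backward distance for a given WBITS."""
    return (1 << wbits) - 16
```

For example, a stream whose first byte is 0x21 (binary 00100001) yields WBITS = 10 and a window size of 1008 bytes, while WBITS = 16 gives 65520 bytes.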

9.2. Format of the Meta-Block Header

A compliant compressed data set has at least one meta-block. Each meta-block contains a header with information about the uncompressed length of the meta-block, and a bit signaling if the meta-block is the last one. The format of the meta-block header is the following:

1 bit: ISLAST, set to 1 if this is the last meta-block

1 bit: ISLASTEMPTY, set to 1 if the meta-block is empty; this field is present only if the ISLAST bit is set. If ISLASTEMPTY is 1, the meta-block and the brotli stream end at that bit, and any remaining bits in the last byte of the compressed stream are filled with zeros (if the fill bits are not zero, then the stream should be rejected as invalid)

2 bits: MNIBBLES, number of nibbles to represent the uncompressed length, encoded with the following fixed-length code:

Value    Bit Pattern
-----    -----------
    0             11
    4             00
    5             01
    6             10

If MNIBBLES is 0, the meta-block is empty, i.e., it does not generate any uncompressed data. In this case, the rest of the meta-block has the following format:

1 bit: reserved, must be zero

2 bits: MSKIPBYTES, number of bytes to represent metadata length

MSKIPBYTES * 8 bits: MSKIPLEN - 1, where MSKIPLEN is the number of metadata bytes; this field is only present if MSKIPBYTES is positive; otherwise, MSKIPLEN is 0 (if MSKIPBYTES is greater than 1, and the last byte is all zeros, then the stream should be rejected as invalid)

0..7 bits: fill bits until the next byte boundary, must be all zeros

MSKIPLEN bytes of metadata, not part of the uncompressed data or the sliding window

MNIBBLES * 4 bits: MLEN - 1, where MLEN is the length of the meta-block uncompressed data in bytes (if MNIBBLES is greater than 4, and the last nibble is all zeros, then the stream should be rejected as invalid)

1 bit: ISUNCOMPRESSED, if set to 1, any bits of compressed data up to the next byte boundary are ignored, and the rest of the meta-block contains MLEN bytes of literal data; this field is only present if the ISLAST bit is not set (if the ignored bits are not all zeros, the stream should be rejected as invalid)

1..11 bits: NBLTYPESL, number of literal block types, encoded with the following variable-length code (as it appears in the compressed data, where the bits are parsed from right to left, so 0110111 has the value 12):

   Value    Bit Pattern
   -----    -----------
       1              0
       2           0001
    3..4          x0011
    5..8         xx0101
   9..16        xxx0111
  17..32       xxxx1001
  33..64      xxxxx1011
 65..128     xxxxxx1101
129..256    xxxxxxx1111

Prefix code over the block type code alphabet for literal block types, appears only if NBLTYPESL >= 2

Prefix code over the block count code alphabet for literal block counts, appears only if NBLTYPESL >= 2

Block count code + extra bits for first literal block count, appears only if NBLTYPESL >= 2

1..11 bits: NBLTYPESI, number of insert-and-copy block types, encoded with the same variable-length code as above

Prefix code over the block type code alphabet for insert-and-copy block types, appears only if NBLTYPESI >= 2

Prefix code over the block count code alphabet for insert-and-copy block counts, appears only if NBLTYPESI >= 2

Block count code + extra bits for first insert-and-copy block count, appears only if NBLTYPESI >= 2

1..11 bits: NBLTYPESD, number of distance block types, encoded with the same variable-length code as above

Prefix code over the block type code alphabet for distance block types, appears only if NBLTYPESD >= 2

Prefix code over the block count code alphabet for distance block counts, appears only if NBLTYPESD >= 2

Block count code + extra bits for first distance block count, appears only if NBLTYPESD >= 2

2 bits: NPOSTFIX, parameter used in the distance coding

4 bits: four most significant bits of NDIRECT, to get the actual value of the parameter NDIRECT, left-shift this four-bit number by NPOSTFIX bits

NBLTYPESL * 2 bits: context mode for each literal block type

1..11 bits: NTREESL, number of literal prefix trees, encoded with the same variable-length code as NBLTYPESL

Literal context map, encoded as described in Section 7.3, appears only if NTREESL >= 2; otherwise, the context map has only zero values

1..11 bits: NTREESD, number of distance prefix trees, encoded with the same variable-length code as NBLTYPESD

Distance context map, encoded as described in Section 7.3, appears only if NTREESD >= 2; otherwise, the context map has only zero values

NTREESL prefix codes for literals

NBLTYPESI prefix codes for insert-and-copy lengths

NTREESD prefix codes for distances
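
The first fields of the meta-block header can be read as in the following Python sketch, which covers ISLAST through ISUNCOMPRESSED and the 1..11-bit variable-length code shared by the NBLTYPES and NTREES fields. All names (BitReader, read_varlen_1_to_256, read_meta_block_start) are hypothetical. The sketch assumes that bits are consumed starting with the least significant bit of each byte, and that multi-bit integers are stored least significant bit first, so that the "last" byte or nibble of a length is its most significant one. Prefix codes and context maps are not handled.

```python
class BitReader:
    """Hypothetical helper: reads bits LSB-first within each byte."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def align(self) -> None:
        """Skip to the next byte boundary; skipped bits must be zero."""
        if self.read(-self.pos % 8) != 0:
            raise ValueError("non-zero fill bits")


def read_varlen_1_to_256(br: BitReader) -> int:
    """Decode the 1..11-bit code used for NBLTYPESL and similar fields."""
    if br.read(1) == 0:
        return 1
    n = br.read(3)
    if n == 0:
        return 2
    return (1 << n) + 1 + br.read(n)


def read_meta_block_start(br: BitReader) -> dict:
    """Parse ISLAST through ISUNCOMPRESSED and return the fields read."""
    hdr = {"ISLAST": br.read(1)}
    if hdr["ISLAST"] and br.read(1):           # ISLASTEMPTY
        hdr["ISLASTEMPTY"] = 1
        br.align()
        return hdr
    code = br.read(2)
    mnibbles = 0 if code == 3 else 4 + code    # 11 -> 0, 00 -> 4, 01 -> 5, 10 -> 6
    if mnibbles == 0:                          # empty meta-block with metadata
        if br.read(1) != 0:
            raise ValueError("reserved bit must be zero")
        mskipbytes = br.read(2)
        mskiplen = 0
        if mskipbytes > 0:
            raw = br.read(8 * mskipbytes)
            if mskipbytes > 1 and (raw >> (8 * (mskipbytes - 1))) == 0:
                raise ValueError("last MSKIPLEN byte is all zeros")
            mskiplen = raw + 1
        br.align()
        start = br.pos // 8
        hdr["metadata"] = br.data[start:start + mskiplen]
        br.pos += 8 * mskiplen                 # metadata is skipped, not decoded
        return hdr
    raw = br.read(4 * mnibbles)
    if mnibbles > 4 and (raw >> (4 * (mnibbles - 1))) == 0:
        raise ValueError("last MLEN nibble is all zeros")
    hdr["MLEN"] = raw + 1
    if not hdr["ISLAST"]:
        hdr["ISUNCOMPRESSED"] = br.read(1)
    return hdr
```

With this reader, the bit pattern 0110111 from the NBLTYPESL description decodes to 12: the first bit read is 1, the next three bits give n = 3, and the three extra bits give 3, so the value is 8 + 1 + 3.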

9.3. Format of the Meta-Block Data

The compressed data part of a meta-block consists of a series of commands. Each command has the following format:

Block type code for next insert-and-copy block type, appears only if NBLTYPESI >= 2 and the previous insert-and-copy block count is zero

Block count code + extra bits for next insert-and-copy block count, appears only if NBLTYPESI >= 2 and the previous insert-and-copy block count is zero

Insert-and-copy length, encoded as in Section 5, using the insert-and-copy length prefix code with the current insert-and-copy block type index

A number of literals equal to the insert length, each with the following format:

Block type code for next literal block type, appears only if NBLTYPESL >= 2 and the previous literal block count is zero

Block count code + extra bits for next literal block count, appears only if NBLTYPESL >= 2 and the previous literal block count is zero

Next byte of the uncompressed data, encoded with the literal prefix code with the index determined by the previous two bytes of the uncompressed data, the current literal block type, and the context map, as described in Section 7.3

Block type code for next distance block type, appears only if NBLTYPESD >= 2 and the previous distance block count is zero

Block count code + extra bits for next distance block count, appears only if NBLTYPESD >= 2 and the previous distance block count is zero

Distance code, encoded as in Section 4, using the distance prefix code with the index determined by the copy length, the current distance block type, and the distance context map, as described in Section 7.3, appears only if the distance code is not an implicit 0, as indicated by the insert-and-copy length code
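
The same rule governs all three block categories in the command format above: a block type code and block count are read only when the category has at least two block types and the previous block count has run down to zero. A minimal Python sketch of that bookkeeping follows. BlockTracker and read_switch are hypothetical names; read_switch stands in for decoding a block type code and a block count, which this section does not define. The sketch also assumes that each category starts with block type 0.

```python
class BlockTracker:
    """Hypothetical per-category state: current block type and remaining count."""

    def __init__(self, nbltypes: int, first_count: int = 0):
        self.nbltypes = nbltypes
        self.block_type = 0        # assumption: the first block type is 0
        self.count = first_count   # only meaningful when nbltypes >= 2

    def before_symbol(self, read_switch) -> int:
        """Call before decoding each symbol; returns the block type to use.

        read_switch is a caller-supplied callable returning
        (block_type, block_count) for the next block.
        """
        if self.nbltypes >= 2:
            if self.count == 0:
                self.block_type, self.count = read_switch()
            self.count -= 1
        return self.block_type


# Usage: two block types, a first block of 2 symbols, then two switches.
switches = iter([(1, 1), (0, 3)])
tracker = BlockTracker(nbltypes=2, first_count=2)
types = [tracker.before_symbol(lambda: next(switches)) for _ in range(4)]
```

In the usage lines, the first two symbols use block type 0, the third triggers a switch to block type 1, and the fourth triggers a switch back to block type 0. With a single block type, read_switch is never called.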

The number of commands in the meta-block is such that the sum of the uncompressed bytes produced (i.e., the number of literals inserted plus the number of bytes copied from past data or generated from the static dictionary) over all the commands equals MLEN, the uncompressed length encoded in the meta-block header.

If the total number of uncompressed bytes produced after the insert part of the last command equals MLEN, then the copy length of the last command is ignored and will not produce any uncompressed output. In this case, the copy length of the last command can have any value. In any other case, if the number of literals to insert, the copy length, or the resulting dictionary word length would cause MLEN to be exceeded, then the stream should be rejected as invalid.
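
The length accounting described in the two paragraphs above can be sketched in Python as follows. The function name account_command is hypothetical. The copy_len argument is assumed to be the number of bytes the copy would actually emit, which for a static dictionary reference is the length of the resulting word.

```python
def account_command(produced: int, mlen: int, insert_len: int, copy_len: int) -> int:
    """Return the number of output bytes produced after one command.

    produced is the count before the command; mlen is MLEN from the
    meta-block header. Raises ValueError if MLEN would be exceeded.
    """
    produced += insert_len
    if produced > mlen:
        raise ValueError("literals to insert exceed MLEN")
    if produced == mlen:
        return produced            # copy part of the last command is ignored
    produced += copy_len
    if produced > mlen:
        raise ValueError("copy exceeds MLEN")
    return produced
```

A decoder built this way would stop reading commands once the returned value equals MLEN. Note that when the insert part alone reaches MLEN, any copy length is accepted, whereas a copy that overshoots MLEN after a shorter insert is an error.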

If the last command of the last non-empty meta-block does not end on a byte boundary, the unused bits in the last byte must be zeros.


Source: RFC 7932 Official Text: https://www.rfc-editor.org/rfc/rfc7932.txt