Skip to main content

6. Encoding of Block-Switch Commands (块切换命令编码)

官方英文原文 (Official English Text)

As described in Section 2, a block-switch command is a pair <block type, block count>. These are encoded in the compressed data part of the meta-block, right before the start of each new block of a particular block category.

Each block type in the compressed data is represented with a block type code, encoded using a prefix code over the block type code alphabet. A block type symbol 0 means that the new block type is the same as the type of the previous block from the same block category, i.e., the block type that preceded the current type, while a block type symbol 1 means that the new block type equals the current block type plus one. If the current block type is the maximal possible, then a block type symbol of 1 results in wrapping to a new block type of 0. Block type symbols 2..257 represent block types 0..255, respectively. The previous and current block types are initialized to 1 and 0, respectively, at the end of the meta-block header.

Since the first block type of each block category is 0, the block type of the first block-switch command is not encoded in the compressed data. If a block category has only one block type, the block count of the first block-switch command is also omitted from the compressed data; otherwise, it is encoded in the meta-block header.

Since the end of the meta-block is detected by the number of uncompressed bytes produced, the block counts for any of the three categories need not count down to exactly zero at the end of the meta-block.

The number of different block types in each block category, denoted by NBLTYPESL, NBLTYPESI, and NBLTYPESD for literals, insert-and-copy lengths, and distances, respectively, is encoded in the meta-block header, and it must equal to the largest block type plus one in that block category. In other words, the set of literal, insert-and-copy length, and distance block types must be [0..NBLTYPESL-1], [0..NBLTYPESI-1], and [0..NBLTYPESD-1], respectively. From this it follows that the alphabet size of literal, insert-and-copy length, and distance block type codes is NBLTYPESL + 2, NBLTYPESI + 2, and NBLTYPESD + 2, respectively.

Each block count in the compressed data is represented with a pair <block count code, extra bits>. The block count code and the extra bits are encoded back-to-back, the block count code is encoded using a prefix code over the block count code alphabet, while the extra bits value is encoded as a fixed-width integer value. The number of extra bits can be 0..24, and it is dependent on the block count code.

The symbols of the block count code alphabet along with the number of extra bits and the range of block counts are as follows:

        Extra              Extra               Extra
Code Bits Lengths Code Bits Lengths Code Bits Lengths
---- ---- ------- ---- ---- ------- ---- ---- -------
0 2 1..4 9 4 65..80 18 7 369..496
1 2 5..8 10 4 81..96 19 8 497..752
2 2 9..12 11 4 97..112 20 9 753..1264
3 2 13..16 12 5 113..144 21 10 1265..2288
4 3 17..24 13 5 145..176 22 11 2289..4336
5 3 25..32 14 5 177..208 23 12 4337..8432
6 3 33..40 15 5 209..240 24 13 8433..16624
7 3 41..48 16 6 241..304 25 24 16625..16793840
8 4 49..64 17 6 305..368

The first block-switch command of each block category is special in the sense that it is encoded in the meta-block header, and as described earlier, the block type code is omitted since it is an implicit zero.


中文翻译 (Chinese Translation)

如第 2 节所述,块切换命令 (Block-Switch Command) 是一对 <block type, block count> (块类型,块计数)。这些在元块的压缩数据部分中编码,就在特定块类别 (Block Category) 的每个新块开始之前.

压缩数据中的每个块类型 (Block Type) 用块类型编码 (Block Type Code) 表示,使用块类型编码字母表 (Block Type Code Alphabet) 上的前缀编码进行编码。块类型符号 (Block Type Symbol) 0 表示新块类型与来自同一块类别的前一个块 (Previous Block) 的类型相同,即先于当前类型的块类型,而块类型符号 1 表示新块类型等于当前块类型 (Current Block Type) 加一。如果当前块类型是最大可能值,则块类型符号 1 导致回绕 (Wrapping) 到新块类型 0。块类型符号 2..257 分别表示块类型 0..255。前一个和当前块类型在元块头部结束时分别初始化为 1 和 0.

由于每个块类别的第一个块类型为 0,因此第一个块切换命令的块类型不在压缩数据中编码。如果块类别只有一个块类型,则第一个块切换命令的块计数也从压缩数据中省略; 否则,它在元块头部中编码.

由于元块的结束由生成的未压缩字节数 (Number of Uncompressed Bytes Produced) 检测到,因此三个类别中任何一个的块计数不需要在元块结束时精确倒计时到零.

每个块类别中不同块类型的数量 (Number of Different Block Types),分别用 NBLTYPESLNBLTYPESINBLTYPESD 表示字面量 (Literals)、插入和复制长度 (Insert-and-Copy Lengths) 以及距离 (Distances),在元块头部中编码,并且它必须等于该块类别中最大块类型加一。换句话说,字面量、插入和复制长度以及距离块类型的集合必须分别为 [0..NBLTYPESL-1][0..NBLTYPESI-1][0..NBLTYPESD-1]。由此可知,字面量、插入和复制长度以及距离块类型编码的字母表大小 (Alphabet Size) 分别为 NBLTYPESL + 2NBLTYPESI + 2NBLTYPESD + 2.

压缩数据中的每个块计数 (Block Count) 用一对 <block count code, extra bits> (块计数编码,额外位) 表示。块计数编码 (Block Count Code) 和额外位背靠背编码,块计数编码使用块计数编码字母表 (Block Count Code Alphabet) 上的前缀编码进行编码,而额外位值编码为固定宽度整数值。额外位的数量可以是 0..24,并且取决于块计数编码.

块计数编码字母表的符号以及额外位的数量和块计数的范围如下:

        Extra              Extra               Extra
Code Bits Lengths Code Bits Lengths Code Bits Lengths
---- ---- ------- ---- ---- ------- ---- ---- -------
0 2 1..4 9 4 65..80 18 7 369..496
1 2 5..8 10 4 81..96 19 8 497..752
2 2 9..12 11 4 97..112 20 9 753..1264
3 2 13..16 12 5 113..144 21 10 1265..2288
4 3 17..24 13 5 145..176 22 11 2289..4336
5 3 25..32 14 5 177..208 23 12 4337..8432
6 3 33..40 15 5 209..240 24 13 8433..16624
7 3 41..48 16 6 241..304 25 24 16625..16793840
8 4 49..64 17 6 305..368

每个块类别的第一个块切换命令是特殊的 (Special),因为它在元块头部中编码,并且如前所述,块类型编码被省略,因为它是隐式零 (Implicit Zero).

关键概念说明

块类型编码机制:

  • 符号 0: 重用前一个块类型
  • 符号 1: 当前块类型 + 1 (带回绕)
  • 符号 2-257: 直接映射到块类型 0-255

初始化状态:

  • 前一个块类型 (Previous): 1
  • 当前块类型 (Current): 0
  • 初始化时机: 元块头部结束

块计数范围:

  • 最小值: 1
  • 最大值: 16,793,840
  • 26 个编码级别
  • 额外位: 2-24 位

字母表大小公式:

  • 字面量块类型: NBLTYPESL + 2
  • 插入和复制长度块类型: NBLTYPESI + 2
  • 距离块类型: NBLTYPESD + 2

特殊处理:

  • 第一个块切换命令的块类型始终为 0,不编码
  • 单块类型类别的第一个块计数也被省略
  • 元块结束时块计数无需精确归零

来源 (Source): RFC 7932, Section 6
官方文本 (Official Text): https://www.rfc-editor.org/rfc/rfc7932.txt