5. Encoding of Literal Insertion Lengths and Copy Lengths
Official English Text
As described in Section 2, the literal insertion lengths and backward copy lengths are encoded using a single prefix code. This section provides the details to this encoding.
Each <insertion length, copy length> pair in the compressed data part of a meta-block is represented with the following triplet:
<insert-and-copy length code, insert extra bits, copy extra bits>
The insert-and-copy length code, the insert extra bits, and the copy extra bits are encoded back-to-back: the insert-and-copy length code is encoded using a prefix code over the insert-and-copy length code alphabet, while the extra bits values are encoded as fixed-width integer values. The number of insert and copy extra bits can be 0..24, and they are dependent on the insert-and-copy length code.
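The fixed-width extra bits can be illustrated with a small bit reader. This is a minimal sketch rather than the reference decoder: `BitReader` and `read_bits` are hypothetical names, and the code assumes the least-significant-bit-first packing that RFC 7932 uses for data elements other than prefix codes. Decoding the prefix-coded symbol itself is out of scope here.

```python
class BitReader:
    """Hypothetical helper: reads fixed-width integers LSB-first from bytes."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bits(self, n: int) -> int:
        """Read an n-bit unsigned integer; n == 0 consumes nothing."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i  # the first bit read is the least significant
            self.pos += 1
        return value


# Insert extra bits (3 wide here) are followed back-to-back by the
# copy extra bits (2 wide here); the widths depend on the decoded code.
reader = BitReader(bytes([0b00010101]))
insert_extra = reader.read_bits(3)  # low three bits of the byte
copy_extra = reader.read_bits(2)    # the next two bits
```

A width of zero is valid and simply reads nothing, which matches the codes that carry no extra bits.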
Some of the insert-and-copy length codes also express the fact that the distance symbol of the distance in the same command is 0, i.e., the distance component of the command is the same as that of the previous command. In this case, the distance code and extra bits for the distance are omitted from the compressed data stream.
We describe the insert-and-copy length code alphabet in terms of the (not directly used) insert length code and copy length code alphabets. The symbols of the insert length code alphabet, along with the number of insert extra bits, and the range of the insert lengths are as follows:
     Extra              Extra              Extra
Code Bits Lengths  Code Bits Lengths  Code Bits Lengths
---- ---- -------  ---- ---- -------  ---- ---- -------
   0    0 0           8    2 10..13     16    6 130..193
   1    0 1           9    2 14..17     17    7 194..321
   2    0 2          10    3 18..25     18    8 322..577
   3    0 3          11    3 26..33     19    9 578..1089
   4    0 4          12    4 34..49     20   10 1090..2113
   5    0 5          13    4 50..65     21   12 2114..6209
   6    1 6,7        14    5 66..97     22   14 6210..22593
   7    1 8,9        15    5 98..129    23   24 22594..16799809
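One way to use the insert length table is as a list of (extra bits, base length) pairs, where the decoded length is the base plus the extra bits value. The sketch below transcribes the table; `INSERT_LENGTH_TABLE` and `insert_length` are hypothetical names chosen for illustration.

```python
# (extra_bits, base_length) for insert length codes 0..23,
# transcribed from the table above.
INSERT_LENGTH_TABLE = [
    (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 8),
    (2, 10), (2, 14), (3, 18), (3, 26), (4, 34), (4, 50), (5, 66), (5, 98),
    (6, 130), (7, 194), (8, 322), (9, 578), (10, 1090), (12, 2114),
    (14, 6210), (24, 22594),
]


def insert_length(code: int, extra: int) -> int:
    """Return the insert length for an insert length code and its extra bits value."""
    if not 0 <= code < len(INSERT_LENGTH_TABLE):
        raise ValueError(f"insert length code out of range: {code}")
    extra_bits, base = INSERT_LENGTH_TABLE[code]
    if not 0 <= extra < (1 << extra_bits):
        raise ValueError(f"extra value {extra} does not fit in {extra_bits} bits")
    return base + extra
```

For example, code 10 has 3 extra bits and a base of 18, so an extra bits value of 7 yields the top of its range, 25.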
The symbols of the copy length code alphabet, along with the number of copy extra bits, and the range of copy lengths are as follows:
     Extra              Extra              Extra
Code Bits Lengths  Code Bits Lengths  Code Bits Lengths
---- ---- -------  ---- ---- -------  ---- ---- -------
   0    0 2           8    1 10,11      16    5 70..101
   1    0 3           9    1 12,13      17    5 102..133
   2    0 4          10    2 14..17     18    6 134..197
   3    0 5          11    2 18..21     19    7 198..325
   4    0 6          12    3 22..29     20    8 326..581
   5    0 7          13    3 30..37     21    9 582..1093
   6    0 8          14    4 38..53     22   10 1094..2117
   7    0 9          15    4 54..69     23   24 2118..16779333
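The copy length ranges in the table are contiguous: each code covers 2^(extra bits) consecutive lengths, and the first code starts at 2. Under that observation, the base lengths can be derived from the extra bits column alone. The sketch below does so; `COPY_EXTRA_BITS`, `COPY_BASE`, and `copy_length_range` are hypothetical names, and the derivation is an illustration read off the table, not wording from the RFC.

```python
from itertools import accumulate

# Extra bits for copy length codes 0..23, transcribed from the table above.
COPY_EXTRA_BITS = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4,
                   5, 5, 6, 7, 8, 9, 10, 24]

# Each code spans 2**extra_bits lengths, so every base is the running sum
# of the spans before it, starting from the minimum copy length of 2.
COPY_BASE = list(accumulate((1 << b for b in COPY_EXTRA_BITS[:-1]), initial=2))


def copy_length_range(code: int) -> tuple[int, int]:
    """Return the inclusive (lowest, highest) copy length for a copy length code."""
    if not 0 <= code < len(COPY_BASE):
        raise ValueError(f"copy length code out of range: {code}")
    base = COPY_BASE[code]
    return base, base + (1 << COPY_EXTRA_BITS[code]) - 1
```

Comparing `copy_length_range(code)` with the Lengths column is a quick way to catch transcription errors in either list.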
To convert an insert-and-copy length code to an insert length code and a copy length code, the following table can be used:
Insert
length        Copy length code
code       0..7       8..15      16..23
        +----------+----------+
        |          |          |
 0..7   |  0..63   | 64..127  | <--- distance symbol 0
        |          |          |
        +----------+----------+----------+
        |          |          |          |
 0..7   | 128..191 | 192..255 | 384..447 |
        |          |          |          |
        +----------+----------+----------+
        |          |          |          |
 8..15  | 256..319 | 320..383 | 512..575 |
        |          |          |          |
        +----------+----------+----------+
        |          |          |          |
16..23  | 448..511 | 576..639 | 640..703 |
        |          |          |          |
        +----------+----------+----------+
First, look up the cell with the 64 value range containing the insert-and-copy length code; this gives the insert length code and the copy length code ranges, both 8 values long. The copy length code within its range is determined by bits 0..2 (counted from the lsb) of the insert-and-copy length code. The insert length code within its range is determined by bits 3..5 (counted from the lsb) of the insert-and-copy length code. Given the insert length and copy length codes, the actual insert and copy lengths can be obtained by reading the number of extra bits given by the tables above.
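The lookup just described can be sketched in a few lines. Because every cell spans 64 values, the cell index is the code divided by 64; `CELL_BASES` lists, for each cell in ascending code order, the first insert length code and first copy length code of its ranges as read off the table above. Both `CELL_BASES` and `split_insert_and_copy_code` are hypothetical names.

```python
# (first insert length code, first copy length code) for the cells
# 0..63, 64..127, 128..191, ..., 640..703, in that order.
CELL_BASES = [
    (0, 0), (0, 8),             # 0..63, 64..127 (distance symbol 0)
    (0, 0), (0, 8),             # 128..191, 192..255
    (8, 0), (8, 8),             # 256..319, 320..383
    (0, 16), (16, 0),           # 384..447, 448..511
    (8, 16), (16, 8), (16, 16)  # 512..575, 576..639, 640..703
]


def split_insert_and_copy_code(code: int) -> tuple[int, int]:
    """Return (insert length code, copy length code) for an insert-and-copy length code."""
    if not 0 <= code <= 703:
        raise ValueError(f"insert-and-copy length code out of range: {code}")
    insert_base, copy_base = CELL_BASES[code >> 6]  # which 64-value cell
    insert_code = insert_base + ((code >> 3) & 7)   # bits 3..5
    copy_code = copy_base + (code & 7)              # bits 0..2
    return insert_code, copy_code
```

As a worked case, code 400 falls in the cell 384..447 (insert 0..7, copy 16..23); bits 3..5 are 2 and bits 0..2 are 0, giving insert length code 2 and copy length code 16.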
If the insert-and-copy length code is between 0 and 127, the distance code of the command is set to zero (the last distance reused).
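In a decoder, this rule becomes a branch that decides whether a distance code is read from the stream at all. The following sketch shows only that branch; `distance_symbol_for` is a hypothetical name, and `read_distance_symbol` stands in for whatever routine a real decoder uses to read the distance prefix code.

```python
from typing import Callable


def distance_is_implicit(insert_and_copy_code: int) -> bool:
    """True when the code itself implies distance symbol 0 (codes 0..127)."""
    if not 0 <= insert_and_copy_code <= 703:
        raise ValueError(f"code out of range: {insert_and_copy_code}")
    return insert_and_copy_code < 128


def distance_symbol_for(insert_and_copy_code: int,
                        read_distance_symbol: Callable[[], int]) -> int:
    """Return the distance symbol, consuming stream data only when required."""
    if distance_is_implicit(insert_and_copy_code):
        return 0  # last distance reused; nothing is read from the stream
    return read_distance_symbol()
```

Passing the reader as a callable keeps the sketch independent of any particular bit-stream implementation.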
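Key Concepts
Encoding structure:
- A single prefix code represents the insert length and the copy length together
- Code values span 0-703, divided into 11 cells of 64 values each
Extra bits mechanism:
- Insert length: 0-24 extra bits
- Copy length: 0-24 extra bits
- Maximum insert length: 16,799,809
- Maximum copy length: 16,779,333
Distance reuse optimization:
- Codes 0-127: distance symbol 0 is implied
- The last distance is reused automatically, so no distance value is encoded
Bit extraction rules:
- Bits 0-2 (counted from the LSB): offset of the copy length code within its range
- Bits 3-5: offset of the insert length code within its range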
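The maximum lengths and the cell count quoted in these notes follow from simple arithmetic on the tables: the largest code in each alphabet has 24 extra bits, so its top value is the base length plus 2^24 - 1. A short check, with variable names chosen only for illustration:

```python
MAX_EXTRA_VALUE = (1 << 24) - 1  # largest value that fits in 24 extra bits

# Base lengths of code 23 in the insert and copy length tables.
max_insert_length = 22594 + MAX_EXTRA_VALUE
max_copy_length = 2118 + MAX_EXTRA_VALUE

# Codes 0..703 make 704 values, split into cells of 64 values each.
cell_count = 704 // 64

print(max_insert_length, max_copy_length, cell_count)
```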
Source: RFC 7932, Section 5
Official Text: https://www.rfc-editor.org/rfc/rfc7932.txt