3. MD5 Algorithm Description
We begin by supposing that we have a b-bit message as input, and that we wish to find its message digest. Here b is an arbitrary nonnegative integer; b may be zero, it need not be a multiple of eight, and it may be arbitrarily large. We imagine the bits of the message written down as follows:
m_0 m_1 ... m_{b-1}
The following five steps are performed to compute the message digest of the message.
3.1 Step 1. Append Padding Bits
The message is "padded" (extended) so that its length (in bits) is congruent to 448, modulo 512. That is, the message is extended so that it is just 64 bits shy of being a multiple of 512 bits long. Padding is always performed, even if the length of the message is already congruent to 448, modulo 512.
Padding is performed as follows: a single "1" bit is appended to the message, and then "0" bits are appended so that the length in bits of the padded message becomes congruent to 448, modulo 512. In all, at least one bit and at most 512 bits are appended.
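As a rough illustration of the rule above, the number of padding bits can be computed from b alone. The following C sketch is not taken from the reference implementation; the function name md5_padding_bits is invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not from the reference implementation):
   returns how many padding bits Step 1 appends to a b-bit message.
   The result p satisfies 1 <= p <= 512 and (b + p) mod 512 == 448. */
static unsigned md5_padding_bits(uint64_t b)
{
    unsigned r = (unsigned)(b % 512);   /* message length modulo 512 */

    /* If r < 448 the padding ends inside the current 512-bit block;
       otherwise it must run into the next block (448 + 512 = 960). */
    return (r < 448) ? (448 - r) : (960 - r);
}
```

For instance, an empty message (b = 0) receives 448 padding bits, a 447-bit message receives exactly one, and a 448-bit message receives the maximum of 512.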
3.2 Step 2. Append Length
A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. In the unlikely event that b does not fit in 64 bits (that is, b is 2^64 or greater), only the low-order 64 bits of b are used. (These bits are appended as two 32-bit words, low-order word first, in accordance with the previous conventions.)
At this point the resulting message (after padding with bits and with b) has a length that is an exact multiple of 512 bits. Equivalently, this message has a length that is an exact multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the words of the resulting message, where N is a multiple of 16.
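The length field can be sketched in C as follows. This assumes that the "previous conventions" store each 32-bit word low-order byte first (the same order used for the register values in Step 3), so that the two words together form eight bytes in least-significant-first order. The name md5_encode_length is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: writes the low-order 64 bits of the bit length b
   as 8 bytes, least significant byte first. Under the assumption stated
   above, this equals "two 32-bit words, low-order word first". */
static void md5_encode_length(uint64_t b, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(b >> (8 * i));   /* byte i holds bits 8i..8i+7 */
    }
}
```

Because b is held in a uint64_t here, the truncation to the low-order 64 bits happens implicitly; a caller tracking longer messages would need to reduce the length itself.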
3.3 Step 3. Initialize MD Buffer
A four-word buffer (A,B,C,D) is used to compute the message digest. Here each of A, B, C, D is a 32-bit register. These registers are initialized to the following values (in hexadecimal, low-order bytes first):
word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10
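Since the values above are listed low-order byte first, the corresponding 32-bit constants read in the opposite byte order. The sketch below shows one way to express this in C; load_le32 and MD5_INIT are names chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: assembles a 32-bit word from four bytes given
   low-order byte first. */
static uint32_t load_le32(const uint8_t p[4])
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Initial register values A, B, C, D written as 32-bit constants. */
static const uint32_t MD5_INIT[4] = {
    0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u
};
```

Here the bytes 01 23 45 67 of word A become the constant 0x67452301, and similarly for B, C, and D.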
3.4 Step 4. Process Message in 16-Word Blocks
We first define four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word.
F(X,Y,Z) = XY v not(X) Z
G(X,Y,Z) = XZ v Y not(Z)
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X v not(Z))
In each bit position F acts as a conditional: if X then Y else Z. (The function F could have been defined using + instead of v since XY and not(X)Z will never have 1's in the same bit position.) It is interesting to note that if the bits of X, Y, and Z are independent and unbiased, then each bit of F(X,Y,Z) will be independent and unbiased.
The functions G, H, and I are similar to the function F, in that they act in "bitwise parallel" to produce their output from the bits of X, Y, and Z, in such a manner that if the corresponding bits of X, Y, and Z are independent and unbiased, then each bit of G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased.
Note that the function H is the bit-wise "xor" or "parity" function of its inputs.
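The four auxiliary functions translate almost directly into C bitwise operators. The following is a minimal sketch, reading juxtaposition such as "XY" as bitwise AND and "v" as bitwise OR; the md5_ prefixes are chosen for this example and are not names from the reference implementation.

```c
#include <stdint.h>

/* The four auxiliary functions, applied to all 32 bit positions at once. */
static uint32_t md5_F(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);      /* if x then y else z */
}

static uint32_t md5_G(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & ~z);      /* if z then x else y */
}

static uint32_t md5_H(uint32_t x, uint32_t y, uint32_t z)
{
    return x ^ y ^ z;               /* parity of the three inputs */
}

static uint32_t md5_I(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}
```

With X set to all ones, md5_F returns Y unchanged, and with X set to zero it returns Z, which is the conditional behavior described for F.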
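This step uses a 64-element table T[1 ... 64] constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians. The elements of the table are given in the appendix.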
Do the following:
/* Process each 16-word block. */
For i = 0 to N/16-1 do
/* Copy block i into X. */
For j = 0 to 15 do
Set X[j] to M[i*16+j].
end /* of loop on j */
/* Save A as AA, B as BB, C as CC, and D as DD. */
AA = A
BB = B
CC = C
DD = D
/* Round 1. */
/* Let [abcd k s i] denote the operation
a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
[ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
[ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
[ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]
/* Round 2. */
/* Let [abcd k s i] denote the operation
a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
[ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
[ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
[ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]
/* Round 3. */
/* Let [abcd k s i] denote the operation
a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
[ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
[ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
[ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]
/* Round 4. */
/* Let [abcd k s i] denote the operation
a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
[ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
[ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
[ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]
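/* Then perform the following additions. (That is, increment each of the four registers by the value it had before this block was started.) */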
A = A + AA
B = B + BB
C = C + CC
D = D + DD
end /* of loop on i */
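The body of the loop on i can be sketched compactly in C. This is not the unrolled form used above: it replaces the 64 listed operations with a single loop, using index formulas ((5i+1) mod 16, (3i+5) mod 16, 7i mod 16) that were derived here to reproduce the k columns of the four round tables. The table is stored 0-based, so MD5_T[i-1] corresponds to T[i] in the text. All names are hypothetical.

```c
#include <stdint.h>

/* T[1..64] from the text, stored 0-based: MD5_T[i - 1] == T[i]. */
static const uint32_t MD5_T[64] = {
    0xd76aa478u, 0xe8c7b756u, 0x242070dbu, 0xc1bdceeeu,
    0xf57c0fafu, 0x4787c62au, 0xa8304613u, 0xfd469501u,
    0x698098d8u, 0x8b44f7afu, 0xffff5bb1u, 0x895cd7beu,
    0x6b901122u, 0xfd987193u, 0xa679438eu, 0x49b40821u,
    0xf61e2562u, 0xc040b340u, 0x265e5a51u, 0xe9b6c7aau,
    0xd62f105du, 0x02441453u, 0xd8a1e681u, 0xe7d3fbc8u,
    0x21e1cde6u, 0xc33707d6u, 0xf4d50d87u, 0x455a14edu,
    0xa9e3e905u, 0xfcefa3f8u, 0x676f02d9u, 0x8d2a4c8au,
    0xfffa3942u, 0x8771f681u, 0x6d9d6122u, 0xfde5380cu,
    0xa4beea44u, 0x4bdecfa9u, 0xf6bb4b60u, 0xbebfbc70u,
    0x289b7ec6u, 0xeaa127fau, 0xd4ef3085u, 0x04881d05u,
    0xd9d4d039u, 0xe6db99e5u, 0x1fa27cf8u, 0xc4ac5665u,
    0xf4292244u, 0x432aff97u, 0xab9423a7u, 0xfc93a039u,
    0x655b59c3u, 0x8f0ccc92u, 0xffeff47du, 0x85845dd1u,
    0x6fa87e4fu, 0xfe2ce6e0u, 0xa3014314u, 0x4e0811a1u,
    0xf7537e82u, 0xbd3af235u, 0x2ad7d2bbu, 0xeb86d391u
};

/* Shift amounts s: one row per round, cycled every four operations. */
static const unsigned MD5_S[4][4] = {
    { 7, 12, 17, 22 }, { 5, 9, 14, 20 }, { 4, 11, 16, 23 }, { 6, 10, 15, 21 }
};

/* Circular left shift of a 32-bit word by s bits (0 < s < 32). */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

/* Hypothetical function: updates state = (A, B, C, D) with one
   16-word block X[0..15]. */
static void md5_process_block(uint32_t state[4], const uint32_t X[16])
{
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];

    for (unsigned i = 0; i < 64; i++) {
        unsigned round = i / 16;
        uint32_t f;
        unsigned k;

        switch (round) {
        case 0:  f = (b & c) | (~b & d); k = i;                break; /* F */
        case 1:  f = (b & d) | (c & ~d); k = (5 * i + 1) % 16; break; /* G */
        case 2:  f = b ^ c ^ d;          k = (3 * i + 5) % 16; break; /* H */
        default: f = c ^ (b | ~d);       k = (7 * i) % 16;     break; /* I */
        }

        /* a = b + ((a + f + X[k] + T[i+1]) <<< s), followed by the role
           rotation (a, b, c, d) <- (d, new a, b, c), which mirrors the
           ABCD, DABC, CDAB, BCDA pattern of the tables. */
        uint32_t t = a + f + X[k] + MD5_T[i];
        a = d;
        d = c;
        c = b;
        b = b + rotl32(t, MD5_S[round][i % 4]);
    }

    /* Add the saved values AA, BB, CC, DD (still held in state). */
    state[0] += a;
    state[1] += b;
    state[2] += c;
    state[3] += d;
}
```

A caller would initialize state to the Step 3 values, then call md5_process_block once per 16-word block of the padded message. The loop form trades the explicit tables for brevity; an unrolled version following the tables literally is equally valid.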
3.5 Step 5. Output
The message digest produced as output is A, B, C, D. That is, we begin with the low-order byte of A, and end with the high-order byte of D.
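The output ordering can be sketched in C as below. The function name md5_output is invented for this example; it simply serializes the four registers in the byte order just described.

```c
#include <stdint.h>

/* Hypothetical helper: writes A, B, C, D as 16 digest bytes, each word
   low-order byte first, so digest[0] is the low-order byte of A and
   digest[15] is the high-order byte of D. */
static void md5_output(const uint32_t state[4], uint8_t digest[16])
{
    for (int w = 0; w < 4; w++) {
        for (int i = 0; i < 4; i++) {
            digest[4 * w + i] = (uint8_t)(state[w] >> (8 * i));
        }
    }
}
```

Printing the 16 bytes in order as two hexadecimal digits each gives the familiar 32-character digest string.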
This completes the description of MD5. A reference implementation in C is given in the appendix.