5. Hashing to a Finite Field

The hash_to_field function hashes a byte string msg of arbitrary length into one or more elements of a field F. This function works in two steps: it first hashes the input byte string to produce a uniformly random byte string, and then interprets this byte string as one or more elements of F.

For the first step, hash_to_field calls an auxiliary function expand_message. This document defines two variants of expand_message: one appropriate for hash functions like SHA-2 [FIPS180-4] or SHA-3 [FIPS202], and another appropriate for extendable-output functions such as SHAKE128 [FIPS202]. Security considerations for each expand_message variant are discussed below (Sections 5.3.1 and 5.3.2).

Implementors MUST NOT use rejection sampling to generate a uniformly random element of F, to ensure that the hash_to_field function is amenable to constant-time implementation. The reason is that rejection sampling procedures are difficult to implement in constant time, and later well-meaning "optimizations" may silently render an implementation non-constant-time. This means that any hash_to_field function based on rejection sampling would be incompatible with constant-time implementation.

The hash_to_field function is also suitable for securely hashing to scalars. For example, when hashing to the scalar field for an elliptic curve (sub)group with prime order r, it suffices to instantiate hash_to_field with target field GF(r).

The hash_to_field function is designed to be indifferentiable from a random oracle [MRH04] when expand_message (Section 5.3) is modeled as a random oracle (see Section 10.5 for details about its indifferentiability). Ensuring indifferentiability requires care; to see why, consider a prime p that is close to 3/4 * 2^256. Reducing a random 256-bit integer modulo this p yields a value that is in the range [0, p / 3] with probability roughly 1/2, meaning that this value is statistically far from uniform in [0, p - 1].

To control bias, hash_to_field instead uses random integers whose length is at least ceil(log2(p)) + k bits, where k is the target security level for the suite in bits. Reducing such integers mod p gives bias at most 2^-k for any p; this bias is appropriate when targeting k-bit security. For each such integer, hash_to_field uses expand_message to obtain L uniform bytes, where

L = ceil((ceil(log2(p)) + k) / 8)

These uniform bytes are then interpreted as an integer via OS2IP. For example, for a 255-bit prime p, and k = 128-bit security, L = ceil((255 + 128) / 8) = 48 bytes.

Note that k is an upper bound on the security level for the corresponding curve. See Section 10.8 for more details and Section 8.9 for guidelines on choosing k for a given curve.

5.1. Efficiency Considerations in Extension Fields

The hash_to_field function described in this section is inefficient for certain extension fields. Specifically, when hashing to an element of the extension field GF(p^m), hash_to_field requires expanding msg into m * L bytes (for L as defined above). For extension fields where log2(p) is significantly smaller than the security level k, this approach is inefficient: it requires expand_message to output roughly m * log2(p) + m * k bits, whereas m * log2(p) + k bytes suffices to generate an element of GF(p^m) with bias at most 2^-k. In such cases, applications MAY use an alternative hash_to_field function, provided it meets the following security requirements:

The function MUST output one or more field elements that are uniformly random except with bias at most 2^-k.
The function MUST NOT use rejection sampling.
The function SHOULD be amenable to straight-line implementations.

For example, Pornin [P20] describes a method for hashing to GF(9767^19) that meets these requirements while using fewer output bits from expand_message than hash_to_field would for that field.

5.2. hash_to_field Implementation

The following procedure implements hash_to_field.

The expand_message parameter to this function MUST conform to the requirements given in Section 5.3. Section 3.1 discusses the REQUIRED method for constructing DST, the domain separation tag. Note that hash_to_field may fail (ABORT) if expand_message fails.

hash_to_field(msg, count)

Parameters:
- DST, a domain separation tag (see Section 3.1).
- F, a finite field of characteristic p and order q = p^m.
- p, the characteristic of F (see immediately above).
- m, the extension degree of F, m >= 1 (see immediately above).
- L = ceil((ceil(log2(p)) + k) / 8), where k is the security
  parameter of the suite (e.g., k = 128).
- expand_message, a function that expands a byte string and
  domain separation tag into a uniformly random byte string
  (see Section 5.3).

Input:
- msg, a byte string containing the message to hash.
- count, the number of elements of F to output.

Output:
- (u_0, ..., u_(count - 1)), a list of field elements.

Steps:
1. len_in_bytes = count * m * L
2. uniform_bytes = expand_message(msg, DST, len_in_bytes)
3. for i in (0, ..., count - 1):
4.   for j in (0, ..., m - 1):
5.     elm_offset = L * (j + i * m)
6.     tv = substr(uniform_bytes, elm_offset, L)
7.     e_j = OS2IP(tv) mod p
8.   u_i = (e_0, ..., e_(m - 1))
9. return (u_0, ..., u_(count - 1))

5.3. expand_message

expand_message is a function that generates a uniformly random byte string. It takes three arguments:

msg, a byte string containing the message to hash,
DST, a byte string that acts as a domain separation tag, and
len_in_bytes, the number of bytes to be generated.

This document defines the following two variants of expand_message:

expand_message_xmd (Section 5.3.1) is appropriate for use with a wide range of hash functions, including SHA-2 [FIPS180-4], SHA-3 [FIPS202], BLAKE2 [RFC7693], and others.
expand_message_xof (Section 5.3.2) is appropriate for use with extendable-output functions (XOFs), including functions in the SHAKE [FIPS202] or BLAKE2X [BLAKE2X] families.

These variants should suffice for the vast majority of use cases, but other variants are possible; Section 5.3.4 discusses requirements.

5.3.1. expand_message_xmd

The expand_message_xmd function produces a uniformly random byte string using a cryptographic hash function H that outputs b bits. For security, H MUST meet the following requirements:

The number of bits output by H MUST be b >= 2 * k, where k is the target security level in bits, and b MUST be divisible by 8. The first requirement ensures k-bit collision resistance; the second ensures uniformity of expand_message_xmd's output.
H MAY be a Merkle-Damgaard hash function like SHA-2. In this case, security holds when the underlying compression function is modeled as a random oracle [CDMP05]. (See Section 10.6 for discussion.)
H MAY be a sponge-based hash function like SHA-3 or BLAKE2. In this case, security holds when the inner function is modeled as a random transformation or as a random permutation [BDPV08].
Otherwise, H MUST be a hash function that has been proved indifferentiable from a random oracle [MRH04] under a reasonable cryptographic assumption.

SHA-2 [FIPS180-4] and SHA-3 [FIPS202] are typical and RECOMMENDED choices. As an example, for the 128-bit security level, b >= 256 bits and either SHA-256 or SHA3-256 would be an appropriate choice.

The hash function H is assumed to work by repeatedly ingesting fixed-length blocks of data. The length in bits of these blocks is called the input block size (s). As examples, s = 1024 for SHA-512 [FIPS180-4] and s = 576 for SHA3-512 [FIPS202]. For correctness, H requires b <= s.

The following procedure implements expand_message_xmd.

expand_message_xmd(msg, DST, len_in_bytes)

Parameters:
- H, a hash function (see requirements above).
- b_in_bytes, b / 8 for b the output size of H in bits.
  For example, for b = 256, b_in_bytes = 32.
- s_in_bytes, the input block size of H, measured in bytes (see
  discussion above). For example, for SHA-256, s_in_bytes = 64.

Input:
- msg, a byte string.
- DST, a byte string of at most 255 bytes.
  See below for information on using longer DSTs.
- len_in_bytes, the length of the requested output in bytes,
  not greater than the lesser of (255 * b_in_bytes) or 2^16-1.

Output:
- uniform_bytes, a byte string.

Steps:
1.  ell = ceil(len_in_bytes / b_in_bytes)
2.  ABORT if ell > 255 or len_in_bytes > 65535 or len(DST) > 255
3.  DST_prime = DST || I2OSP(len(DST), 1)
4.  Z_pad = I2OSP(0, s_in_bytes)
5.  l_i_b_str = I2OSP(len_in_bytes, 2)
6.  msg_prime = Z_pad || msg || l_i_b_str || I2OSP(0, 1) || DST_prime
7.  b_0 = H(msg_prime)
8.  b_1 = H(b_0 || I2OSP(1, 1) || DST_prime)
9.  for i in (2, ..., ell):
10.    b_i = H(strxor(b_0, b_(i - 1)) || I2OSP(i, 1) || DST_prime)
11. uniform_bytes = b_1 || ... || b_ell
12. return substr(uniform_bytes, 0, len_in_bytes)

Note that the string Z_pad (step 6) is prefixed to msg before computing b_0 (step 7). This is necessary for security when H is a Merkle-Damgaard hash, e.g., SHA-2 (see Section 10.6). Hashing this additional data means that the cost of computing b_0 is higher than the cost of simply computing H(msg). In most settings, this overhead is negligible, because the cost of evaluating H is much less than the other costs involved in hashing to a curve.

It is possible, however, to entirely avoid this overhead by taking advantage of the fact that Z_pad depends only on H, and not on the arguments to expand_message_xmd. To do so, first precompute and save the internal state of H after ingesting Z_pad. Then, when computing b_0, initialize H using the saved state. Further details are implementation dependent and are beyond the scope of this document.

5.3.2. expand_message_xof

The expand_message_xof function produces a uniformly random byte string using an extendable-output function (XOF) H. For security, H MUST meet the following criteria:

The collision resistance of H MUST be at least k bits.
H MUST be an XOF that has been proved indifferentiable from a random oracle under a reasonable cryptographic assumption.

The SHAKE XOF family [FIPS202] is a typical and RECOMMENDED choice. As an example, for 128-bit security, SHAKE128 would be an appropriate choice.

The following procedure implements expand_message_xof.

expand_message_xof(msg, DST, len_in_bytes)

Parameters:
- H(m, d), an extendable-output function that processes
  input message m and returns d bytes.

Input:
- msg, a byte string.
- DST, a byte string of at most 255 bytes.
  See below for information on using longer DSTs.
- len_in_bytes, the length of the requested output in bytes.

Output:
- uniform_bytes, a byte string.

Steps:
1. ABORT if len_in_bytes > 65535 or len(DST) > 255
2. DST_prime = DST || I2OSP(len(DST), 1)
3. msg_prime = msg || I2OSP(len_in_bytes, 2) || DST_prime
4. uniform_bytes = H(msg_prime, len_in_bytes)
5. return uniform_bytes

5.3.3. Using DSTs Longer than 255 Bytes

The expand_message variants defined in this section accept domain separation tags of at most 255 bytes. If applications require a domain separation tag longer than 255 bytes, e.g., because of requirements imposed by an invoking protocol, implementors MUST compute a short domain separation tag by hashing, as follows:

For expand_message_xmd using hash function H, DST is computed as

DST = H("H2C-OVERSIZE-DST-" || a_very_long_DST)

For expand_message_xof using extendable-output function H, DST is computed as

DST = H("H2C-OVERSIZE-DST-" || a_very_long_DST, ceil(2 * k / 8))

Here, a_very_long_DST is the DST whose length is greater than 255 bytes, "H2C-OVERSIZE-DST-" is a 17-byte ASCII string literal, and k is the target security level in bits.

5.3.4. Defining Other expand_message Variants

When defining a new expand_message variant, the most important consideration is that hash_to_field models expand_message as a random oracle. Thus, implementors SHOULD prove indifferentiability from a random oracle under an appropriate assumption about the underlying cryptographic primitives; see Section 10.5 for more information.

In addition, expand_message variants:

MUST give collision resistance commensurate with the security level of the target elliptic curve.
MUST be built on primitives designed for use in applications requiring cryptographic randomness. As examples, a secure stream cipher is an appropriate primitive, whereas a Mersenne twister pseudorandom number generator [MT98] is not.
MUST NOT use rejection sampling.
MUST give independent values for distinct (msg, DST, length) inputs. Meeting this requirement is subtle. As a simplified example, hashing msg || DST does not work, because in this case distinct (msg, DST) pairs whose concatenations are equal will return the same output (e.g., ("AB", "CDEF") and ("ABC", "DEF")). The variants defined in this document use a suffix-free encoding of DST to avoid this issue.
MUST use the domain separation tag DST to ensure that invocations of cryptographic primitives inside of expand_message are domain-separated from invocations outside of expand_message. For example, if the expand_message variant uses a hash function H, an encoding of DST MUST be added either as a prefix or a suffix of the input to each invocation of H. Adding DST as a suffix is the RECOMMENDED approach.
SHOULD read msg exactly once, for efficiency when msg is long.

In addition, each expand_message variant MUST specify a unique EXP_TAG that identifies that variant in a Suite ID. See Section 8.10 for more information.

5.1. Efficiency Considerations in Extension Fields​

5.2. hash_to_field Implementation​

5.3. expand_message​

5.3.1. expand_message_xmd​

5.3.2. expand_message_xof​

5.3.3. Using DSTs Longer than 255 Bytes​

5.3.4. Defining Other expand_message Variants​