3. Encoding Byte Strings to Elliptic Curves

This section presents a general framework and interface for encoding byte strings to points on an elliptic curve. The constructions in this section rely on three basic functions:

  • The function hash_to_field hashes arbitrary-length byte strings to a list of one or more elements of a finite field F; its implementation is defined in Section 5.

hash_to_field(msg, count)

Input:
- msg, a byte string containing the message to hash.
- count, the number of elements of F to output.

Output:
- (u_0, ..., u_(count - 1)), a list of field elements.

Steps: defined in Section 5.

  • The function map_to_curve calculates a point on the elliptic curve E from an element of the finite field F over which E is defined. Section 6 describes mappings for a range of curve families.

map_to_curve(u)

Input: u, an element of field F.
Output: Q, a point on the elliptic curve E.
Steps: defined in Section 6.

  • The function clear_cofactor sends any point on the curve E to the subgroup G of E. Section 7 describes methods to perform this operation.

clear_cofactor(Q)

Input: Q, a point on the elliptic curve E.
Output: P, a point in G.
Steps: defined in Section 7.

The two encodings (Section 2.2.2) defined in this section have the same interface and are both random-oracle encodings (Section 2.2.3). Both are implemented as a composition of the three basic functions above. The difference between the two is that their outputs are sampled from different distributions:

  • encode_to_curve is a nonuniform encoding from byte strings to points in G. That is, the distribution of its output is not uniformly random in G: the set of possible outputs of encode_to_curve is only a fraction of the points in G, and some points in this set are more likely to be output than others. Section 10.4 gives a more precise definition of encode_to_curve's output distribution.

encode_to_curve(msg)

Input: msg, an arbitrary-length byte string.
Output: P, a point in G.

Steps:
1. u = hash_to_field(msg, 1)
2. Q = map_to_curve(u[0])
3. P = clear_cofactor(Q)
4. return P

  • hash_to_curve is a uniform encoding from byte strings to points in G. That is, the distribution of its output is statistically close to uniform in G.

This function is suitable for most applications requiring a random oracle returning points in G, when instantiated with any of the map_to_curve functions described in Section 6. See Section 10.1 for further discussion.

hash_to_curve(msg)

Input: msg, an arbitrary-length byte string.
Output: P, a point in G.

Steps:
1. u = hash_to_field(msg, 2)
2. Q0 = map_to_curve(u[0])
3. Q1 = map_to_curve(u[1])
4. R = Q0 + Q1 # Point addition
5. P = clear_cofactor(R)
6. return P

Each hash-to-curve suite in Section 8 instantiates one of these encoding functions for a specific elliptic curve.
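
The composition of the three basic functions in encode_to_curve and hash_to_curve can be sketched in Python as follows. This is only an illustration of the control flow, not an implementation of any suite: the "curve" is a toy stand-in (the integers modulo 12 under addition, with G taken as the order-3 subgroup {0, 4, 8}), and toy_hash_to_field, toy_map_to_curve, toy_add, and toy_clear_cofactor are hypothetical helpers that do not follow Sections 5, 6, or 7. The domain separation tag is omitted so that the signatures mirror the pseudocode above; real uses need one.

```python
import hashlib
from typing import List

# Toy stand-in for the curve E: the additive group of integers mod 12.
# Its subgroup G = {0, 4, 8} has order 3, so the "cofactor" is 4.
# This is NOT an elliptic curve; it only mimics the group structure.
GROUP_ORDER = 12
COFACTOR = 4
FIELD_MODULUS = 13  # toy stand-in for the field F


def toy_hash_to_field(msg: bytes, count: int) -> List[int]:
    """Hypothetical stand-in for hash_to_field (not the Section 5 method)."""
    elements = []
    for i in range(count):
        # One SHA-256 call per output element, indexed by a counter byte.
        digest = hashlib.sha256(bytes([i]) + msg).digest()
        elements.append(int.from_bytes(digest, "big") % FIELD_MODULUS)
    return elements


def toy_map_to_curve(u: int) -> int:
    """Hypothetical stand-in for map_to_curve (not a Section 6 mapping)."""
    return u % GROUP_ORDER


def toy_add(q0: int, q1: int) -> int:
    """Group operation of the toy group, standing in for point addition."""
    return (q0 + q1) % GROUP_ORDER


def toy_clear_cofactor(q: int) -> int:
    """Multiply by the cofactor, which lands in the subgroup G."""
    return (COFACTOR * q) % GROUP_ORDER


def encode_to_curve(msg: bytes) -> int:
    u = toy_hash_to_field(msg, 1)
    q = toy_map_to_curve(u[0])
    return toy_clear_cofactor(q)


def hash_to_curve(msg: bytes) -> int:
    u = toy_hash_to_field(msg, 2)
    q0 = toy_map_to_curve(u[0])
    q1 = toy_map_to_curve(u[1])
    r = toy_add(q0, q1)  # "point addition" in the toy group
    return toy_clear_cofactor(r)


if __name__ == "__main__":
    print(encode_to_curve(b"abc"), hash_to_curve(b"abc"))
```

The only structural difference between the two functions is that hash_to_curve maps two field elements and adds the resulting points before clearing the cofactor. The toy group is far too small to show the difference in output distributions described above.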

3.1. Domain Separation Requirements

All uses of the encoding functions defined in this document MUST include domain separation (Section 2.2.5) to avoid interfering with other uses of similar functionality.

Applications that instantiate multiple, independent instances of either hash_to_curve or encode_to_curve MUST enforce domain separation between those instances. This requirement applies in both the case of multiple instances targeting the same curve and the case of multiple instances targeting different curves. (This is because the internal hash_to_field primitive (Section 5) requires domain separation to guarantee independent outputs.)
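
The effect of domain separation can be illustrated with a toy tagged hash. The sketch below is not the hash_to_field construction of Section 5; toy_tagged_hash is a hypothetical helper that mixes a length-prefixed tag into SHA-256, and the tags in the usage lines are made up for the example. It shows only the intended behavior: the same message hashed under two different tags gives unrelated outputs.

```python
import hashlib


def toy_tagged_hash(dst: bytes, msg: bytes) -> bytes:
    """Toy illustration of domain separation (NOT the Section 5 method).

    The tag is prefixed with its one-byte length so that the boundary
    between tag and message is unambiguous.
    """
    if not 0 < len(dst) <= 255:
        raise ValueError("this sketch requires a tag of 1 to 255 bytes")
    return hashlib.sha256(bytes([len(dst)]) + dst + msg).digest()


if __name__ == "__main__":
    msg = b"same message"
    # Two instances with distinct tags produce different digests.
    print(toy_tagged_hash(b"EXAMPLE-APP-V01-ENC1", msg).hex())
    print(toy_tagged_hash(b"EXAMPLE-APP-V01-ENC2", msg).hex())
```

Without the length prefix, the inputs (tag "AB", message "C") and (tag "A", message "BC") would hash identically, which is one reason tag handling has to be specified precisely.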

Domain separation is enforced with a domain separation tag (DST), which is a byte string constructed according to the following requirements:

  1. Tags MUST be supplied as the DST parameter to hash_to_field, as described in Section 5.

  2. Tags MUST have nonzero length. A minimum length of 16 bytes is RECOMMENDED to reduce the chance of collisions with other applications.

  3. Tags SHOULD begin with a fixed identification string that is unique to the application.

  4. Tags SHOULD include a version number.

  5. For applications that define multiple ciphersuites, each ciphersuite's tag MUST be different. For this purpose, it is RECOMMENDED to include a ciphersuite identifier in each tag.

  6. For applications that use multiple encodings, to either the same curve or different curves, each encoding MUST use a different tag. For this purpose, it is RECOMMENDED to include the encoding's Suite ID (Section 8) in the domain separation tag. For independent encodings based on the same suite, each tag SHOULD also include a distinct identifier, e.g., "ENC1" and "ENC2".

As an example, consider a fictional application named Quux that defines several different ciphersuites, each for a different curve. A reasonable choice of tag is "QUUX-V<xx>-CS<yy>-<suiteID>", where <xx> and <yy> are two-digit numbers indicating the version and ciphersuite, respectively, and <suiteID> is the Suite ID of the encoding used in ciphersuite <yy>.

As another example, consider a fictional application named Baz that requires two independent random oracles to the same curve. Reasonable choices of tags for these oracles are "BAZ-V<xx>-CS<yy>-<suiteID>-ENC1" and "BAZ-V<xx>-CS<yy>-<suiteID>-ENC2", respectively, where <xx>, <yy>, and <suiteID> are as described above.

The example tags given above are assumed to be ASCII-encoded byte strings without null termination, which is the RECOMMENDED format. Other encodings can be used, but in all cases the encoding as a sequence of bytes MUST be specified unambiguously.
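
A tag following the pattern of these examples (application name, two-digit version, two-digit ciphersuite number, Suite ID, and an optional per-instance identifier) could be assembled as in the sketch below. The helper build_dst and its parameter names are hypothetical, and the Suite ID in the usage lines is assumed to be one of those listed in Section 8. The 16-byte length check reflects the RECOMMENDED minimum, so the sketch issues a warning rather than an error when a tag is shorter.

```python
import warnings


def build_dst(app_id: str, version: int, ciphersuite: int,
              suite_id: str, instance: str = "") -> bytes:
    """Assemble an ASCII domain separation tag (hypothetical helper)."""
    if not app_id or not suite_id:
        raise ValueError("application name and Suite ID must be nonempty")
    if not (0 <= version <= 99 and 0 <= ciphersuite <= 99):
        raise ValueError("version and ciphersuite must fit in two digits")

    tag = f"{app_id}-V{version:02d}-CS{ciphersuite:02d}-{suite_id}"
    if instance:
        # Distinct identifier for independent encodings on the same suite.
        tag += f"-{instance}"

    # ASCII bytes without null termination; non-ASCII input raises
    # UnicodeEncodeError (a subclass of ValueError).
    dst = tag.encode("ascii")
    if len(dst) < 16:
        warnings.warn("tag is shorter than the RECOMMENDED 16 bytes")
    return dst


if __name__ == "__main__":
    suite = "P256_XMD:SHA-256_SSWU_RO_"  # assumed Suite ID from Section 8
    print(build_dst("QUUX", 1, 2, suite))
    print(build_dst("BAZ", 1, 2, suite, "ENC1"))
    print(build_dst("BAZ", 1, 2, suite, "ENC2"))
```

The resulting byte string is what would be supplied as the DST parameter to hash_to_field.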