3. Specification of the CBOR Encoding
A CBOR data item (Section 2) is encoded to or decoded from a byte string carrying a well-formed encoded data item as described in this section. The encoding is summarized in Table 7 in Appendix B, indexed by the initial byte. An encoder MUST produce only well-formed encoded data items. A decoder MUST NOT return a decoded data item when it encounters input that is not a well-formed encoded CBOR data item (this does not detract from the usefulness of diagnostic and recovery tools that might make available some information from a damaged encoded CBOR data item).
The initial byte of each encoded data item contains both information about the major type (the high-order 3 bits, described in Section 3.1) and additional information (the low-order 5 bits). With a few exceptions, the additional information's value describes how to load an unsigned integer "argument":
Less than 24: The argument's value is the value of the additional information.
24, 25, 26, or 27: The argument's value is held in the following 1, 2, 4, or 8 bytes, respectively, in network byte order. For major type 7 and additional information value 25, 26, 27, these bytes are not used as an integer argument, but as a floating-point value (see Section 3.3).
28, 29, 30: These values are reserved for future additions to the CBOR format. In the present version of CBOR, the encoded item is not well-formed.
31: No argument value is derived. If the major type is 0, 1, or 6, the encoded item is not well-formed. For major types 2 to 5, the item's length is indefinite, and for major type 7, the byte does not constitute a data item at all but terminates an indefinite-length item; all are described in Section 3.2.
The initial byte and any additional bytes consumed to construct the argument are collectively referred to as the head of the data item.
The meaning of this argument depends on the major type. For example, in major type 0, the argument is the value of the data item itself (and in major type 1, the value of the data item is computed from the argument); in major type 2 and 3, it gives the length of the string data in bytes that follow; and in major types 4 and 5, it is used to determine the number of data items enclosed.
If the encoded sequence of bytes ends before the end of a data item, that item is not well-formed. If the encoded sequence of bytes still has bytes remaining after the outermost encoded item is decoded, that encoding is not a single well-formed CBOR item. Depending on the application, the decoder may either treat the encoding as not well-formed or just identify the start of the remaining bytes to the application.
A CBOR decoder implementation can be based on a jump table with all 256 defined values for the initial byte (Table 7). A decoder in a constrained implementation can instead use the structure of the initial byte and following bytes for more compact code (see Appendix C for a rough impression of how this could look).
3.1. Major Types
The following lists the major types and the additional information and other bytes associated with the type.
Major type 0: An unsigned integer in the range 0..2^(64)-1 inclusive. The value of the encoded item is the argument itself. For example, the integer 10 is denoted as the one byte 0b000_01010 (major type 0, additional information 10). The integer 500 would be 0b000_11001 (major type 0, additional information 25) followed by the two bytes 0x01f4, which is 500 in decimal.
Major type 1: A negative integer in the range -2^(64)..-1 inclusive. The value of the item is -1 minus the argument. For example, the integer -500 would be 0b001_11001 (major type 1, additional information 25) followed by the two bytes 0x01f3, which is 499 in decimal.
Major type 2: A byte string. The number of bytes in the string is equal to the argument. For example, a byte string whose length is 5 would have an initial byte of 0b010_00101 (major type 2, additional information 5 for the length), followed by 5 bytes of binary content. A byte string whose length is 500 would have 3 initial bytes of 0b010_11001 (major type 2, additional information 25 to indicate a two-byte length) followed by the two bytes 0x01f4 for a length of 500, followed by 500 bytes of binary content.
Major type 3: A text string (Section 2) encoded as UTF-8 [RFC3629]. The number of bytes in the string is equal to the argument. A string containing an invalid UTF-8 sequence is well-formed but invalid (Section 1.2). This type is provided for systems that need to interpret or display human-readable text, and allows the differentiation between unstructured bytes and text that has a specified repertoire (that of Unicode) and encoding (UTF-8). In contrast to formats such as JSON, the Unicode characters in this type are never escaped. Thus, a newline character (U+000A) is always represented in a string as the byte 0x0a, and never as the bytes 0x5c6e (the characters "" and "n") nor as 0x5c7530303061 (the characters "", "u", "0", "0", "0", and "a").
Major type 4: An array of data items. In other formats, arrays are also called lists, sequences, or tuples (a "CBOR sequence" is something slightly different, though [RFC8742]). The argument is the number of data items in the array. Items in an array do not need to all be of the same type. For example, an array that contains 10 items of any type would have an initial byte of 0b100_01010 (major type 4, additional information 10 for the length) followed by the 10 remaining items.
Major type 5: A map of pairs of data items. Maps are also called tables, dictionaries, hashes, or objects (in JSON). A map is comprised of pairs of data items, each pair consisting of a key that is immediately followed by a value. The argument is the number of pairs of data items in the map. For example, a map that contains 9 pairs would have an initial byte of 0b101_01001 (major type 5, additional information 9 for the number of pairs) followed by the 18 remaining items. The first item is the first key, the second item is the first value, the third item is the second key, and so on. Because items in a map come in pairs, their total number is always even: a map that contains an odd number of items (no value data present after the last key data item) is not well-formed. A map that has duplicate keys may be well-formed, but it is not valid, and thus it causes indeterminate decoding; see also Section 5.6.
Major type 6: A tagged data item ("tag") whose tag number, an integer in the range 0..2^(64)-1 inclusive, is the argument and whose enclosed data item (tag content) is the single encoded data item that follows the head. See Section 3.4.
Major type 7: Floating-point numbers and simple values, as well as the "break" stop code. See Section 3.3.
These eight major types lead to a simple table showing which of the 256 possible values for the initial byte of a data item are used (Table 7).
In major types 6 and 7, many of the possible values are reserved for future specification. See Section 9 for more information on these values.
Table 1 summarizes the major types defined by CBOR, ignoring Section 3.2 for now. The number N in this table stands for the argument.
| Major Type | Meaning | Content |
|---|---|---|
| 0 | unsigned integer N | - |
| 1 | negative integer -1-N | - |
| 2 | byte string | N bytes |
| 3 | text string | N bytes (UTF-8 text) |
| 4 | array | N data items (elements) |
| 5 | map | 2N data items (key/value pairs) |
| 6 | tag of number N | 1 data item |
| 7 | simple/float | - |
Table 1: Overview over the Definite-Length Use of CBOR Major Types (N = Argument)