2. CBOR Data Models

CBOR is explicit about its generic data model, which defines the set of all data items that can be represented in CBOR. Its basic generic data model is extensible by the registration of "simple values" and tags. Applications can then create a subset of the resulting extended generic data model to build their specific data models.

Within environments that can represent the data items in the generic data model, generic CBOR encoders and decoders can be implemented (which usually involves defining additional implementation data types for those data items that do not already have a natural representation in the environment). The ability to provide generic encoders and decoders is an explicit design goal of CBOR; however, many applications will provide their own application-specific encoders and/or decoders.

In the basic (unextended) generic data model defined in Section 3, a data item is one of the following:

  • an integer in the range -2^(64)..2^(64)-1 inclusive

  • a simple value, identified by a number between 0 and 255, but distinct from that number itself

  • a floating-point value, distinct from an integer, out of the set representable by IEEE 754 binary64 (including non-finites) [IEEE754]

  • a sequence of zero or more bytes ("byte string")

  • a sequence of zero or more Unicode code points ("text string")

  • a sequence of zero or more data items ("array")

  • a mapping (mathematical function) from zero or more data items ("keys"), each to a data item (a "value") ("map")

  • a tagged data item ("tag"), comprising a tag number (an integer in the range 0..2^(64)-1) and the tag content (a data item)

Note that integer and floating-point values are distinct in this model, even if they have the same numeric value.
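As an illustration only, the basic generic data model can be pictured in a programming environment roughly as follows. This is a minimal Python sketch: the names `SimpleValue`, `Tag`, and `kind` are hypothetical, and the choice of `bytes`, `str`, `list`, and `dict` for byte strings, text strings, arrays, and maps is an assumption of the sketch, not something this document prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleValue:
    """Hypothetical type: a simple value, distinct from its number."""
    number: int

    def __post_init__(self):
        if not 0 <= self.number <= 255:
            raise ValueError("simple value number must be in 0..255")


@dataclass(frozen=True)
class Tag:
    """Hypothetical type: a tag number plus the tag content."""
    number: int
    content: object

    def __post_init__(self):
        if not 0 <= self.number <= 2**64 - 1:
            raise ValueError("tag number must be in 0..2^64-1")


def kind(item):
    """Name the generic data model type of a Python value (sketch)."""
    # bool is a subclass of int in Python; it is not an integer in the
    # basic model, so this sketch rejects it explicitly.
    if isinstance(item, bool):
        raise TypeError("bool is not an item of the basic model here")
    if isinstance(item, int):
        if not -2**64 <= item <= 2**64 - 1:
            raise ValueError("integer outside -2^64..2^64-1")
        return "integer"
    if isinstance(item, float):
        return "floating-point value"
    if isinstance(item, SimpleValue):
        return "simple value"
    if isinstance(item, bytes):
        return "byte string"
    if isinstance(item, str):
        return "text string"
    if isinstance(item, list):
        return "array"
    if isinstance(item, dict):
        return "map"
    if isinstance(item, Tag):
        return "tag"
    raise TypeError("no natural representation in this sketch")
```

In this sketch, `kind(1)` and `kind(1.0)` give different answers even though the two values are numerically equal, and `SimpleValue(16)` does not compare equal to `16`. Python's built-in `dict` treats `0` and `0.0` as the same key, so a faithful generic implementation would probably need a different map representation.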

Also note that serialization variants are not visible at the generic data model level. This deliberate absence of visibility includes the number of bytes of the encoded floating-point value. It also includes the choice of encoding for an "argument" (see Section 3) such as the encoding for an integer, the encoding for the length of a text or byte string, the encoding for the number of elements in an array or pairs in a map, or the encoding for a tag number.
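The following minimal Python sketch illustrates this point for two cases: an unsigned integer whose argument is encoded in different lengths, and a floating-point value encoded in 2, 4, or 8 bytes. The function names `decode_unsigned` and `decode_float` are hypothetical, and the sketch handles only a single complete data item of the kinds shown; it is not a general decoder.

```python
import struct


def decode_unsigned(data: bytes) -> int:
    """Decode one unsigned integer (major type 0) from `data` (sketch)."""
    if not data:
        raise ValueError("empty input")
    major_type = data[0] >> 5
    additional = data[0] & 0x1F
    if major_type != 0:
        raise ValueError("this sketch only handles major type 0")
    if additional < 24:
        # The argument is carried directly in the initial byte.
        if len(data) != 1:
            raise ValueError("unexpected trailing bytes")
        return additional
    if additional in (24, 25, 26, 27):
        # The argument follows in 1, 2, 4, or 8 bytes, big-endian.
        length = 1 << (additional - 24)
        if len(data) != 1 + length:
            raise ValueError("wrong number of argument bytes")
        return int.from_bytes(data[1:], "big")
    raise ValueError("additional information not handled in this sketch")


def decode_float(data: bytes) -> float:
    """Decode one half, single, or double precision float (sketch)."""
    formats = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}
    if not data:
        raise ValueError("empty input")
    fmt = formats.get(data[0])
    if fmt is None or len(data) != 1 + struct.calcsize(fmt):
        raise ValueError("not a float handled by this sketch")
    return struct.unpack(fmt, data[1:])[0]


# Three serializations of the integer 10: one data item in the model.
variants = [bytes.fromhex(h) for h in ("0a", "180a", "19000a")]
decoded_ints = [decode_unsigned(v) for v in variants]

# Three serializations of the floating-point value 1.5.
float_variants = [
    bytes.fromhex(h)
    for h in ("f93e00", "fa3fc00000", "fb3ff8000000000000")
]
decoded_floats = [decode_float(v) for v in float_variants]
```

All entries of `decoded_ints` are the same integer and all entries of `decoded_floats` are the same floating-point value, so an application working at the generic data model level cannot tell which serialization was used.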

2.1. Extended Generic Data Models

This basic generic data model has been extended in this document by the registration of a number of simple values and tag numbers, such as:

  • "false", "true", "null", and "undefined" (simple values identified by 20..23, Section 3.3)

  • integer and floating-point values with a larger range and precision than the above (tag numbers 2 to 5, Section 3.4)

  • application data types such as a point in time or date/time string defined in RFC 3339 (tag numbers 1 and 0, Section 3.4)
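As a rough illustration of the first two extensions listed above, the Python sketch below maps simple values 20..23 to natural Python values and converts the content of tag numbers 2 and 3 (unsigned and negative bignums, whose content is a big-endian byte string) to Python integers. The names `UNDEFINED`, `SIMPLE_VALUES`, and `bignum_from_tag` are hypothetical; Python has no built-in "undefined", so a sentinel object is assumed here.

```python
class _Undefined:
    """Hypothetical sentinel type standing in for "undefined"."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()

# Simple values 20..23 and one possible Python representation.
SIMPLE_VALUES = {20: False, 21: True, 22: None, 23: UNDEFINED}


def bignum_from_tag(tag_number: int, content: bytes) -> int:
    """Convert the content of tag 2 or tag 3 to an integer (sketch)."""
    n = int.from_bytes(content, "big")
    if tag_number == 2:
        return n            # unsigned bignum
    if tag_number == 3:
        return -1 - n       # negative bignum
    raise ValueError("this sketch only handles tag numbers 2 and 3")


# Content bytes for 2^64, one past the basic model's integer range.
content = bytes.fromhex("010000000000000000")
big_positive = bignum_from_tag(2, content)
big_negative = bignum_from_tag(3, content)
```

Here `big_positive` is 2^64 and `big_negative` is -2^64-1, both outside the integer range of the basic generic data model.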

Additional elements of the extended generic data model can be (and have been) defined via the IANA registries created for CBOR. Even if such an extension is unknown to a generic encoder or decoder, data items using that extension can be passed to or from the application by representing them at the application interface within the basic generic data model, i.e., as generic simple values or generic tags.
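One possible shape for such an application interface is sketched below in Python. The names `Tag`, `KNOWN_TAGS`, and `present_tag` are hypothetical, the sketch assumes the decoder only knows tag number 1 (a point in time, taken here as seconds since the epoch), and the tag number 1234567 is an arbitrary value assumed to be unknown to this decoder.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Tag:
    """Hypothetical generic tag: tag number plus untouched tag content."""
    number: int
    content: object


def epoch_to_datetime(content):
    # Interpret the tag content as seconds since 1970-01-01T00:00Z.
    return datetime.fromtimestamp(content, tz=timezone.utc)


# Extensions this sketch of a decoder happens to implement.
KNOWN_TAGS = {1: epoch_to_datetime}


def present_tag(number, content):
    """Hand a decoded tag to the application (sketch).

    Known tag numbers are converted to a richer type; unknown ones are
    passed through unchanged as a generic Tag.
    """
    handler = KNOWN_TAGS.get(number)
    if handler is None:
        return Tag(number, content)
    return handler(content)


known = present_tag(1, 0)
unknown = present_tag(1234567, b"opaque")
```

The application receives a `datetime` for the known tag and a generic `Tag(1234567, b"opaque")` for the unknown one, so no information is lost even though the decoder does not understand the extension.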

In other words, the basic generic data model is stable as defined in this document, while the extended generic data model expands by the registration of new simple values or tag numbers, but never shrinks.

While there is a strong expectation that generic encoders and decoders can represent "false", "true", and "null" ("undefined" is intentionally omitted) in the form appropriate for their programming environment, the implementation of the data model extensions created by tags is truly optional and a matter of implementation quality.

2.2. Specific Data Models

The specific data model for a CBOR-based protocol usually takes a subset of the extended generic data model and assigns application semantics to the data items within this subset and its components. When documenting such specific data models and specifying the types of data items, it is preferable to identify the types by their generic data model names ("negative integer", "array") instead of referring to aspects of their CBOR representation ("major type 1", "major type 4").

Specific data models can also specify value equivalency (including values of different types) for the purposes of map keys and encoder freedom. For example, in the generic data model, a valid map MAY have both "0" and "0.0" as keys, and an encoder MUST NOT encode "0.0" as an integer (major type 0, Section 3.1). However, if a specific data model declares that floating-point and integer representations of integral values are equivalent, then the map keys "0" and "0.0" in a single map would be considered duplicates, even though they are encoded as different major types, and the map would therefore be invalid. Under such a declaration, an encoder could also encode integral-valued floats as integers or vice versa, perhaps to save encoded bytes.
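The difference between the two notions of key equality can be sketched in Python as follows. The function names are hypothetical, the map is represented as a list of key/value pairs because Python's `dict` already treats `0` and `0.0` as the same key, and the sketch assumes keys are limited to integers and floating-point values.

```python
def generic_key(key):
    """Key identity in the generic data model: type and value (sketch)."""
    return (type(key).__name__, key)


def equivalent_key(key):
    """Key identity in a hypothetical specific data model that declares
    integral-valued floats equivalent to integers."""
    if isinstance(key, float) and key.is_integer():
        return ("int", int(key))
    return (type(key).__name__, key)


def has_duplicate_keys(pairs, identity):
    """Return True if two keys in `pairs` are the same under `identity`."""
    seen = set()
    for key, _value in pairs:
        ident = identity(key)
        if ident in seen:
            return True
        seen.add(ident)
    return False


pairs = [(0, "integer key"), (0.0, "floating-point key")]
valid_generic = not has_duplicate_keys(pairs, generic_key)
valid_specific = not has_duplicate_keys(pairs, equivalent_key)
```

With these definitions, `valid_generic` is true while `valid_specific` is false: the same pair of keys is acceptable in the generic data model and a duplicate in the specific one.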