2. The Style of Data Structure Specification
CDDL focuses on styles of specification that are in use in the community employing the data model as pioneered by JSON and now refined in CBOR.
There are a number of more or less atomic elements of a CBOR data model, such as numbers, simple values (false, true, nil), text strings, and byte strings; CDDL does not focus on specifying their structure. CDDL of course also allows adding a CBOR tag to a data item.
Beyond those atomic elements, further components of a data structure definition language are the datatypes used for composition: arrays and maps in CBOR (called "arrays" and "objects" in JSON). While these are only two representation formats, they are used to specify four loosely distinguishable styles of composition:
- A vector: an array of elements that are mostly of the same semantics. The set of signatures associated with a signed data item is a typical application of a vector.
- A record: an array the elements of which have different, positionally defined semantics, as detailed in the data structure definition. A 2D point, specified as an array of an x coordinate (which comes first) and a y coordinate (coming second), is an example of a record, as is the pair of exponent (first) and mantissa (second) in a CBOR decimal fraction.
- A table: a map from a domain of map keys to a domain of map values, that are mostly of the same semantics. A set of language tags, each mapped to a text string translated to that specific language, is an example of a table. The key domain is usually not limited to a specific set by the specification but is open for the application, e.g., in a table mapping IP addresses to Media Access Control (MAC) addresses, the specification does not attempt to foresee all possible IP addresses. In a language such as JavaScript, a "Map" (as opposed to a plain "Object") would often be employed to achieve the generality of the key domain.
- A struct: a map from a domain of map keys as defined by the specification to a domain of map values the semantics of each of which is bound to a specific map key. This is what many people have in mind when they think about JSON objects; CBOR adds the ability to use map keys that are not just text strings. Structs can be used to solve problems similar to those records are used for; the use of explicit map keys facilitates optionality and extensibility.
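As an informal illustration, the four styles can be written down as Python values standing in for CBOR data items. All sample data below is invented for this sketch, and the helper "decimal_value" is a hypothetical name, not part of CDDL or CBOR.

```python
from fractions import Fraction

# Vector: an array whose elements share the same semantics.
signatures = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]

# Record: an array whose elements have positionally defined semantics.
point = [3, 4]                    # [x, y]
decimal_fraction = [-2, 27315]    # [exponent, mantissa]

# Table: a map with an open key domain and values of the same semantics.
greetings = {"en": "Hello", "de": "Hallo", "fr": "Bonjour"}

# Struct: a map whose keys are fixed by the specification.
person = {"age": 42, "name": "Alice", "employer": "Example Corp"}


def decimal_value(record):
    """Interpret an [exponent, mantissa] record as mantissa * 10**exponent."""
    exponent, mantissa = record
    return Fraction(mantissa) * Fraction(10) ** exponent
```

The record only makes sense because position carries meaning: swapping the two elements of "decimal_fraction" would describe a different number, whereas reordering the members of "person" would not change it.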
Two important concepts provide the foundation for CDDL:
- Instead of defining all four types of composition in CDDL separately, or even defining one kind for arrays (vectors and records) and one kind for maps (tables and structs), there is only one kind of composition in CDDL: the group (Section 2.1).
- The other important concept is that of a type. The entire CDDL specification defines a type (the one defined by its first rule), which formally is the set of CBOR data items that are acceptable as "instances" for this specification. CDDL predefines a number of basic types such as "uint" (unsigned integer) or "tstr" (text string), often making use of a simple formal notation for CBOR data items. Each value that can be expressed as a CBOR data item is also a type in its own right, e.g., "1". A type can be built as a choice of other types, e.g., an "int" is either a "uint" or a "nint" (negative integer). Finally, a type can be built as an array or a map from a group.
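The view of a type as a set of acceptable data items can be sketched in Python by modeling each type as a predicate. This modeling choice and all function names below are assumptions of the sketch; the integer bounds are those of the CBOR unsigned and negative integer encodings.

```python
def is_uint(value):
    # CBOR unsigned integers cover 0 .. 2**64 - 1; type() also excludes bool.
    return type(value) is int and 0 <= value < 2**64


def is_nint(value):
    # CBOR negative integers cover -2**64 .. -1.
    return type(value) is int and -(2**64) <= value < 0


def is_tstr(value):
    return type(value) is str


def literal(expected):
    """A value used as a type: matches only that value, of the same kind."""
    return lambda value: type(value) is type(expected) and value == expected


def choice(*types):
    """A type choice: matches whatever any of the alternatives matches."""
    return lambda value: any(t(value) for t in types)


is_int = choice(is_uint, is_nint)   # int = uint / nint
```

For example, "is_int(-5)" and "literal(1)(1)" hold, while "literal(1)(1.0)" does not, since the sketch keeps integers and floating-point numbers apart.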
The rest of this section introduces a number of basic concepts of CDDL, and Section 3 defines additional syntax. Appendix C gives a concise summary of the semantics of CDDL.
2.1. Groups and Composition in CDDL
CDDL groups are lists of group entries, each of which can be a name/value pair or a more complex group expression (which then in turn stands for a sequence of name/value pairs). A CDDL group is a production in a grammar that matches certain sequences of name/value pairs but not others. The grammar is based on the concepts of Parsing Expression Grammars (PEGs) (see Appendix A).
In an array context, only the value of the name/value pair is represented; the name is annotation only (and can be left off from the group specification if not needed). In a map context, the names become the map keys ("member keys").
In an array context, the actual sequence of elements in the group is important, as that sequence is the information that allows associating actual array elements with entries in the group. In a map context, the sequence of entries in a group is not relevant (but there is still a need to write down group entries in a sequence).
An array matches a specification given as a group when the group matches a sequence of name/value pairs the value parts of which exactly match the elements of the array in order.
A map matches a specification given as a group when the group matches a sequence of name/value pairs such that all of these name/value pairs are present in the map and the map has no name/value pair that is not covered by the group.
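The two matching rules above can be sketched in Python. This is a deliberately minimal model, not a CDDL implementation: a group is an ordered list of (name, type) entries, types are Python classes, and occurrence indicators, choices, and non-literal keys are ignored. The names "PERSON_GROUP", "has_type", "matches_array", and "matches_map" are hypothetical; the entries are borrowed from the "person" example in this section.

```python
PERSON_GROUP = [("age", int), ("name", str), ("employer", str)]


def has_type(value, typ):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if typ is int and isinstance(value, bool):
        return False
    return isinstance(value, typ)


def matches_array(group, array):
    """Array context: values must match the entries in order; names are ignored."""
    if len(array) != len(group):
        return False
    return all(has_type(value, typ) for (_, typ), value in zip(group, array))


def matches_map(group, mapping):
    """Map context: every entry must be present and no other key may appear."""
    if set(mapping) != {name for name, _ in group}:
        return False
    return all(has_type(mapping[name], typ) for name, typ in group)
```

In this model, the array [42, "Alice", "Example Corp"] matches but ["Alice", 42, "Example Corp"] does not, while a map with the same three members matches regardless of the order in which they are written.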
A simple example of using a group directly in a map definition is:
person = {
  age: int,
  name: tstr,
  employer: tstr,
}
Figure 1: Using a Group Directly in a Map
The three entries of the group are written between the curly braces that create the map: here, "age", "name", and "employer" are the names that turn into the map key text strings, and "int" and "tstr" (text string) are the types of the map values under these keys.
A group by itself (without creating a map around it) can be placed in (round) parentheses and given a name by using it in a rule:
pii = (
  age: int,
  name: tstr,
  employer: tstr,
)
Figure 2: A Basic Group
This separate, named group definition allows us to rephrase Figure 1 as:
person = {
  pii
}
Figure 3: Using a Group by Name
Note that the (curly) braces signify the creation of a map; the groups themselves are neutral as to whether they will be used in a map or an array.
As shown in Figure 1, the parentheses for groups are optional when there is some other set of brackets present. Note that they can still be used, leading to this not-so-realistic, but perfectly valid, example:
person = {(
  age: int,
  name: tstr,
  employer: tstr,
)}
Figure 4: Using a Parenthesized Group in a Map
Groups can be used to factor out common parts of structs. For example, instead of writing specifications in copy/paste style, such as in Figure 5, one can factor out the common subgroup, choose a name for it, and write only the specific parts into the individual maps (Figure 6).
person = {
  age: int,
  name: tstr,
  employer: tstr,
}

dog = {
  age: int,
  name: tstr,
  leash-length: float,
}
Figure 5: Maps with Copy/Paste
person = {
  identity,
  employer: tstr,
}

dog = {
  identity,
  leash-length: float,
}

identity = (
  age: int,
  name: tstr,
)
Figure 6: Using a Group for Factorization
Note that the lists inside the braces in the above definitions constitute (anonymous) groups, while "identity" is a named group, which can then be included as part of other groups (anonymous as in the example, or themselves named).
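The inclusion of a named group in other groups can be pictured as a simple expansion step. The following Python sketch assumes a toy rule table in which a group is a list whose items are either (name, type) entries or the name of another group; the table "RULES" and the function "expand" are hypothetical and only mirror Figure 6.

```python
RULES = {
    "identity": [("age", "int"), ("name", "tstr")],
    "person": ["identity", ("employer", "tstr")],
    "dog": ["identity", ("leash-length", "float")],
}


def expand(rule_name, rules):
    """Return the flat list of (name, type) entries for a named group."""
    flat = []
    for entry in rules[rule_name]:
        if isinstance(entry, str):
            # A bare name refers to another group; splice in its entries.
            flat.extend(expand(entry, rules))
        else:
            # A plain name/type entry is kept as is.
            flat.append(entry)
    return flat
```

Expanding "person" in this model yields the same three entries that Figure 5 spells out by hand.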
2.1.1. Usage
Groups are the instrument used in composing data structures with CDDL. It is a matter of style in defining those structures whether to define groups (anonymously) right in their contexts or whether to define them in a separate rule and to reference them with their respective name (possibly more than once).
This allows a specification either to define all the small parts of its data structures separately and compose larger protocol data units from them, or to have a single large protocol data unit in which all definitions are given ad hoc where they are needed.
2.1.2. Syntax
The composition syntax is intended to be concise and easy to read:
- The start and end of a group can be marked by "(" and ")".
- Definitions of entries inside of a group are noted as follows:

  keytype => valuetype,

  (read "keytype maps to valuetype"). The comma is actually optional (not just in the final entry), but it is considered good style to set it. The double arrow can be replaced by a colon in the common case of directly using a text string or integer literal as a key; see Section 3.5.1. This is also the common way of naming elements of an array just for documentation; see Section 3.4.
A basic entry consists of a keytype and a valuetype, both of which are types (Section 2.2); this entry matches any name/value pair the name of which is in the keytype and the value of which is in the valuetype.
A group defined as a sequence of group entries matches any sequence of name/value pairs that is composed by concatenation in order of what the entries match.
A group definition can also contain choices between groups; see Section 2.2.2.
2.2. Types
2.2.1. Values
Values such as numbers and strings can be used in place of a type. (For instance, this is a very common thing to do for a key type, common enough that CDDL provides additional convenience syntax for this.)
The value notation is based on the C language, but does not offer all the syntactic variations (see Appendix B for details). The value notation for numbers inherits from C the distinction between integer values (no fractional part or exponent given -- NR1 [ISO6093]; "NR" stands for "numerical representation") and floating-point values (where a fractional part, an exponent, or both are present -- NR2 or NR3), so the type "1" does not include any floating-point numbers while the types "1e3" and "1.5" are both floating-point numbers and do not include any integer numbers.
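The integer/floating-point distinction can be illustrated with a small Python classifier for decimal number literals. The regular expressions are a simplification written for this sketch (hexadecimal forms, for instance, are not covered); the authoritative grammar is the one in Appendix B, and "literal_kind" is a hypothetical name.

```python
import re

# No fractional part and no exponent: an integer value.
_INT = re.compile(r"-?\d+")
# Optional fractional part and/or exponent: checked only after _INT fails.
_FLOAT = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def literal_kind(text):
    """Classify a decimal literal as 'integer' or 'float', or return None."""
    if _INT.fullmatch(text):
        return "integer"
    if _FLOAT.fullmatch(text):
        return "float"
    return None
```

Under this sketch, "1" is classified as an integer, while "1e3" and "1.5" are classified as floating-point values even though "1e3" denotes a whole number.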
2.2.2. Choices
Many places that allow a type also allow a choice between types, delimited by a "/" (slash). The entire choice construct can be put into parentheses if this is required to make the construction unambiguous (please see Appendix B for details of the CDDL grammar).
Choices of values can be used to express enumerations:
attire = "bow tie" / "necktie" / "Internet attire"
protocol = 6 / 17
Analogous to types, CDDL also allows choices between groups, delimited by a "//" (double slash). Note that the "//" operator binds much more weakly than the other CDDL operators, so each line within "delivery" in the following example is its own alternative in the group choice:
address = { delivery }

delivery = (
  street: tstr, ? number: uint, city //
  po-box: uint, city //
  per-pickup: true
)

city = (
  name: tstr, zip-code: uint
)
A group choice matches the union of the sets of name/value pair sequences that the alternatives in the choice can match.
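How the "delivery" group choice behaves in a map context can be approximated in Python by looking at member keys only. This sketch is an assumption-laden simplification: value types are not checked, each alternative is reduced to a set of required and a set of optional keys, and the names "CITY", "DELIVERY", and "matches_delivery" are hypothetical.

```python
CITY = {"name", "zip-code"}

# One dictionary per alternative of the group choice, in the order written.
DELIVERY = [
    {"required": {"street"} | CITY, "optional": {"number"}},
    {"required": {"po-box"} | CITY, "optional": set()},
    {"required": {"per-pickup"}, "optional": set()},
]


def matches_delivery(address):
    """True if the keys of the map fit at least one alternative exactly."""
    keys = set(address)
    return any(
        alt["required"] <= keys <= alt["required"] | alt["optional"]
        for alt in DELIVERY
    )
```

A map that mixes members from two alternatives, such as one carrying both "street" and "po-box", fits none of them and is therefore rejected.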
For both type choices and group choices, additional alternatives can be added to a rule later in separate rules by using "/=" and "//=", respectively, instead of "=":
attire /= "swimwear"
delivery //= (
  lat: float, long: float, drone-type: tstr
)
It is not an error if a name is first used with a "/=" or "//=" (there is no need to "create it" with "=").
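Enumerations and their later extension can be modeled in Python with a rule table that maps each name to a list of alternatives. The table and the functions "define", "extend", and "is_instance" are hypothetical helpers for this sketch and cover value choices only.

```python
rules = {}   # rule name -> list of alternative values


def define(name, *alternatives):
    """Model of: name = a / b / c"""
    rules[name] = list(alternatives)


def extend(name, *alternatives):
    """Model of: name /= d (also valid when name has not been defined yet)"""
    rules.setdefault(name, []).extend(alternatives)


def is_instance(name, value):
    # Compare kind as well as value so that 6 and 6.0 stay distinct.
    return any(type(value) is type(alt) and value == alt
               for alt in rules.get(name, []))


define("attire", "bow tie", "necktie", "Internet attire")
define("protocol", 6, 17)
```

After calling extend("attire", "swimwear"), the value "swimwear" is accepted as an instance of "attire"; calling "extend" on a name that has never been defined simply creates it.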
2.2.2.1. Ranges
Instead of naming all the values that make up a choice, CDDL allows building a range out of two values that are in an ordering relationship: a lower bound (first value) and an upper bound (second value). A range can be inclusive of both bounds given (denoted by joining two values by ".."), or it can include the lower bound and exclude the upper bound (denoted by instead using "..."). If the lower bound exceeds the upper bound, the resulting type is the empty set (this behavior can be desirable when generics (Section 3.10) are being used).
device-address = byte
max-byte = 255
byte = 0..max-byte ; inclusive range
first-non-byte = 256
byte1 = 0...first-non-byte ; byte1 is equivalent to byte
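The difference between the two range operators can be sketched in Python as follows. The function "in_range" and its "inclusive" flag are hypothetical names chosen for this illustration.

```python
def in_range(value, low, high, inclusive=True):
    """Model '..' when inclusive is True and '...' (upper bound excluded) otherwise."""
    if inclusive:
        return low <= value <= high
    return low <= value < high
```

With this model, in_range(v, 0, 255) and in_range(v, 0, 256, inclusive=False) agree for every integer v, which mirrors the equivalence of "byte" and "byte1"; a range whose lower bound exceeds its upper bound accepts nothing.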
CDDL currently only allows ranges between integers (matching integer values) or between floating-point values (matching floating-point values). If both are needed in a type, a type choice between the two kinds of ranges can be (clumsily) used:
int-range = 0..10 ; only integers match
float-range = 0.0..10.0 ; only floats match
BAD-range1 = 0..10.0 ; NOT DEFINED
BAD-range2 = 0.0..10 ; NOT DEFINED
numeric-range = int-range / float-range
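The separation between integer and floating-point ranges can be mimicked in Python by comparing the kind of the value with the kind of the bounds. This is a sketch under the assumption that Python "int" and "float" stand in for the two kinds of CDDL numbers; "matches_range" and "numeric_range" are hypothetical names.

```python
def matches_range(value, low, high):
    """Inclusive range whose bounds are both int or both float."""
    if type(low) is not type(high) or type(low) not in (int, float):
        # Mirrors the BAD-range examples: mixed bounds are not defined.
        raise ValueError("mixed or non-numeric bounds are not defined")
    # The value must be of the same kind as the bounds to match.
    return type(value) is type(low) and low <= value <= high


def numeric_range(value):
    """Model of: numeric-range = int-range / float-range"""
    return matches_range(value, 0, 10) or matches_range(value, 0.0, 10.0)
```

Here 5 matches only the integer range and 5.0 only the floating-point range, so the choice between the two is what accepts both.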
(See also the control operators .lt/.ge and .le/.gt in Section 3.8.6.)
Note that the dot is a valid name continuation character in CDDL, so
min..max
is not a range expression but a single name. When using a name as the left-hand side of a range operator, use spacing as in
min .. max
to separate off the range operator.
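The effect of the dot being a name continuation character can be reproduced with a small Python tokenizer. The pattern "NAME" is a simplified rendering of the name syntax (a letter, "@", "_", or "$" to start; "-" and "." allowed inside but not at the end) written for this sketch; the normative grammar is in Appendix B, and "tokens" is a hypothetical helper that only knows names, range operators, and unsigned decimal numbers.

```python
import re

# Simplified name pattern: "-" and "." may appear inside a name but not at its end.
NAME = r"[A-Za-z@_$](?:[-.]*[A-Za-z0-9@_$])*"
# Names are tried first, then the two range operators, then decimal numbers.
TOKEN = re.compile(rf"{NAME}|\.\.\.|\.\.|\d+")


def tokens(text):
    """Split text into names, range operators, and numbers, skipping whitespace."""
    return TOKEN.findall(text)
```

In this model, "min..max" comes out as one token, whereas "min .. max" comes out as a name, the range operator, and another name. A range such as "0..max-byte" needs no extra spacing because a number cannot absorb the dots.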
2.2.2.2. Turning a Group into a Choice
Some choices are built out of large numbers of values, often integers, each of which is best given a semantic name in the specification. Instead of naming each of these integers and then accumulating them into a choice, CDDL allows building a choice from a group by prefixing it with an "&" character:
terminal-color = &basecolors
basecolors = (
  black: 0, red: 1, green: 2, yellow: 3,
  blue: 4, magenta: 5, cyan: 6, white: 7,
)

extended-color = &(
  basecolors,
  orange: 8, pink: 9, purple: 10, brown: 11,
)
As with the use of groups in arrays (Section 3.4), the member names have only documentary value (in particular, they might be used by a tool when displaying integers that are taken from that choice).
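A Python sketch of this construction, assuming a group is modeled as a list of (name, value) entries: the choice is simply the set of entry values, and the names survive only for display purposes. The identifiers "choice_from_group" and "display_name" are hypothetical.

```python
BASECOLORS = [
    ("black", 0), ("red", 1), ("green", 2), ("yellow", 3),
    ("blue", 4), ("magenta", 5), ("cyan", 6), ("white", 7),
]
EXTENDED = BASECOLORS + [
    ("orange", 8), ("pink", 9), ("purple", 10), ("brown", 11),
]


def choice_from_group(group):
    """Model of '&group': the type is the set of entry values."""
    return {value for _, value in group}


def display_name(group, value):
    """A tool might use the member names when displaying a value."""
    for name, candidate in group:
        if candidate == value:
            return name
    return None
```

For instance, 9 is a member of choice_from_group(EXTENDED) but not of choice_from_group(BASECOLORS), and display_name(EXTENDED, 8) gives "orange".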
2.2.3. Representation Types
CDDL allows the specification of a data item type by referring to the CBOR representation (specifically, to major types and additional information; see Section 2 of [RFC7049]). How this is used should be evident from the prelude (Appendix D): a hash mark ("#") optionally followed by a number from 0 to 7 identifying the major type, which then can be followed by a dot and a number specifying the additional information. This construction specifies the set of values that can be serialized in CBOR (i.e., "any"), by the given major type if one is given, or by the given major type with the additional information if both are given. Where a major type of 6 (Tag) is used, the type of the tagged item can be specified by appending it in parentheses.
Note that although this notation is based on the CBOR serialization, it is about a set of values at the data model level. For example, "#7.25" specifies the set of values that can be represented as half-precision floats; it does not mandate that these values actually be serialized as half-precision floats, because CDDL does not provide any language means to restrict the choice of serialization variants. This also enables the use of CDDL with JSON, which uses a fundamentally different way of serializing (some of) the same values.
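The shape of the notation can be made concrete with a small Python parser. This is only a sketch: the regular expression is an approximation written for illustration, the parenthesized type that may follow a tag is not handled, and "parse_representation" is a hypothetical name.

```python
import re

# "#", optionally a major type 0..7, optionally "." and a further number.
_REP = re.compile(r"#(?:([0-7])(?:\.(\d+))?)?")


def parse_representation(text):
    """Return (major_type, number_after_dot); either part may be None."""
    m = _REP.fullmatch(text)
    if m is None:
        raise ValueError(f"not a representation type: {text!r}")
    major, info = m.group(1), m.group(2)
    return (
        int(major) if major is not None else None,
        int(info) if info is not None else None,
    )
```

Under these assumptions, "#" parses to (None, None), which corresponds to "any"; "#6" parses to (6, None); and "#7.25" parses to (7, 25).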
It may be necessary to make use of representation types outside the prelude, e.g., a specification could start by making use of an existing tag in a more specific way or could define a new tag not defined in the prelude:
my_breakfast = #6.55799(breakfast) ; cbor-any is too general!
breakfast = cereal / porridge
cereal = #6.998(tstr)
porridge = #6.999([liquid, solid])
liquid = milk / water
milk = 0
water = 1
solid = tstr
2.2.4. Root Type
There is no special syntax to identify the root of a CDDL data structure definition: that role is simply taken by the first rule defined in the file.
This is motivated by the usual top-down approach for defining data structures, decomposing a big data structure unit into smaller parts; however, except for the root type, there is no need to strictly follow this sequence.
(Note that there is no way to use a group as a root -- it must be a type.)
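A tool could locate the root along the following lines. The Python sketch assumes that every rule starts at the beginning of a line and ignores generic parameters and other details of the grammar; "root_rule_name" is a hypothetical helper, not an interface of any existing CDDL tool.

```python
import re

# A rule name at the start of a line, followed by "=", "/=", or "//="
# (but not by "=>", which belongs to a group entry).
_RULE_START = re.compile(
    r"^([A-Za-z@_$][A-Za-z0-9@_$.-]*)\s*(?:=(?!>)|/=|//=)",
    re.MULTILINE,
)


def root_rule_name(cddl_text):
    """Return the name of the first rule in the text, or None if there is none."""
    m = _RULE_START.search(cddl_text)
    return m.group(1) if m else None
```

Applied to the breakfast example of Section 2.2.3, this returns "my_breakfast", regardless of how many further rules follow.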