
4. Data Formats

IMAP4rev2 uses textual commands and responses. Data in IMAP4rev2 can be in one of several forms: atom, number, string, parenthesized list, or NIL. Note that a particular data item may take more than one form; for example, a data item defined as using "astring" syntax may be either an atom or a string.

4.1 Atom

An atom consists of one or more non-special characters.
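As an illustration, an atom check can be sketched as follows. This section does not enumerate the special characters; the set used below is taken from the atom-specials rule of the formal syntax and should be treated as an assumption here. The name is_atom is hypothetical.

```python
# Characters that may not appear in an atom. Assumed from the formal
# syntax (atom-specials): "(", ")", "{", SP, "%", "*", DQUOTE, "\", "]".
ATOM_SPECIALS = set('(){ %*"\\]')


def is_atom(text: str) -> bool:
    """Return True if text is one or more non-special characters."""
    if not text:
        return False  # an atom needs at least one character
    for ch in text:
        code = ord(ch)
        if code <= 0x1F or code >= 0x7F:
            return False  # control characters and non-ASCII are excluded
        if ch in ATOM_SPECIALS:
            return False
    return True
```

A value that fails this check (for example, one containing a space) has to be sent as a quoted string or a literal instead.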

4.1.1 Sequence Set and UID Set

A set of messages can be referenced by a sequence set containing either message sequence numbers or unique identifiers. See Section 9 for details. A sequence set can contain ranges of sequence numbers (such as "5:50"), an enumeration of specific sequence numbers, or a combination of the above. A sequence set can use the special symbol "*" to represent the maximum sequence number in the mailbox. A sequence set never contains unique identifiers.

A "UID set" is similar to the sequence set, but uses unique identifiers instead of message sequence numbers, and is not permitted to contain the special symbol "*".
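The following minimal sketch shows one way a sequence set such as "1,3:5,*" could be expanded into individual message sequence numbers. The function name expand_sequence_set and the max_seq parameter are hypothetical. Treating a reversed range such as "4:2" as equal to "2:4" is assumed from the formal syntax, not stated in this section.

```python
def expand_sequence_set(spec: str, max_seq: int) -> list[int]:
    """Expand a sequence set into a sorted list of sequence numbers.

    max_seq is the value that "*" stands for (the maximum sequence
    number in the mailbox).
    """
    numbers = set()
    for part in spec.split(","):
        low, _, high = part.partition(":")
        start = max_seq if low == "*" else int(low)
        if not high:
            end = start  # single number, not a range
        else:
            end = max_seq if high == "*" else int(high)
        if start > end:
            start, end = end, start  # assumed: range order does not matter
        numbers.update(range(start, end + 1))
    return sorted(numbers)
```

For a UID set, the same expansion would apply to unique identifiers, except that "*" would have to be rejected.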

4.2 Number

A number consists of one or more digit characters and represents a numeric value.

4.3 String

A string is in one of three forms: synchronizing literal, non-synchronizing literal, or quoted string. The synchronizing literal form is the general form of a string, without limitation on the characters the string may include. The non-synchronizing literal form is also the general form, but it has a length restriction. The quoted string form is an alternative that avoids the overhead of processing a literal, but has limitations on the characters that may be used.

When the distinction between synchronizing and non-synchronizing literals is not important, this document only uses the term "literal".

A synchronizing literal is a sequence of zero or more octets (including CR and LF), prefix-quoted with an octet count in the form of an open brace ("{"), the number of octets, a close brace ("}"), and a CRLF. In the case of synchronizing literals transmitted from server to client, the CRLF is immediately followed by the octet data. In the case of synchronizing literals transmitted from client to server, the client MUST wait to receive a command continuation request before sending the octet data.

The non-synchronizing literal is an alternative form of synchronizing literal and may be used from client to server anywhere a synchronizing literal is permitted. The non-synchronizing literal form MUST NOT be sent from server to client. The non-synchronizing literal is distinguished from the synchronizing literal by having a plus ("+") between the octet count and the closing brace ("}"). Unless otherwise specified in an IMAP extension, non-synchronizing literals MUST NOT be larger than 4096 octets.
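The two literal prefixes can be sketched as below. The helper names literal_header and parse_literal_header, and the constant NON_SYNC_LIMIT, are hypothetical. The sketch only builds and recognizes the prefix; it does not perform any network I/O.

```python
import re

NON_SYNC_LIMIT = 4096  # default cap on non-synchronizing literals

# Matches a literal prefix at the end of a line: {count} or {count+}, then CRLF.
_HEADER = re.compile(rb"\{(\d+)(\+?)\}\r\n\Z")


def literal_header(data: bytes, non_sync: bool = False) -> bytes:
    """Build the prefix that announces len(data) octets."""
    if non_sync and len(data) > NON_SYNC_LIMIT:
        raise ValueError("non-synchronizing literal exceeds 4096 octets")
    marker = b"+" if non_sync else b""
    return b"{" + str(len(data)).encode("ascii") + marker + b"}\r\n"


def parse_literal_header(line: bytes):
    """Return (octet_count, is_non_sync), or None if line announces no literal."""
    match = _HEADER.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2) == b"+"
```

A client using the synchronizing form would send the prefix, wait for the command continuation request, and only then send the octet data. With the non-synchronizing form, the data can follow the prefix immediately.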

A quoted string is a sequence of zero or more Unicode characters, excluding CR and LF, encoded in UTF-8, with double quote (") characters at each end.

The empty string is represented as "" (a quoted string with zero characters), as {0} followed by a CRLF, or as {0+} followed by a CRLF.
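A minimal sketch of choosing between the quoted and literal forms when encoding a string is shown below. The function name encode_string is hypothetical. Escaping the double quote and backslash with a backslash inside a quoted string is assumed from the formal syntax and is not described in this section. Rejecting NUL follows the restriction on binary strings in Section 4.3.1.

```python
def encode_string(text: str) -> bytes:
    """Encode text as a quoted string when possible, else as a literal."""
    if "\0" in text:
        raise ValueError("NUL is not permitted in an ordinary string")
    raw = text.encode("utf-8")
    if "\r" in text or "\n" in text:
        # CR and LF cannot appear in a quoted string; use a
        # synchronizing literal with the UTF-8 octet count.
        return b"{%d}\r\n" % len(raw) + raw
    # Assumed escaping rule: backslash before '"' and '\'.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return b'"' + escaped.encode("utf-8") + b'"'
```

The literal branch returns the prefix and data in one piece, which matches what a server would transmit. A client would need to pause after the CRLF until it receives the continuation request.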

4.3.1 8-Bit and Binary Strings

8-bit textual and binary mail is supported through the use of a MIME-IMB content transfer encoding. IMAP4rev2 implementations MAY transmit 8-bit or multi-octet characters in literals but SHOULD do so only when the character set (CHARSET) is identified.

IMAP4rev2 is compatible with I18N-HDRS. As a result, the identified charset for header-field values with 8-bit content is UTF-8. IMAP4rev2 implementations MUST accept and MAY transmit UTF-8 text in quoted-strings as long as the string does not contain NUL, CR, or LF.

Although a BINARY content transfer encoding is defined, unencoded binary strings are not permitted, unless returned in a <literal8> in response to a BINARY.PEEK or BINARY FETCH data item. A "binary string" is any string with NUL characters.

4.4 Parenthesized List

Data structures are represented as a "parenthesized list": a sequence of data items, delimited by space, and bounded at each end by parentheses. A parenthesized list can contain other parenthesized lists, using multiple levels of parentheses to indicate nesting.

The empty list is represented as () -- a parenthesized list with no members.
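To illustrate nesting, the following sketch parses a parenthesized list into nested Python lists. The function name parse_value is hypothetical. The sketch handles atoms, numbers, quoted strings, and nested lists only. Literals are not handled, and malformed input (for example, a missing closing parenthesis) raises IndexError. It maps the token NIL to None regardless of context, which is a simplification; a real parser needs to know the syntax of each field.

```python
def parse_value(text: str, pos: int = 0):
    """Parse one data item starting at pos; return (value, next_pos)."""
    ch = text[pos]
    if ch == "(":
        items, pos = [], pos + 1
        while text[pos] != ")":
            if text[pos] == " ":
                pos += 1  # space delimits list members
                continue
            item, pos = parse_value(text, pos)
            items.append(item)
        return items, pos + 1  # skip the closing parenthesis
    if ch == '"':
        out, pos = [], pos + 1
        while text[pos] != '"':
            if text[pos] == "\\":
                pos += 1  # assumed escape rule: take the next char as-is
            out.append(text[pos])
            pos += 1
        return "".join(out), pos + 1  # skip the closing quote
    end = pos
    while end < len(text) and text[end] not in " ()":
        end += 1
    token = text[pos:end]
    if token == "NIL":
        return None, end  # context-free simplification
    return (int(token) if token.isdigit() else token), end
```

For example, parsing (1 "a b" NIL (x ())) yields the list [1, "a b", None, ["x", []]], with the empty list () becoming an empty Python list.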

4.5 NIL

The special form "NIL" represents the non-existence of a particular data item that is represented as a string or parenthesized list, as distinct from the empty string "" or the empty parenthesized list ().

Note: NIL is never used for any data item that takes the form of an atom. For example, a mailbox name of "NIL" is a mailbox named NIL as opposed to a non-existent mailbox name.

Examples

The following LIST response:

* LIST () "/" NIL

is equivalent to:

* LIST () "/" "NIL"

However, the following response:

* 1 FETCH (BODY[1] NIL)

is not equivalent to:

* 1 FETCH (BODY[1] "NIL")

The former indicates absence of the body part, while the latter means that it contains a string with the three characters "NIL".
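The distinction in these examples can be sketched with two decoders that differ only in how they treat the bare token NIL. The names decode_astring, decode_nstring, and _unquote are hypothetical. "nstring" is assumed here to name the syntax that is either NIL or a string. Escapes inside quoted strings and literals are ignored for brevity.

```python
def _unquote(token: str) -> str:
    """Strip surrounding double quotes, if present (escapes ignored)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return token


def decode_astring(token: str) -> str:
    """Atom-or-string field: a bare NIL is an atom spelled N-I-L."""
    return _unquote(token)


def decode_nstring(token: str):
    """NIL-or-string field: a bare NIL means the item does not exist."""
    if token == "NIL":
        return None
    return _unquote(token)
```

Under these assumptions, a mailbox name would go through decode_astring, so NIL and "NIL" both decode to the same name, while a body part would go through decode_nstring, so only the quoted form yields the three-character string.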