
2. Definitions, Conventions, and Generic BNF Grammar

Although MIME mechanisms are specified in prose throughout this document set, many are also described formally using the augmented BNF notation of RFC 822. Implementers need to be familiar with this notation in order to understand this document set, and are referred to RFC 822 for a complete explanation of the augmented BNF notation.

Some of the augmented BNF in this document set makes named reference to syntax rules defined in RFC 822. A complete formal grammar is therefore obtained by combining the collected grammar appendices from each document in this set with that from RFC 822 plus the modifications to RFC 822 defined in RFC 1123 (which specifically changes the syntax for the "return", "date", and "mailbox" rules).

All numeric and octet values in this document set are given in decimal notation. All media type values, subtype values, and parameter names as defined are case-insensitive. However, parameter values are case-sensitive unless otherwise specified for the specific parameter.
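As a minimal sketch of these case rules, assuming a content type is held as a (type, subtype, parameters) triple, the hypothetical helpers `normalize` and `same_content_type` below lowercase everything except parameter values before comparing. The parameter name `x-label` is invented for illustration, and parameters whose definitions declare their values case-insensitive are not modeled.

```python
# Minimal sketch of the default case rules for media types and parameters.
# Holding a content type as (type, subtype, params) is an assumption.


def normalize(media_type: str, subtype: str, params: dict) -> tuple:
    """Lowercase type, subtype, and parameter names; keep values verbatim."""
    return (
        media_type.lower(),
        subtype.lower(),
        {name.lower(): value for name, value in params.items()},
    )


def same_content_type(a: tuple, b: tuple) -> bool:
    """Compare two (type, subtype, params) triples under the default rules.

    Parameters whose values are defined as case-insensitive are not modeled.
    """
    return normalize(*a) == normalize(*b)


# "x-label" is a hypothetical parameter used only for illustration.
a = ("Text", "Plain", {"X-Label": "Draft"})
b = ("text", "PLAIN", {"x-label": "Draft"})
c = ("text", "plain", {"x-label": "draft"})

print(same_content_type(a, b))  # True: only type, subtype, and name case differ
print(same_content_type(a, c))  # False: the parameter value's case differs
```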

FORMAT NOTE: Notes like this one provide additional non-essential information which readers may skip without missing anything essential. The primary purpose of these non-essential notes is to convey information about the rationale of this document set, or to place these documents in the proper historical or evolutionary context. In particular, such information may be skipped by those who are focused solely on building a compliant implementation, but may be of use to those who wish to understand why certain design choices were made.

2.1. CRLF

The term CRLF, in this document set, refers to the sequence of octets corresponding to the two US-ASCII characters CR (decimal value 13) and LF (decimal value 10) which, taken together, in this order, denote a line break in RFC 822 mail.

2.2. Character Set

The term "character set" is used in MIME to refer to a method of converting a sequence of octets into a sequence of characters. Note that unconditional and unambiguous conversion in the other direction is not required, in that not all characters may be representable by a given character set and a character set may provide more than one sequence of octets to represent a particular sequence of characters.

This definition is intended to allow various kinds of character encodings, from simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ISO 2022's techniques. However, the definition associated with a MIME character set name MUST fully specify the mapping to be performed. In particular, use of external profiling information to determine the exact mapping is not permitted.

NOTE: The term "character set" was originally used to describe such things as US-ASCII and ISO-8859-1 which consist of a small set of characters and a simple one-to-one mapping from single octets to single characters. Multi-octet coded character sets and switching techniques make the situation much more complicated. For example, some communities use the term "character encoding" for what MIME calls a "character set", while using the phrase "coded character set" to denote an abstract mapping from integers (not octets) to characters.
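The definition of "character set" can be illustrated with Python's built-in codecs, used here only as a convenient stand-in for charset mappings (this is not the MIME charset registry). The sketch shows two distinct octet sequences decoding to the same characters, and a character that US-ASCII cannot represent at all; `representable` is a hypothetical helper.

```python
# Illustration with Python's built-in codecs, not a MIME charset registry.


def representable(text: str, charset: str) -> bool:
    """True if every character of text can be encoded in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Octets -> characters: two different octet sequences, one resulting string.
little_endian = b"\xff\xfeA\x00"  # byte-order mark FF FE, then "A" as 41 00
big_endian = b"\xfe\xff\x00A"     # byte-order mark FE FF, then "A" as 00 41
print(little_endian.decode("utf-16") == big_endian.decode("utf-16"))  # True

# Characters -> octets is not always possible: US-ASCII has no "é".
print(representable("caf\u00e9", "us-ascii"))    # False
print(representable("caf\u00e9", "iso-8859-1"))  # True
```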

2.3. Message

The term "message", when not further qualified, means either the (complete or "top-level") message being transferred on a network, or a message encapsulated in a body of type "message/rfc822" or "message/partial".

2.4. Entity

The term "entity" refers specifically to the MIME-defined header fields and contents of either a message or one of the parts in a multipart body. The specification of such entities is the essence of MIME. Since the contents of an entity are often called the "body", it makes sense to speak of the body of an entity. Any sort of field may be present in the header of an entity, but only those fields whose names begin with "content-" actually have any MIME-related meaning. Note that this does not mean the other fields have no meaning at all: an entity that is also a message has non-MIME header fields whose meaning is defined by RFC 822.
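A small sketch of the "content-" rule, assuming an entity's header is held as an ordered list of (name, value) pairs; `mime_fields` is a hypothetical helper. Field names are compared case-insensitively, consistent with RFC 822.

```python
def mime_fields(headers: list) -> list:
    """Keep only header fields whose names begin with "content-".

    Field names are compared case-insensitively, as in RFC 822.
    """
    return [(name, value) for name, value in headers
            if name.lower().startswith("content-")]


# Hypothetical header list for a top-level message.
headers = [
    ("From", "alice@example.org"),        # meaningful under RFC 822, not MIME
    ("Subject", "Quarterly report"),      # likewise
    ("Content-Type", "text/plain"),
    ("CONTENT-TRANSFER-ENCODING", "7bit"),
]
print(mime_fields(headers))
# [('Content-Type', 'text/plain'), ('CONTENT-TRANSFER-ENCODING', '7bit')]
```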

2.5. Body Part

The term "body part" refers to an entity inside of a multipart entity.

2.6. Body

The term "body", when not further qualified, means the body of an entity, i.e. the body of either a message or of a body part.

NOTE: The previous four definitions are clearly circular. This is unavoidable, as the overall structure of a MIME message is indeed recursive.
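The recursion can be made concrete with a hypothetical `Entity` model: each entity holds header fields and either a leaf body or a list of body parts, and each body part is itself an `Entity`. This is an illustrative sketch, not a parser; boundaries and transfer encodings are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical model: header fields plus a body.

    A non-multipart entity has a leaf body (bytes); a multipart entity's
    body is a list of body parts, each of which is itself an Entity.
    """
    headers: list
    body: bytes = b""
    parts: list = field(default_factory=list)


def walk(entity: Entity, depth: int = 0):
    """Yield (depth, entity) pairs in document order, recursing into parts."""
    yield depth, entity
    for part in entity.parts:
        yield from walk(part, depth + 1)


message = Entity(
    headers=[("Content-Type", "multipart/mixed; boundary=b1")],
    parts=[
        Entity(headers=[("Content-Type", "text/plain")], body=b"hello"),
        Entity(
            headers=[("Content-Type", "multipart/alternative; boundary=b2")],
            parts=[Entity(headers=[("Content-Type", "text/html")], body=b"<p>hi</p>")],
        ),
    ],
)

for depth, ent in walk(message):
    print("  " * depth + dict(ent.headers)["Content-Type"])
# multipart/mixed; boundary=b1
#   text/plain
#   multipart/alternative; boundary=b2
#     text/html
```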

2.7. 7bit Data

"7bit data" refers to data that is all represented as relatively short lines with 998 octets or fewer between CRLF line separation sequences [RFC-821]. No octets with decimal values greater than 127 are allowed, and neither are NULs (octets with decimal value 0). CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

2.8. 8bit Data

"8bit data" refers to data that is all represented as relatively short lines with 998 octets or fewer between CRLF line separation sequences [RFC-821], but octets with decimal values greater than 127 may be used. As with "7bit data", CR and LF octets only occur as part of CRLF line separation sequences, and no NULs are allowed.

2.9. Binary Data

"Binary data" refers to data where any sequence of octets whatsoever is allowed.
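The three definitions can be checked mechanically. The hypothetical `classify` function below returns the most restrictive class a byte string satisfies, since any 7bit data is also valid 8bit and binary data. It assumes, as the definitions state, that the 998-octet limit counts the octets between CRLF sequences, excluding the CRLF itself.

```python
def classify(data: bytes, max_line: int = 998) -> str:
    """Return "7bit", "8bit", or "binary": the most restrictive class data fits.

    7bit and 8bit data both require: no NUL octets, CR and LF only as part
    of CRLF, and at most max_line octets between CRLF sequences.
    """
    if b"\x00" in data:
        return "binary"
    # After splitting on CRLF, any CR or LF left inside a piece is a bare one.
    for line in data.split(b"\r\n"):
        if b"\r" in line or b"\n" in line or len(line) > max_line:
            return "binary"
    # The line checks passed; octets above 127 separate 8bit from 7bit.
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"


print(classify(b"Hello\r\nWorld\r\n"))              # 7bit
print(classify("caf\u00e9\r\n".encode("latin-1")))  # 8bit
print(classify(b"bare LF\nhere"))                    # binary
print(classify(b"x" * 999))                          # binary
```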

2.10. Lines

"Lines" are defined as sequences of octets separated by CRLF sequences. This is consistent with both RFC 821 and RFC 822. "Lines" refers only to a unit of data in a message, which may or may not correspond to something that is actually displayed by a user agent.
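A sketch of line splitting under this definition, using a hypothetical `split_lines` helper, contrasts with Python's `bytes.splitlines()`, which also breaks on a bare CR or LF. Treating a trailing CRLF as ending the last line, rather than starting an empty one, is an assumption of this sketch.

```python
def split_lines(data: bytes) -> list:
    """Split octets into lines using CRLF as the only separator.

    A bare CR or LF stays inside its line as ordinary data.
    """
    lines = data.split(b"\r\n")
    # Assumption: a trailing CRLF terminates the last line.
    if data.endswith(b"\r\n"):
        lines.pop()
    return lines


raw = b"one\r\ntwo\nstill two\r\nthree\r\n"
print(split_lines(raw))   # [b'one', b'two\nstill two', b'three']
print(raw.splitlines())   # [b'one', b'two', b'still two', b'three']
```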


Terminology Summary:

  Term           Description
  -------------  --------------------------------------------------------------
  Character Set  Method of converting octet sequences to character sequences
  Message        RFC 822 message or encapsulated message
  Entity         MIME header fields and content
  Body Part      Entity within a multipart entity
  Body           Content of an entity
  7bit Data      No octets above 127, no NULs, lines of at most 998 octets
  8bit Data      Octets above 127 allowed, no NULs, lines of at most 998 octets
  Binary Data    Any octet sequence allowed
  Lines          CRLF-separated octet sequences

Key Concepts:

  • MIME structure is recursive (entities can contain entities)
  • Media type, subtype, and parameter names are case-insensitive; parameter values are case-sensitive unless a parameter's definition says otherwise
  • CRLF is the only form of line separator
  • Data is classified into 7bit, 8bit, and binary types