3. Syntax

3.1. Introduction

The syntax as given in this section defines the legal syntax of Internet messages. Messages that are conformant to this specification MUST conform to the syntax in this section. If there are options in this section where one option SHOULD be generated, that is indicated either in the prose or in a comment next to the syntax.

For the defined expressions, a short description of the syntax and use is given, followed by the ABNF, followed by a semantic analysis. Primitive tokens that are used but otherwise not specified in this document are taken from the "Core Rules" of Appendix B.1 of RFC 5234: CR, LF, CRLF, HTAB, SP, WSP, DQUOTE, DIGIT, ALPHA, and VCHAR.

In some of the definitions, there will be nonterminals whose names start with "obs-". These "obs-" elements refer to tokens defined in the obsolete syntax in Section 4. In all cases, these productions should be ignored for the purposes of generating legal Internet messages and MUST NOT be used as part of such messages. However, when interpreting messages, these tokens MUST be honored as part of the legal syntax. In this sense, Section 3 defines a grammar for generation of messages, with "obs-" elements that are to be ignored, while Section 4 adds to that grammar to specify a grammar for interpretation of messages.

Core Concepts:

Concept	Description
Message Generation	Section 3 syntax, ignore `obs-` elements
Message Parsing	Section 3 + Section 4, include `obs-` elements
obs- elements	Obsolete syntax, MUST NOT generate but MUST parse

3.2. Lexical Tokens

The following rules are used to define an underlying lexical analyzer that feeds tokens to the higher-level parsers. This section defines the tokens used in structured header field bodies.

Note: Readers of this specification need to pay special attention to how these lexical tokens are used in both the lower-level and higher-level syntax later in the document. In particular, white space and comment tokens defined in Section 3.2.2 are used in the definitions of the lower-level tokens defined here, and those lower-level tokens are in turn used as parts of the higher-level tokens defined later. Therefore, even though white space and comments may not explicitly appear in a particular definition, they may be allowed between tokens in a higher-level construct.

3.2.1. Quoted characters

Some characters are reserved for special interpretation, such as delimiting lexical tokens. To permit use of these characters as uninterpreted data, a quoting mechanism is provided.

quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp

Where any quoted-pair appears, it is to be interpreted as the single character obtained by removing the backslash. That is, the "\" character that appears as part of a quoted-pair is semantically "invisible".

Note: The "\" character may appear in a message where it is not part of a quoted-pair. A "\" character that does not appear in a quoted-pair is not semantically invisible. The only places in this specification where quoted-pair currently appears are in ccontent, qcontent, and in obs-dtext in Section 4.

Examples:

Quoted backslash: "\\"  → Interpreted as: \
Quoted quote: "\""      → Interpreted as: "
Quoted space: "\ "      → Interpreted as: (space)
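
The interpretation rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `unquote_pairs` is hypothetical, and the sketch assumes its input is text taken from a context where quoted-pair is allowed (for example, the inside of a quoted-string or comment).

```python
def unquote_pairs(text: str) -> str:
    """Replace each quoted-pair with the character that follows the backslash."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Backslash plus the next character form a quoted-pair;
            # only the second character is semantically visible.
            out.append(text[i + 1])
            i += 2
        else:
            # Ordinary character (or a trailing lone backslash): keep as-is.
            out.append(ch)
            i += 1
    return "".join(out)
```

Scanning left to right matters here: in `\\"` the first backslash quotes the second, so the quote character that follows is not part of a quoted-pair.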

3.2.2. Folding White Space and Comments

White space characters, including those used in folding (described in Section 2.2.3), may appear between many elements in header field bodies. Also, strings of characters that are treated as comments may be included in structured field bodies as characters enclosed in parentheses. The following define the folding white space (FWS) and comment constructs.

Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string", as defined in Section 3.2.4. Comments may nest.

There are several places in this specification where comments and FWS may be freely inserted. To accommodate that syntax, an additional "CFWS" token is defined for places where comments and/or FWS may appear. However, where CFWS appears in this specification, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.

FWS             =   ([*WSP CRLF] 1*WSP) /  obs-FWS
                                       ; Folding white space

ctext           =   %d33-39 /          ; Printable US-ASCII
                    %d42-91 /          ;  characters not including
                    %d93-126 /         ;  "(", ")", or "\"
                    obs-ctext

ccontent        =   ctext / quoted-pair / comment

comment         =   "(" *([FWS] ccontent) [FWS] ")"

CFWS            =   (1*([FWS] comment) [FWS]) / FWS

Throughout this specification, where FWS (the folding white space token) appears, it indicates a place where folding, as discussed in Section 2.2.3, may take place. Wherever folding appears in a message (that is, a header field body containing a CRLF followed by any WSP), unfolding (removal of the CRLF) is performed before any further semantic analysis is performed on that header field. That is, any CRLF that appears in FWS is semantically "invisible".
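
As a rough sketch of the unfolding step described above, the following Python removes every CRLF that is immediately followed by WSP. The helper name `unfold` is an assumption made for illustration, and the sketch covers only the non-obsolete form of folding.

```python
import re

# A fold is a CRLF immediately followed by WSP (space or horizontal tab).
# The lookahead keeps the WSP itself; only the CRLF is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_body: str) -> str:
    """Return the header field body with all folds removed."""
    return _FOLD.sub("", header_body)
```

Note that the white space after the CRLF is preserved, so `"This is\r\n a test"` unfolds to `"This is a test"`.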

Comment Example:

From: Pete (A nice \) chap) <[email protected]>
           ↑       ↑
           │       └── Escaped parenthesis
           └── Comment start

Parsed display name: Pete
Actual mailbox: [email protected]
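
One possible way to remove comments from an already unfolded header field body is sketched below. The function name `strip_comments` is hypothetical, and the address used in the usage note is a made-up placeholder. The sketch tracks nesting depth, leaves parentheses inside quoted strings alone, and treats a quoted-pair as a single unit so that an escaped parenthesis does not change the depth.

```python
def strip_comments(text: str) -> str:
    """Remove (possibly nested) comments that appear outside quoted strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a quoted-string
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and (depth > 0 or in_quotes):
            # quoted-pair: consume both characters together
            if depth == 0:
                out.append(text[i:i + 2])  # keep it when inside a quoted-string
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1                     # comment opens (or nests deeper)
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1                     # comment closes
        elif depth == 0:
            out.append(ch)                 # ordinary text outside comments
        i += 1
    return "".join(out)
```

For example, `strip_comments(r"Pete (A nice \) chap) <pete@example.test>")` keeps the text on both sides of the comment, including the white space that surrounded it.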

3.2.3. Atom

Several productions in structured header field bodies are simply strings of certain basic characters. Such productions are called atoms.

Some structured header field bodies also allow the period character (".", ASCII value 46) within runs of atext. An additional "dot-atom" token is defined for those purposes.

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

atom            =   [CFWS] 1*atext [CFWS]

dot-atom-text   =   1*atext *("." 1*atext)

dot-atom        =   [CFWS] dot-atom-text [CFWS]

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE

Both atom and dot-atom are interpreted as a single unit, comprised of the string of characters that make it up. Semantically, the optional comments and FWS surrounding the rest of the characters are not part of the atom; the atom is only the run of atext characters in an atom, or the atext and "." characters in a dot-atom.

Examples:

Atom examples:
- john
- example
- user_name
- info+tag

Dot-atom examples:
- john.doe
- first.middle.last
- [email protected] (the local part and the domain are each dot-atoms; the "@" separator is a special and is not part of either)
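
The atext and dot-atom-text rules translate fairly directly into regular expressions. The following is a sketch under the assumption that surrounding CFWS has already been removed; the names `is_atom_text` and `is_dot_atom_text` are illustrative only.

```python
import re

# Character class for atext: ALPHA / DIGIT plus the listed punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

ATOM_TEXT_RE = re.compile(rf"{ATEXT}+")
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_atom_text(s: str) -> bool:
    """True if s is a non-empty run of atext characters."""
    return ATOM_TEXT_RE.fullmatch(s) is not None


def is_dot_atom_text(s: str) -> bool:
    """True if s is runs of atext separated by single periods."""
    return DOT_ATOM_TEXT_RE.fullmatch(s) is not None
```

Because each period must sit between two runs of atext, strings such as `john..doe` or `.john` are rejected by `is_dot_atom_text`.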

3.2.4. Quoted Strings

Strings of characters that include characters other than those allowed in atoms can be represented in a quoted string format, where the characters are surrounded by quote (DQUOTE, ASCII value 34) characters.

qtext           =   %d33 /             ; Printable US-ASCII
                    %d35-91 /          ;  characters not including
                    %d93-126 /         ;  "\" or the quote character
                    obs-qtext

qcontent        =   qtext / quoted-pair

quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]

A quoted-string is treated as a unit. That is, quoted-string is identical to atom, semantically. Since a quoted-string may contain FWS, folding is permitted. Also note that since quoted-pair is allowed in a quoted-string, the quote and backslash characters may appear in a quoted-string so long as they appear as a quoted-pair.
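
To illustrate how a quoted-string reduces to its semantic value, here is a small Python sketch. The function name `decode_quoted_string` is an assumption, and the sketch expects a token that has already been unfolded and stripped of any surrounding CFWS.

```python
def decode_quoted_string(token: str) -> str:
    """Return the content of a quoted-string with quoted-pairs resolved."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted-string")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                # The backslash would quote the closing DQUOTE.
                raise ValueError("dangling backslash")
            out.append(body[i + 1])  # quoted-pair: keep the quoted character
            i += 2
        elif ch == '"':
            raise ValueError("unescaped quote inside quoted-string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, `decode_quoted_string('"Joe Q. Public"')` yields `Joe Q. Public`, and a quote or backslash in the content survives only when it arrives as a quoted-pair.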

Examples:

"Joe Q. Public"           → Joe Q. Public
"First Last"              → First Last
"Giant; \"Big\" Box"      → Giant; "Big" Box

3.2.5. Miscellaneous Tokens

Three additional tokens are defined: word and phrase for combinations of atoms and/or quoted-strings, and unstructured for unstructured header fields and in some places within structured header fields.

word            =   atom / quoted-string

phrase          =   1*word / obs-phrase

unstructured    =   (*([FWS] VCHAR) *WSP) / obs-unstruct
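
When generating a word, a producer has to choose between the atom and quoted-string forms. The sketch below shows one plausible approach; the helper name `to_word` is hypothetical, and the input is assumed to be printable US-ASCII (plus spaces) with no line breaks.

```python
import re

# A run of atext characters can be emitted as a bare atom.
_ATOM_TEXT = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")


def to_word(text: str) -> str:
    """Render text as an atom when possible, otherwise as a quoted-string."""
    if _ATOM_TEXT.fullmatch(text):
        return text
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

For instance, `to_word("john")` is returned unchanged, while `to_word("Joe Q. Public")` is wrapped in DQUOTE characters because the space and period are not atext.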

3.3. Date and Time Specification

Date and time values occur in several header fields. This section specifies the syntax for a full date and time specification. Though folding white space is permitted throughout the date-time specification, it is RECOMMENDED that a single space be used in each place that FWS appears (whether it is required or optional); some older implementations will not interpret longer sequences of folding white space correctly.

date-time       =   [ day-of-week "," ] date time [CFWS]

day-of-week     =   ([FWS] day-name) / obs-day-of-week

day-name        =   "Mon" / "Tue" / "Wed" / "Thu" /
                    "Fri" / "Sat" / "Sun"

date            =   day month year

day             =   ([FWS] 1*2DIGIT FWS) / obs-day

month           =   "Jan" / "Feb" / "Mar" / "Apr" /
                    "May" / "Jun" / "Jul" / "Aug" /
                    "Sep" / "Oct" / "Nov" / "Dec"

year            =   (FWS 4*DIGIT FWS) / obs-year

time            =   time-of-day zone

time-of-day     =   hour ":" minute [ ":" second ]

hour            =   2DIGIT / obs-hour

minute          =   2DIGIT / obs-minute

second          =   2DIGIT / obs-second

zone            =   (FWS ( "+" / "-" ) 4DIGIT) / obs-zone

Date-Time Examples:

Full format:
Date: Fri, 21 Nov 1997 09:55:06 -0600
Date: Sat, 20 Dec 2025 10:00:00 +0800

Without day of week:
Date: 21 Nov 1997 09:55:06 -0600

UTC time:
Date: 21 Nov 1997 15:55:06 +0000
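
Python's standard library can parse and generate date-time values of this form through `email.utils.parsedate_to_datetime` and `email.utils.format_datetime`. The wrapper names below (`parse_date`, `day_name_matches`, `generate_date`) are hypothetical, and the day-of-week check is an extra step assumed here for illustration, since the parser itself does not verify that the day name agrees with the date.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_date(value: str) -> datetime:
    """Parse a date-time string into a datetime carrying its zone offset."""
    return parsedate_to_datetime(value)


def day_name_matches(value: str) -> bool:
    """True when the optional day-of-week agrees with the calendar date."""
    head, comma, _ = value.partition(",")
    if not comma:
        return True  # day-of-week is optional
    return head.strip() == DAY_NAMES[parse_date(value).weekday()]


def generate_date(moment: datetime) -> str:
    """Format a timezone-aware datetime with single spaces between parts."""
    return format_datetime(moment)
```

A round trip such as `generate_date(parse_date("Fri, 21 Nov 1997 09:55:06 -0600"))` produces a value with a single space wherever FWS is allowed, which matches the recommendation in this section.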

3.4. Address Specification

Addresses occur in several message header fields to indicate senders and recipients of messages. An address may either be an individual mailbox or a group of mailboxes.

address         =   mailbox / group

mailbox         =   name-addr / addr-spec

name-addr       =   [display-name] angle-addr

angle-addr      =   [CFWS] "<" addr-spec ">" [CFWS] /
                    obs-angle-addr

group           =   display-name ":" [group-list] ";" [CFWS]

display-name    =   phrase

mailbox-list    =   (mailbox *("," mailbox)) / obs-mbox-list

address-list    =   (address *("," address)) / obs-addr-list

group-list      =   mailbox-list / CFWS / obs-group-list

Address Examples:

Simple form (address only):
[email protected]

Full form (with display name):
Alice Smith <[email protected]>
"Joe Q. Public" <[email protected]>

Multiple recipients:
To: [email protected], [email protected]

Group address:
To: Development Team: [email protected], [email protected];
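
As one way to experiment with this grammar, Python's `email.headerregistry` module parses address header fields into mailboxes and groups. The sketch below is illustrative: the function name `parse_address_header` and the example.com addresses in the usage note are assumptions, not values from this document.

```python
from email.headerregistry import HeaderRegistry


def parse_address_header(name: str, value: str):
    """Return (group display-name or None, [addr-spec, ...]) pairs.

    A mailbox that is not part of a group is reported as a
    single-address entry whose display-name is None.
    """
    header = HeaderRegistry()(name, value)
    return [
        (group.display_name, [addr.addr_spec for addr in group.addresses])
        for group in header.groups
    ]
```

Calling `parse_address_header("To", "Development Team: dev1@example.com, dev2@example.com;")` returns one entry for the group, with both mailboxes listed under its display name.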

3.4.1. Addr-Spec Specification

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.

addr-spec       =   local-part "@" domain

local-part      =   dot-atom / quoted-string / obs-local-part

domain          =   dot-atom / domain-literal / obs-domain

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"

Address Examples:

Standard format:
[email protected]
[email protected]

With quoted local part:
"joe smith"@example.com

Domain literal (IP address):
user@[192.0.2.1]
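
Splitting an addr-spec needs some care because a quoted local-part may itself contain "@". The following sketch finds the first "@" that lies outside a quoted-string; the function name `split_addr_spec` is hypothetical, and the input is assumed to be free of comments and folding white space.

```python
def split_addr_spec(addr: str) -> tuple[str, str]:
    """Split an addr-spec into (local-part, domain)."""
    in_quotes = False
    i = 0
    while i < len(addr):
        ch = addr[i]
        if in_quotes and ch == "\\":
            i += 2  # skip the quoted-pair as a unit
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "@" and not in_quotes:
            # A dot-atom or quoted-string local-part cannot contain an
            # unquoted "@", so the first one found is the separator.
            return addr[:i], addr[i + 1:]
        i += 1
    raise ValueError("no unquoted '@' found")
```

The returned parts are left in their source form, so `split_addr_spec('"joe smith"@example.com')` keeps the DQUOTE characters around the local-part and `split_addr_spec("user@[192.0.2.1]")` keeps the brackets of the domain-literal.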

Chapter 3 Summary

Key Syntax Elements

Message Structure Hierarchy:

Message
├── Lexical Tokens
│   ├── atom, quoted-string
│   ├── word, phrase
│   └── comment, FWS
├── Date-Time
│   └── Day, DD Mon YYYY HH:MM:SS +ZZZZ
└── Address
    ├── mailbox: name <user@domain>
    └── group: name: addr1, addr2;

Implementation Checklist

- Correctly handle folding white space (FWS)
- Support comments, including nested parentheses
- Parse quoted strings and quoted pairs
- Validate date-time syntax and field values
- Parse mailbox and group addresses
- Split addr-spec into local-part and domain
- Support obsolete syntax (parse only, do not generate)
