3. Syntax
3.1. Introduction
The syntax as given in this section defines the legal syntax of Internet messages. Messages that are conformant to this specification MUST conform to the syntax in this section. If there are options in this section where one option SHOULD be generated, that is indicated either in the prose or in a comment next to the syntax.
For the defined expressions, a short description of the syntax and use is given, followed by the ABNF, followed by a semantic analysis. Primitive tokens that are used but otherwise not specified in this document are taken from the "Core Rules" of Appendix B.1 of RFC 5234: CR, LF, CRLF, HTAB, SP, WSP, DQUOTE, DIGIT, ALPHA, and VCHAR.
In some of the definitions, there will be nonterminals whose names start with "obs-". These "obs-" elements refer to tokens defined in the obsolete syntax in Section 4. In all cases, these productions should be ignored for the purposes of generating legal Internet messages and MUST NOT be used as part of such messages. However, when interpreting messages, these tokens MUST be honored as part of the legal syntax. In this sense, Section 3 defines a grammar for generation of messages, with "obs-" elements that are to be ignored, while Section 4 adds to that grammar to specify a grammar for interpretation of messages.
Core Concepts:
| Concept | Description |
|---|---|
| Message Generation | Section 3 syntax, ignore obs- elements |
| Message Parsing | Section 3 + Section 4, include obs- elements |
| obs- elements | Obsolete syntax, MUST NOT generate but MUST parse |
3.2. Lexical Tokens
The following rules are used to define an underlying lexical analyzer that feeds tokens to the higher-level parsers. This section defines the tokens used in structured header field bodies.
Note: Readers of this specification need to pay special attention to how these lexical tokens are used in both the lower-level and higher-level syntax later in the document. In particular, white space and comment tokens defined in Section 3.2.2 are used in the definitions of the lower-level tokens defined here, and those lower-level tokens are in turn used as parts of the higher-level tokens defined later. Therefore, even though white space and comments may not explicitly appear in a particular definition, they may be allowed between tokens in a higher-level construct.
3.2.1. Quoted characters
Some characters are reserved for special interpretation, such as delimiting lexical tokens. To permit use of these characters as uninterpreted data, a quoting mechanism is provided.
quoted-pair = ("\" (VCHAR / WSP)) / obs-qp
Where any quoted-pair appears, it is to be interpreted as the single character obtained by removing the backslash. That is, the "\" character that appears as part of a quoted-pair is semantically "invisible".
Note: The "\" character may appear in a message where it is not part of a quoted-pair. A "\" character that does not appear in a quoted-pair is not semantically invisible. The only places in this specification where quoted-pair currently appears are in ccontent, qcontent, and in obs-dtext in Section 4.
Examples:
Quoted backslash: "\\" → Interpreted as: \
Quoted quote: "\"" → Interpreted as: "
Quoted space: "\ " → Interpreted as: (space)
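The interpretation rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `unquote_pairs` is hypothetical, and the sketch assumes the caller passes only text taken from a context where quoted-pair is allowed (ccontent or qcontent), since a backslash elsewhere is not semantically invisible.

```python
def unquote_pairs(text: str) -> str:
    """Replace each quoted-pair with the character that follows the backslash."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # quoted-pair: drop the backslash, keep the quoted character
            out.append(text[i + 1])
            i += 2
        else:
            # ordinary character (or a lone trailing backslash, kept as-is)
            out.append(text[i])
            i += 1
    return "".join(out)
```

For instance, `unquote_pairs(r'Giant; \"Big\" Box')` yields `Giant; "Big" Box`.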
3.2.2. Folding White Space and Comments
White space characters, including those used in folding (described in Section 2.2.3), may appear between many elements in header field bodies. Also, strings of characters that are treated as comments may be included in structured field bodies as characters enclosed in parentheses. The following define the folding white space (FWS) and comment constructs.
Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string", as defined in Section 3.2.4. Comments may nest.
There are several places in this specification where comments and FWS may be freely inserted. To accommodate that syntax, an additional "CFWS" token is defined for places where comments and/or FWS may appear. However, where CFWS appears in this specification, it MUST NOT be inserted in such a way that any line of a folded header field is made up entirely of WSP characters and nothing else.
FWS = ([*WSP CRLF] 1*WSP) / obs-FWS
; Folding white space
ctext = %d33-39 / ; Printable US-ASCII
%d42-91 / ; characters not including
%d93-126 / ; "(", ")", or "\"
obs-ctext
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = (1*([FWS] comment) [FWS]) / FWS
Throughout this specification, where FWS (the folding white space token) appears, it indicates a place where folding, as discussed in Section 2.2.3, may take place. Wherever folding appears in a message (that is, a header field body containing a CRLF followed by any WSP), unfolding (removal of the CRLF) is performed before any further semantic analysis is performed on that header field. That is, any CRLF that appears in FWS is semantically "invisible".
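As a rough sketch of the unfolding step described above, the following Python removes each CRLF that is immediately followed by WSP. The helper name `unfold` is an assumption for illustration, and the sketch handles only the non-obsolete FWS form.

```python
import re

# A fold is a CRLF immediately followed by at least one WSP (space or HTAB).
_FOLD = re.compile(r"\r\n(?=[ \t])")

def unfold(header_body: str) -> str:
    """Remove the CRLF of every fold; the WSP that follows it is kept."""
    return _FOLD.sub("", header_body)
```

Only the CRLF is removed, so `unfold("This is\r\n a test")` gives `This is a test` with the single space preserved.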
Comment Example:
From: Pete (A nice \) chap) <[email protected]>
- The "(" after "Pete" starts the comment.
- The "\)" is an escaped parenthesis (a quoted-pair), so it does not close the comment.
Parsed display name: Pete
Actual mailbox: [email protected]
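Comment handling, including nesting and quoted-pairs, can be sketched as a small scanner. This is a hedged illustration only: `strip_comments` is a hypothetical helper, it assumes the header body has already been unfolded, and it ignores the obsolete syntax. The address in the usage note is a placeholder.

```python
def strip_comments(value: str) -> str:
    """Drop parenthesized comments that occur outside quoted-strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # True while inside a quoted-string
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and (depth or in_quotes):
            # quoted-pair: treat both characters as one unit
            if depth == 0:
                out.append(value[i:i + 2])  # keep it inside a quoted-string
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1   # comment opens (possibly nested)
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1   # comment closes
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)
```

With a placeholder address, `strip_comments(r'Pete (A nice \) chap) <pete@example.test>')` leaves `Pete  <pete@example.test>`; the white space on either side of the comment is kept.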
3.2.3. Atom
Several productions in structured header field bodies are simply strings of certain basic characters. Such productions are called atoms.
Some structured header field bodies also allow the period character (".", ASCII value 46) within runs of atext. An additional "dot-atom" token is defined for those purposes.
atext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for atoms.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
atom = [CFWS] 1*atext [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
dot-atom = [CFWS] dot-atom-text [CFWS]
specials = "(" / ")" / ; Special characters that do
"<" / ">" / ; not appear in atext
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
Both atom and dot-atom are interpreted as a single unit, comprised of the string of characters that make it up. Semantically, the optional comments and FWS surrounding the rest of the characters are not part of the atom; the atom is only the run of atext characters in an atom, or the atext and "." characters in a dot-atom.
Examples:
Atom examples:
- john
- example
- user_name
- info+tag
Dot-atom examples:
- john.doe
- first.middle.last
- [email protected] (the local part and the domain can each be a dot-atom; the address as a whole is not one, because "@" is a special)
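The atext, atom, and dot-atom-text rules translate fairly directly into regular expressions. The sketch below is illustrative; the names `ATEXT`, `is_atom_text`, and `is_dot_atom_text` are assumptions, and surrounding CFWS is expected to have been removed already.

```python
import re

# atext: ALPHA / DIGIT / the printable characters listed in the ABNF above
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
ATOM_TEXT_RE = re.compile(rf"{ATEXT}+")
DOT_ATOM_TEXT_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

def is_atom_text(s: str) -> bool:
    """True if s is a non-empty run of atext (the core of an atom)."""
    return ATOM_TEXT_RE.fullmatch(s) is not None

def is_dot_atom_text(s: str) -> bool:
    """True if s matches dot-atom-text: atext runs joined by single periods."""
    return DOT_ATOM_TEXT_RE.fullmatch(s) is not None
```

Under these patterns `info+tag` is atom text, `first.middle.last` is dot-atom-text, and strings such as `john..doe` or `.john` match neither rule.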
3.2.4. Quoted Strings
Strings of characters that include characters other than those allowed in atoms can be represented in a quoted string format, where the characters are surrounded by quote (DQUOTE, ASCII value 34) characters.
qtext = %d33 / ; Printable US-ASCII
%d35-91 / ; characters not including
%d93-126 / ; "\" or the quote character
obs-qtext
qcontent = qtext / quoted-pair
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
A quoted-string is treated as a unit. That is, quoted-string is identical to atom, semantically. Since a quoted-string may contain FWS, folding is permitted. Also note that since quoted-pair is allowed in a quoted-string, the quote and backslash characters may appear in a quoted-string so long as they appear as a quoted-pair.
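To illustrate how a quoted-string reduces to its semantic value, here is a minimal Python sketch. It assumes the token has been unfolded and stripped of surrounding CFWS, and it covers only the non-obsolete qtext range; `quoted_string_value` is a hypothetical name.

```python
import re

QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"   # printable US-ASCII except "\" and DQUOTE
QUOTED_PAIR = r"\\[\x21-\x7E \t]"     # "\" followed by VCHAR or WSP
QUOTED_STRING_RE = re.compile(
    rf'"((?:[ \t]*(?:{QTEXT}|{QUOTED_PAIR}))*[ \t]*)"'
)

def quoted_string_value(token: str) -> str:
    """Return the content of a quoted-string with quoted-pairs resolved."""
    m = QUOTED_STRING_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a quoted-string: {token!r}")
    # Remove the backslash of each quoted-pair, keeping the quoted character.
    return re.sub(r"\\(.)", r"\1", m.group(1))
```

An unescaped quote inside the string, or a missing closing quote, makes the match fail and raises `ValueError`.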
Examples:
"Joe Q. Public" → Joe Q. Public
"First Last" → First Last
"Giant; \"Big\" Box" → Giant; "Big" Box
3.2.5. Miscellaneous Tokens
Three additional tokens are defined: word and phrase for combinations of atoms and/or quoted-strings, and unstructured for use in unstructured header fields and in some places within structured header fields.
word = atom / quoted-string
phrase = 1*word / obs-phrase
unstructured = (*([FWS] VCHAR) *WSP) / obs-unstruct
3.3. Date and Time Specification
Date and time values occur in several header fields. This section specifies the syntax for a full date and time specification. Though folding white space is permitted throughout the date-time specification, it is RECOMMENDED that a single space be used in each place that FWS appears (whether it is required or optional); some older implementations will not interpret longer sequences of folding white space correctly.
date-time = [ day-of-week "," ] date time [CFWS]
day-of-week = ([FWS] day-name) / obs-day-of-week
day-name = "Mon" / "Tue" / "Wed" / "Thu" /
"Fri" / "Sat" / "Sun"
date = day month year
day = ([FWS] 1*2DIGIT FWS) / obs-day
month = "Jan" / "Feb" / "Mar" / "Apr" /
"May" / "Jun" / "Jul" / "Aug" /
"Sep" / "Oct" / "Nov" / "Dec"
year = (FWS 4*DIGIT FWS) / obs-year
time = time-of-day zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT / obs-hour
minute = 2DIGIT / obs-minute
second = 2DIGIT / obs-second
zone = (FWS ( "+" / "-" ) 4DIGIT) / obs-zone
Date-Time Examples:
Full format:
Date: Fri, 21 Nov 1997 09:55:06 -0600
Date: Sat, 20 Dec 2025 10:00:00 +0800
Without day of week:
Date: 21 Nov 1997 09:55:06 -0600
UTC time:
Date: 21 Nov 1997 15:55:06 +0000
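A date-time value can be parsed with Python's standard library, and the optional day-of-week can then be compared with the day implied by the date. The sketch below is an illustration under simplifying assumptions: `day_of_week_matches` is a hypothetical helper, and it assumes the value contains no comments (a comma inside a trailing comment would confuse the split).

```python
from email.utils import parsedate_to_datetime

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def day_of_week_matches(date_value: str) -> bool:
    """True if the day-name is absent or agrees with the calendar date."""
    dt = parsedate_to_datetime(date_value)  # raises on unparseable input
    if "," not in date_value:
        return True  # day-of-week omitted, nothing to compare
    claimed = date_value.split(",", 1)[0].strip()
    # datetime.weekday() numbers Monday as 0, matching DAY_NAMES order
    return claimed == DAY_NAMES[dt.weekday()]
```

Note that `parsedate_to_datetime` does not itself reject a mismatched day name, which is why the explicit comparison is needed.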
3.4. Address Specification
Addresses occur in several message header fields to indicate senders and recipients of messages. An address may either be an individual mailbox or a group of mailboxes.
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] /
obs-angle-addr
group = display-name ":" [group-list] ";" [CFWS]
display-name = phrase
mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
address-list = (address *("," address)) / obs-addr-list
group-list = mailbox-list / CFWS / obs-group-list
Address Examples:
Simple form (address only):
[email protected]
Full form (with display name):
Alice Smith <[email protected]>
"Joe Q. Public" <[email protected]>
Multiple recipients:
To: [email protected], [email protected]
Group address:
To: Development Team: [email protected], [email protected];
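Python's `email.headerregistry` module parses address lists into groups and mailboxes, which makes the mailbox/group distinction easy to see. The following is a small sketch; the wrapper `parse_address_list` and the example addresses are assumptions made for illustration.

```python
from email.headerregistry import HeaderRegistry

_make_header = HeaderRegistry()

def parse_address_list(value: str):
    """Return (group display-name, [addr-spec, ...]) pairs for an address-list.

    Mailboxes that are not part of a group are reported with a
    display-name of None, one pair per mailbox.
    """
    header = _make_header("To", value)
    return [
        (group.display_name, [addr.addr_spec for addr in group.addresses])
        for group in header.groups
    ]
```

For a group such as `Development Team: alice@example.com, bob@example.com;` this returns one pair whose first element is `Development Team`; for a plain comma-separated list it returns one pair per mailbox.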
3.4.1. Addr-Spec Specification
An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character ("@", ASCII value 64) followed by an Internet domain.
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext = %d33-90 / ; Printable US-ASCII
%d94-126 / ; characters not including
obs-dtext ; "[", "]", or "\"
Addr-Spec Examples:
Standard format:
[email protected]
[email protected]
With quoted local part:
"joe smith"@example.com
Domain literal (IP address):
user@[192.0.2.1]
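The addr-spec grammar can be approximated with one regular expression that accepts a dot-atom or quoted-string local part and a dot-atom or domain-literal domain. This is a sketch under stated assumptions: CFWS and the obs- alternatives are not handled, and `split_addr_spec` is a hypothetical name.

```python
import re

ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# qtext or WSP, or a quoted-pair, between DQUOTEs
QUOTED_STRING = r'"(?:[\x21\x23-\x5B\x5D-\x7E \t]|\\[\x21-\x7E \t])*"'
# dtext (%d33-90 / %d94-126) or WSP between "[" and "]"
DOMAIN_LITERAL = r"\[[\x21-\x5A\x5E-\x7E \t]*\]"

ADDR_SPEC_RE = re.compile(
    rf"(?P<local>{DOT_ATOM_TEXT}|{QUOTED_STRING})"
    rf"@(?P<domain>{DOT_ATOM_TEXT}|{DOMAIN_LITERAL})"
)

def split_addr_spec(addr: str):
    """Split an addr-spec into (local-part, domain) or raise ValueError."""
    m = ADDR_SPEC_RE.fullmatch(addr)
    if m is None:
        raise ValueError(f"not a valid addr-spec: {addr!r}")
    return m.group("local"), m.group("domain")
```

Matching the grammar says nothing about whether the domain exists or the mailbox is deliverable; it is a syntax check only.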
Section 3 Summary
Key Syntax Elements
Message Structure Hierarchy:
Message
├── Lexical Tokens
│ ├── atom, quoted-string
│ ├── word, phrase
│ └── comment, FWS
├── Date-Time
│ └── Day, DD Mon YYYY HH:MM:SS +ZZZZ
└── Address
├── mailbox: name <user@domain>
└── group: name: addr1, addr2;
Implementation Checklist
- Correctly handle folding white space (FWS)
- Support comments (nested parentheses)
- Parse quoted strings and quoted pairs
- Validate date-time syntax and semantics (for example, that the day-of-week matches the date)
- Parse mailbox and group addresses
- Handle local-part and domain
- Support obsolete syntax (parse only, don't generate)