
4. Obsolete Syntax

Earlier versions of this specification allowed for different (and usually more permissive) syntax than is allowed in this version. Also, there have been syntactic elements used in messages that have never been explicitly documented. Though some of these syntactic forms MUST NOT be generated according to the grammar in Section 3, they MUST be accepted and parsed by a conformant receiver. This section documents many of these syntactic elements. Taking the grammar in Section 3 and adding the definitions presented in this section will result in the grammar to use for interpretation of messages.

Note: This section identifies syntactic forms that any implementation MUST reasonably interpret. However, there are certainly Internet messages that do not conform to even the additional syntax given in this section. The fact that a particular form does not appear in any section of this document is not justification for computer programs to crash or for malformed data to be irretrievably lost by any implementation. It is up to the implementation to deal robustly with messages.

Key Differences Between Obsolete and Current Syntax:

  1. Whitespace Usage: Within structured header field bodies (i.e., between the colon and the CRLF of any structured header field), white space characters (including folding white space) and comments may be freely inserted between any syntactic tokens. This allows for many complex forms that have proven difficult for some implementations to parse.

  2. Folding White Space Rules: The rule in Section 3.2.2 regarding lines composed entirely of white space in comments and folding white space does not apply. See the discussion of folding white space in Section 4.2 below.

  3. Special Characters: Certain characters were allowed that are no longer allowed. The NUL character (ASCII value 0) was once allowed but is no longer for compatibility reasons. Similarly, US-ASCII control characters other than CR, LF, SP, and HTAB (ASCII values 1 through 8, 11, 12, 14 through 31, and 127) were allowed to appear in header field bodies. CR and LF were allowed to appear in messages other than as CRLF; this usage is shown here as well.

Key Concepts

Purpose of Obsolete Syntax

Message Generation              Message Parsing
        ↓                               ↓
Section 3 Syntax                Section 3 + Section 4
(Strict Rules)                  (Permissive Rules)
        ↓                               ↓
MUST NOT generate               MUST accept
obsolete forms                  obsolete forms

Implementation Requirements

Role                   Requirement for Obsolete Syntax
Message Generators     MUST NOT generate obsolete syntax
Message Parsers        MUST accept and parse obsolete syntax
All Implementations    Expected to deal robustly with malformed messages

4.1. Miscellaneous Obsolete Tokens

These syntactic elements are used elsewhere in the obsolete syntax or in the main syntax. Bare CR, bare LF, and NUL are added to obs-qp, obs-body, and obs-unstruct. US-ASCII control characters are added to obs-qp, obs-unstruct, obs-ctext, and obs-qtext. The period character is added to obs-phrase. The obs-phrase-list provides for a (potentially empty) comma-separated list of phrases that may contain "null" elements. That is, there may be two or more commas in such a list with nothing in between them, or commas at the beginning or end of the list.

obs-NO-WS-CTL   =   %d1-8 /            ; US-ASCII control
                    %d11 /             ;  characters that do not
                    %d12 /             ;  include the carriage
                    %d14-31 /          ;  return, line feed, and
                    %d127              ;  white space characters

obs-ctext       =   obs-NO-WS-CTL

obs-qtext       =   obs-NO-WS-CTL

obs-utext       =   %d0 / obs-NO-WS-CTL / VCHAR

obs-qp          =   "\" (%d0 / obs-NO-WS-CTL / LF / CR)

obs-body        =   *((*LF *CR *((%d0 / text) *LF *CR)) / CRLF)

obs-unstruct    =   *((*LF *CR *(obs-utext *LF *CR)) / FWS)

obs-phrase      =   word *(word / "." / CFWS)

obs-phrase-list =   [phrase / CFWS] *("," [phrase / CFWS])

Note: The "period" (or "full stop") character (".") in obs-phrase is not a form that was allowed in earlier versions of this or any other specification. Period (or any other character from specials) was not allowed in phrase because it introduced a parsing difficulty distinguishing between phrases and portions of an addr-spec (see Section 4.4). It appears here because the period character is currently used in many messages in the display-name portion of addresses, especially for initials in names, and therefore must be interpreted properly.
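The "null" elements permitted by obs-phrase-list can be illustrated with a small splitter. This is a minimal sketch, not a full parser: the function name `split_obs_phrase_list` is hypothetical, comments (CFWS in parentheses) are assumed to be absent, and only quoted strings are protected from splitting.

```python
def split_obs_phrase_list(value: str) -> list[str]:
    """Split a comma-separated phrase list, dropping "null" elements.

    Assumption: the input contains no parenthesized comments. Commas
    inside quoted strings are kept as part of the phrase.
    """
    items = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    # Empty strings correspond to the "null" elements of the list.
    return [item for item in items if item]
```

Leading, trailing, and doubled commas simply yield empty items, which are discarded rather than treated as errors.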

Bare CR and Bare LF appear in messages with two different meanings:

  1. As line separators: In many cases, bare CR or bare LF is incorrectly used instead of CRLF to indicate line separation
  2. As control characters: In other cases, bare CR and bare LF are simply used as US-ASCII control characters with their traditional ASCII meanings
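Because bare CR and bare LF are ambiguous, a parser may prefer to locate them (together with NUL and the obs-NO-WS-CTL characters) and leave the interpretation to a later step. The following is a minimal sketch of such a scan; the function name `find_obsolete_chars` and the label strings are assumptions made for illustration.

```python
def find_obsolete_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, kind) pairs for characters covered by Section 4.1.

    CR immediately followed by LF is treated as a normal CRLF and skipped.
    """
    hits = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        code = ord(ch)
        if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
            i += 2  # well-formed CRLF, nothing to report
            continue
        if code == 0:
            hits.append((i, "NUL"))
        elif ch == "\r":
            hits.append((i, "bare CR"))
        elif ch == "\n":
            hits.append((i, "bare LF"))
        elif 1 <= code <= 8 or code in (11, 12) or 14 <= code <= 31 or code == 127:
            # obs-NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127
            hits.append((i, "obs-NO-WS-CTL"))
        i += 1
    return hits
```

The scan only reports positions. Whether a bare CR or LF is a line separator or a control character still has to be decided from context.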

4.2. Obsolete Folding White Space

In the obsolete syntax, any amount of folding white space MAY be inserted where the obs-FWS rule is allowed. This creates the possibility of having two consecutive "folds" in a line, and therefore the possibility that a line that is supposed to be unfolded will have no visible content.

obs-FWS         =   1*WSP *(CRLF 1*WSP)

Example:

Obsolete folding (must be accepted when parsing, must not be generated):
Subject: This
 
 is valid
(the middle line contains only white space, and the continuation line begins with a space)

Current standard (Section 3):
Lines composed entirely of whitespace are forbidden
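Unfolding works the same way for obsolete and current folding: each CRLF that is immediately followed by white space is removed. A minimal sketch, assuming the raw field is available as a string with CRLF line endings (the name `unfold` is hypothetical):

```python
import re

# A fold is a CRLF immediately followed by at least one WSP (SP or HTAB).
# The lookahead leaves the white space itself in place.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(raw_field: str) -> str:
    """Remove every fold from a raw header field, keeping the final CRLF."""
    return _FOLD.sub("", raw_field)


# Two consecutive folds, with a middle line that holds only a space.
raw = "Subject: This\r\n \r\n is valid\r\n"
unfolded = unfold(raw)
```

Here `unfolded` still contains the white space from both continuation lines, so the words stay separated even though the middle line had no visible content.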

4.3. Obsolete Date and Time

The syntax for the obsolete date format allows a 2-digit year in the date field and allows for a list of alphabetic time zone specifications that were used in earlier versions of this specification. It also permits comments and folding white space between many of the tokens.

obs-day-of-week =   [CFWS] day-name [CFWS]

obs-day         =   [CFWS] 1*2DIGIT [CFWS]

obs-year        =   [CFWS] 2*DIGIT [CFWS]

obs-hour        =   [CFWS] 2DIGIT [CFWS]

obs-minute      =   [CFWS] 2DIGIT [CFWS]

obs-second      =   [CFWS] 2DIGIT [CFWS]

obs-zone        =   "UT" / "GMT" /     ; Universal Time
                                       ; North American UT
                                       ; offsets
                    "EST" / "EDT" /    ; Eastern:  - 5/ - 4
                    "CST" / "CDT" /    ; Central:  - 6/ - 5
                    "MST" / "MDT" /    ; Mountain: - 7/ - 6
                    "PST" / "PDT" /    ; Pacific:  - 8/ - 7
                                       ;
                    %d65-73 /          ; Military zones - "A"
                    %d75-90 /          ; through "I" and "K"
                    %d97-105 /         ; through "Z", both
                    %d107-122          ; upper and lower case

Interpretation of 2-digit or 3-digit Years:

  • 00-49: Add 2000, giving 2000-2049
  • 50-99: Add 1900, giving 1950-1999
  • 3-digit: Add 1900
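The year rules above can be sketched as a small helper. The function name `normalize_year` is hypothetical, and the input is assumed to be the digit string of the year with any surrounding CFWS already removed.

```python
def normalize_year(year_digits: str) -> int:
    """Map a 2-, 3-, or 4+-digit year string to a full year."""
    if not (year_digits.isascii() and year_digits.isdigit()) or len(year_digits) < 2:
        raise ValueError(f"not a valid obs-year: {year_digits!r}")
    value = int(year_digits)
    if len(year_digits) == 2:
        # 00-49 -> 2000-2049, 50-99 -> 1950-1999
        return value + 2000 if value <= 49 else value + 1900
    if len(year_digits) == 3:
        # 3-digit years are offsets from 1900
        return value + 1900
    return value
```

The decision is based on the number of digits rather than the numeric value, so "05" becomes 2005 while "0005" is left as year 5.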

Time Zone Abbreviation Interpretation:

Abbreviation                 Meaning                   Equivalent
UT, GMT                      Universal Time            +0000
EST                          Eastern Standard Time     -0500
EDT                          Eastern Daylight Time     -0400
CST                          Central Standard Time     -0600
CDT                          Central Daylight Time     -0500
MST                          Mountain Standard Time    -0700
MDT                          Mountain Daylight Time    -0600
PST                          Pacific Standard Time     -0800
PDT                          Pacific Daylight Time     -0700
Military zones (A-I, K-Z)    Unpredictable             SHOULD be treated as -0000

Note: The 1-character military time zones were defined in a non-standard way in RFC 822 and are therefore unpredictable in their meaning. Unless there is out-of-band information confirming their meaning, they SHOULD all be considered to be equivalent to "-0000".
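The zone table can be expressed as a lookup. This is a sketch under stated assumptions: the names `OBS_ZONE_OFFSETS` and `obs_zone_to_offset` are hypothetical, matching is case-insensitive (as ABNF string literals are), and any zone not in the table, including the military letters, falls back to "-0000".

```python
# Offsets for the named zones listed in obs-zone.
OBS_ZONE_OFFSETS = {
    "UT": "+0000",
    "GMT": "+0000",
    "EST": "-0500",
    "EDT": "-0400",
    "CST": "-0600",
    "CDT": "-0500",
    "MST": "-0700",
    "MDT": "-0600",
    "PST": "-0800",
    "PDT": "-0700",
}


def obs_zone_to_offset(zone: str) -> str:
    """Return the numeric offset for an obsolete alphabetic zone.

    Military single-letter zones and unrecognized names map to "-0000",
    which signals that the local offset is unknown.
    """
    return OBS_ZONE_OFFSETS.get(zone.strip().upper(), "-0000")
```

An implementation that has out-of-band knowledge of what a military zone means in a particular message could override the fallback for that case.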

4.4. Obsolete Addressing

There are four primary differences in addressing:

  1. Routing: Mailbox addresses were allowed to have a routing portion before the addr-spec when enclosed in < and >. The route is simply a comma-separated list of domain names, each preceded by "@", and the list is terminated with a colon.

  2. CFWS Insertion: CFWS were allowed between the period-separated elements of local-part and domain (i.e., not using the dot-atom). In addition, local-part is allowed to contain quoted-string in addition to atom.

  3. Null Members: mailbox-list and address-list were allowed to have "null" members. That is, there could be two or more commas in such a list with nothing in between them, or commas at the beginning or end of the list.

  4. Domain Literals: US-ASCII control characters and quoted-pairs were allowed in domain literals and are added here.

obs-angle-addr  =   [CFWS] "<" obs-route addr-spec ">" [CFWS]

obs-route       =   obs-domain-list ":"

obs-domain-list =   *(CFWS / ",") "@" domain
                    *("," [CFWS] ["@" domain])

obs-mbox-list   =   *([CFWS] ",") mailbox *("," [mailbox / CFWS])

obs-addr-list   =   *([CFWS] ",") address *("," [address / CFWS])

obs-group-list  =   1*([CFWS] ",") [CFWS]

obs-local-part  =   word *("." word)

obs-domain      =   atom *("." atom)

obs-dtext       =   obs-NO-WS-CTL / quoted-pair

When interpreting messages, the route portion SHOULD be ignored.

Routing Address Example:

Obsolete routing format:
<@node1.example,@node2.example:[email protected]>
 └───────── Routing ─────────┘

When interpreting, ignore routing and use only:
[email protected]
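Ignoring the route can be sketched as follows. The function name `strip_route` and the address `user@host.example` are hypothetical placeholders. The sketch assumes the angle-addr contains no comments and that the route contains no domain literals, since a domain literal could itself contain a colon.

```python
def strip_route(angle_addr: str) -> str:
    """Return the addr-spec of an angle-addr, discarding any obs-route."""
    text = angle_addr.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"not an angle-addr: {angle_addr!r}")
    inner = text[1:-1]
    # obs-domain-list may begin with white space or commas before the "@".
    if inner.lstrip(" \t,").startswith("@"):
        _route, colon, addr_spec = inner.partition(":")
        if not colon:
            raise ValueError("route is missing its terminating colon")
        return addr_spec.strip()
    return inner.strip()
```

For `<@node1.example,@node2.example:user@host.example>` the function returns `user@host.example`, and an angle-addr without a route passes through unchanged.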

4.5. Obsolete Header Fields

Syntactically, the primary difference in the obsolete field syntax is that it allows multiple occurrences of any field and they may occur in any order. Also, any amount of white space is allowed before the ":" at the end of the field name.

Unless otherwise noted in the following sections, interpretation of the other fields is identical to the interpretation of their non-obsolete counterparts in Section 3.

Obsolete Field Characteristics:

  • Multiple occurrences allowed
  • Any order allowed
  • Whitespace allowed before colon after field name

Example:

Obsolete format (must be accepted when parsing, must not be generated):
From : [email protected]
Date:Fri, 21 Nov 1997 09:55:06 GMT

Current standard:
From: [email protected]
Date: Fri, 21 Nov 1997 09:55:06 -0600
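Accepting white space before the colon needs only a small change to how a field line is split. A minimal sketch, assuming the field has already been unfolded and its trailing CRLF removed; the function name `split_field` is hypothetical, and trimming the body is a convenience choice rather than a requirement.

```python
def split_field(line: str) -> tuple[str, str]:
    """Split an unfolded header field into (field-name, body).

    White space between the field name and the colon is tolerated,
    as the obsolete syntax allows.
    """
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError(f"no colon in header field: {line!r}")
    name = name.rstrip(" \t")
    # field-name is one or more printable US-ASCII characters except ":".
    if not name or any(not (33 <= ord(c) <= 126) for c in name):
        raise ValueError(f"invalid field name: {name!r}")
    return name, body.strip(" \t")
```

Only the first colon is used as the separator, so colons inside the body (as in a time of day) are left alone.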

Section 4 Summary

Key Takeaways

  1. Dual Role of Obsolete Syntax:

    • MUST NOT generate
    • MUST parse
  2. Major Differences:

    • More permissive whitespace rules
    • Special characters allowed (NUL, control characters)
    • 2-digit years
    • Alphabetic time zones
    • Address routing
    • Field repetition
  3. Implementation Responsibility:

    • Robustly handle malformed messages
    • Don't crash
    • Don't lose data

Compatibility Strategy

Strict Generation               Permissive Parsing
        ↓                               ↓
Use Section 3 only              Accept Section 3 + Section 4
New standard format             Both new and old formats

Testing Checklist

Parser implementations should test the following obsolete formats:

  • 2-digit years (e.g., "21 Nov 97")
  • Alphabetic time zones (e.g., "EST", "PST")
  • Bare CR or bare LF
  • Addresses with routing
  • Whitespace between the field name and the colon
  • Folding lines composed entirely of whitespace
  • Phrases containing periods
