6. Content-Transfer-Encoding Header Field

Many media types which could usefully be transported via email are represented, in their "natural" format, as 8-bit character or binary data. Such data cannot be transmitted over some transport protocols. For example, RFC 821 (SMTP) restricts mail messages to 7-bit US-ASCII data with lines no longer than 1000 characters, including the trailing CRLF. The Content-Transfer-Encoding field specifies what encoding transformation, if any, has been applied to the body so that it can pass through such transports.

6.1. Content-Transfer-Encoding Syntax

encoding := "Content-Transfer-Encoding" ":" mechanism

mechanism := "7bit" / "8bit" / "binary" /
             "quoted-printable" / "base64" /
             ietf-token / x-token

These values are not case sensitive -- Base64 and BASE64 and bAsE64 are all equivalent.
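
As a minimal sketch of the case-insensitivity rule, the following Python reads the field with the standard library `email` package and normalizes it before comparison. The helper name `transfer_encoding` is hypothetical, and the fallback to "7bit" assumes the usual MIME default for a message that carries no such field.

```python
from email import message_from_string


def transfer_encoding(raw_message: str) -> str:
    """Return the Content-Transfer-Encoding mechanism in lowercase."""
    msg = message_from_string(raw_message)
    # Assumption: a message without the field is treated as "7bit".
    value = msg.get("Content-Transfer-Encoding", "7bit")
    return value.strip().lower()


# All three spellings normalize to the same mechanism name.
for spelling in ("Base64", "BASE64", "bAsE64"):
    raw = f"Content-Transfer-Encoding: {spelling}\n\nTWFu\n"
    print(transfer_encoding(raw))
```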

6.2. Content-Transfer-Encoding Semantics

7bit

The "7bit" encoding means that the data is all represented as short lines of US-ASCII data with no octets with decimal values greater than 127. Lines must be no longer than 998 octets, not counting the CRLF. No NUL octets (decimal value 0) are allowed. CR and LF occur only as part of CRLF sequences.

8bit

The "8bit" encoding means that the data is all represented as relatively short lines with 998 octets or less between CRLF sequences, but octets with decimal values greater than 127 may be used. As with "7bit" data, CR and LF occur only as part of CRLF sequences and no NULs are allowed.

binary

The "binary" encoding indicates that any sequence of octets whatsoever is allowed. This encoding is not further defined in this document.
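
The three identity domains above differ only in what octets and line structure they permit, so they can be told apart by inspection. The sketch below is one possible check, not a normative algorithm; the function name `classify_domain` and the constant `MAX_LINE` are hypothetical.

```python
MAX_LINE = 998  # maximum octets between CRLF sequences for 7bit and 8bit


def classify_domain(data: bytes) -> str:
    """Return the narrowest of "7bit", "8bit", or "binary" that fits data."""
    if b"\x00" in data:
        return "binary"  # NUL octets are not allowed in 7bit or 8bit
    for line in data.split(b"\r\n"):
        # After splitting on CRLF, any remaining CR or LF is a bare one.
        if b"\r" in line or b"\n" in line:
            return "binary"
        if len(line) > MAX_LINE:
            return "binary"
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"


print(classify_domain(b"Hello\r\nWorld\r\n"))
print(classify_domain("caf\u00e9\r\n".encode("utf-8")))
print(classify_domain(b"bare\nlinefeed"))
```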

quoted-printable

The "quoted-printable" encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set.

base64

The "base64" encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable.

6.3. New Content-Transfer-Encodings

New Content-Transfer-Encoding values may be registered with IANA. The requirements for such registration are specified in RFC 2048.

6.4. Interpretation and Use

The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean that the identity (i.e., NO) encoding transformation has been performed. As such, they serve simply as indicators of the domain of the body data, and provide useful information about the sort of encoding that might be needed for transmission in a given transport system.
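
A decoder can therefore treat "7bit", "8bit", and "binary" as a pass-through and only transform the body for the other two mechanisms. The following is a minimal sketch using Python's standard `quopri` and `base64` modules; the function name `decode_body` is hypothetical, and rejecting unknown mechanisms with `ValueError` is an assumption of this example.

```python
import base64
import quopri


def decode_body(mechanism: str, body: bytes) -> bytes:
    """Undo the transfer encoding named by mechanism."""
    name = mechanism.strip().lower()
    if name in ("7bit", "8bit", "binary"):
        return body  # identity: no transformation was applied
    if name == "quoted-printable":
        return quopri.decodestring(body)
    if name == "base64":
        return base64.b64decode(body)
    # Assumption: unknown or extension mechanisms are rejected here.
    raise ValueError(f"unsupported Content-Transfer-Encoding: {mechanism!r}")


print(decode_body("8bit", b"unchanged"))
print(decode_body("Base64", b"TWFu"))
print(decode_body("quoted-printable", b"truth=3Dbeauty"))
```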

6.5. Translating Encodings

It may be desirable to allow the transmission of non-textual body content without encoding it using the Base64 or Quoted-Printable encodings. The "8bit" and "binary" encoding mechanisms provide such functionality.

6.6. Canonical Encoding Model

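The encoding formats defined here (quoted-printable and base64) produce output that consists entirely of US-ASCII characters. The data being encoded does not itself have to be ASCII: it is first converted to its canonical form (for text, this means line breaks are represented as CRLF), and the transfer encoding is then applied to the resulting octets. The character set of textual data is declared separately, using the "charset" parameter of the Content-Type field.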

6.7. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding uses printable ASCII characters (characters with values 33 through 126) to allow encoding to be used on data that is largely text.

Encoding Rules

  1. Any printable ASCII character other than "=" (decimal values 33 through 60 and 62 through 126) may be represented literally
  2. Tab and space may be represented literally, unless they appear at the end of a line
  3. The equals sign "=" is used as an escape character
  4. Non-representable characters are represented as "=" followed by two hexadecimal digits representing the octet's value
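  5. A meaningful line break in text data must be represented as a CRLF sequence in the encoded output; CR and LF octets that are not line breaks (for example, in binary data) must be encoded as "=0D" and "=0A"
  6. Encoded lines must not be longer than 76 characters, not counting the CRLF; longer lines are split with soft line breaks, written as an "=" at the very end of the encoded line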

Example

Original: If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.

Encoded: If you believe that truth=3Dbeauty, then surely mathematics is the most =
beautiful branch of philosophy.
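
The example can be checked with Python's standard `quopri` module, as sketched below. The variable names are illustrative only. Note that `quopri.encodestring` is free to place its soft line break at a different position than the example does, so the sketch checks the round trip and the 76-character limit rather than an exact encoded string.

```python
import quopri

original = (
    b"If you believe that truth=beauty, then surely mathematics is the most "
    b"beautiful branch of philosophy."
)

# The encoded form from the example, with a soft line break ("=" then CRLF).
encoded_example = (
    b"If you believe that truth=3Dbeauty, then surely mathematics is the most =\r\n"
    b"beautiful branch of philosophy."
)

# Decoding removes the soft line break and turns "=3D" back into "=".
decoded = quopri.decodestring(encoded_example)
print(decoded == original)

# Encoding with the library escapes "=" and keeps every line within 76 characters.
reencoded = quopri.encodestring(original)
print(max(len(line) for line in reencoded.splitlines()))
```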

6.8. Base64 Content-Transfer-Encoding

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form that need not be humanly readable.

Encoding Process

  1. Divide the input data stream into groups of 24 bits (3 octets)
  2. Divide each 24-bit group into four groups of 6 bits
  3. Map each 6-bit group to one character in the Base64 alphabet
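  4. If the final group has fewer than 24 bits, pad it on the right with zero bits to a whole number of 6-bit groups and fill the rest of the output quantum with "=": 8 input bits yield two characters followed by "==", and 16 input bits yield three characters followed by "="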

Base64 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w         (pad) =
   15 P            32 g            49 x
   16 Q            33 h            50 y

Example

Original (ASCII): Man
Binary: 01001101 01100001 01101110
Grouped 6-bit: 010011 010110 000101 101110
Base64: T W F u
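
The encoding process and the "Man" example can be reproduced with a short hand-written encoder. This is a sketch for illustration, with the hypothetical name `b64_encode_manual`; in practice the standard `base64` module does the same job, and the sketch compares against it.

```python
import base64
import string

# The 64-character alphabet from the table: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_encode_manual(data: bytes) -> str:
    """Encode data as Base64 without line wrapping."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1, or 2 missing octets in the final group
        # Pad the group to 24 bits with zero bits.
        group = int.from_bytes(chunk + b"\x00" * pad, "big")
        # Split the 24 bits into four 6-bit values and map each to a character.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Characters produced purely from padding bits become "=".
            chars[-pad:] = "=" * pad
        out.append("".join(chars))
    return "".join(out)


print(b64_encode_manual(b"Man"))  # TWFu
print(b64_encode_manual(b"Ma"))   # TWE=
print(b64_encode_manual(b"M"))    # TQ==
```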

Encoded Output Format

  • The encoded output stream must be represented in lines of no more than 76 characters each
  • Line breaks and any other characters outside the Base64 alphabet must be ignored by decoding software
  • Any CRLF pairs in the encoded output only delimit lines; they are not part of the encoded data
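
The line-length rule can be sketched as a wrapper around the standard `base64` module. The function name `b64_encode_wrapped` and its `width` parameter are hypothetical. The decoding step relies on `base64.b64decode` discarding characters outside the alphabet by default, which is what lets it skip the CRLF pairs.

```python
import base64


def b64_encode_wrapped(data: bytes, width: int = 76) -> bytes:
    """Base64-encode data and break the output into CRLF-separated lines."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\r\n".join(lines)


payload = bytes(range(256))
wrapped = b64_encode_wrapped(payload)

# No line exceeds 76 characters, and the CRLF pairs do not affect decoding.
print(max(len(line) for line in wrapped.split(b"\r\n")))
print(base64.b64decode(wrapped) == payload)
```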

Encoding Comparison:

Encoding          Purpose           Line Limit  Character Set   Expansion
7bit              Pure ASCII text   998 bytes   US-ASCII        None
8bit              Extended text     998 bytes   8-bit octets    None
binary            Binary data       None        Any             None
quoted-printable  Mostly ASCII      76 chars    ASCII + escape  ~1-3x
base64            Arbitrary binary  76 chars    64 chars        ~1.33x

Selection Guide:

  • Pure ASCII text: 7bit (no encoding needed)
  • Text with occasional non-ASCII: quoted-printable (better readability)
  • Binary data (images, attachments): base64 (standard method)
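  • Modern systems: 8bit or binary, but only if every hop of the transport supports it (for example, SMTP servers that advertise the 8BITMIME or BINARYMIME extensions)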
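
One way to turn the guide into code is to check whether the data already fits an identity encoding and otherwise pick whichever of quoted-printable and base64 produces the smaller output. This is a hypothetical heuristic rather than anything the specification prescribes; the function name `choose_encoding` and the flag `transport_is_8bit_clean` are assumptions of this sketch, and the size comparison uses Python's `quopri` in its default text mode.

```python
import base64
import quopri


def choose_encoding(data: bytes, transport_is_8bit_clean: bool = False) -> str:
    """Suggest a Content-Transfer-Encoding for data (heuristic sketch)."""
    lines = data.split(b"\r\n")
    # Line structure required by both 7bit and 8bit.
    line_safe = b"\x00" not in data and all(
        len(line) <= 998 and b"\r" not in line and b"\n" not in line
        for line in lines
    )
    if line_safe and all(octet < 128 for octet in data):
        return "7bit"
    if line_safe and transport_is_8bit_clean:
        return "8bit"
    # Otherwise compare the encoded sizes and keep the smaller one.
    qp_size = len(quopri.encodestring(data))
    b64_size = len(base64.b64encode(data))
    return "quoted-printable" if qp_size <= b64_size else "base64"


text = "Le caf\u00e9 est pr\u00eat, merci beaucoup pour votre patience.".encode("utf-8")
print(choose_encoding(b"Hello, world\r\n"))
print(choose_encoding(text))
print(choose_encoding(text, transport_is_8bit_clean=True))
print(choose_encoding(bytes(range(256))))
```

Mostly-ASCII text tends to come out smaller as quoted-printable, while data dominated by non-ASCII octets approaches a threefold expansion there and is better served by base64's fixed 4/3 ratio.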