1. Introduction
Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII [1] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.
In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. Multipurpose Internet Mail Extensions (MIME) [4] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability.
Why Base Encoding is Needed
Core Problem
Many legacy systems and protocols can only handle text data (US-ASCII) and cannot directly transmit binary data:
Problem Scenarios:
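❌ Email systems (SMTP) - The original protocol only guarantees 7-bit ASCII transport
❌ URL parameters - Characters such as "+", "/", "&" and "=" have special meanings
❌ JSON/XML - Have no native binary type, so raw bytes cannot be embedded directly
❌ Text editors - Cannot reliably display or edit raw binary data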
Base Encoding Solution
Binary Data → Base Encoding → Text Data
(Non-printable) (Convert) (Printable, Transmittable)
Example:
Raw data: [0x48, 0x65, 0x6C, 0x6C, 0x6F] (binary)
Base64: "SGVsbG8=" (text)
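The example above can be reproduced with a short sketch. It assumes Python's standard `base64` module, which the original text does not mention; the variable names are illustrative only.

```python
import base64

# The five raw bytes from the example above (ASCII for "Hello").
raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])

encoded = base64.b64encode(raw)   # Base64 output, still a bytes object
text = encoded.decode("ascii")    # every Base64 character is US-ASCII

print(text)                            # SGVsbG8=
print(base64.b64decode(text) == raw)   # True: decoding restores the input
```

The trailing "=" is padding: five input bytes do not fill a whole number of 3-byte groups, so the final group is padded to four output characters.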
Historical Background and Interoperability Issues
Problems Caused by Implementation Differences
Before a common specification existed, implementations of Base64 differed in line length, padding, and alphabet:
| Implementation | Line Length Limit | Padding Rule | Alphabet |
|---|---|---|---|
| MIME | 76 characters | Required | Standard |
| PEM | 64 characters | Required | Standard |
| URL-safe variants | No limit | Optional | URL-safe |
These differences led to:
- ❌ Cross-system data exchange failures
- ❌ Decoding errors
- ❌ Security vulnerabilities
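The following sketch shows how these differences surface in practice. It assumes Python's standard `base64` module, and the sample input is arbitrary; it is meant as an illustration, not as a reference implementation of MIME or PEM.

```python
import base64
import binascii

data = bytes(range(256))  # arbitrary sample input, chosen only for illustration

mime_style = base64.encodebytes(data)       # newline after every 76 output characters
unwrapped = base64.b64encode(data)          # one line, no line breaks
url_safe = base64.urlsafe_b64encode(data)   # '-' and '_' replace '+' and '/'

longest_line = max(len(line) for line in mime_style.splitlines())
print(longest_line)          # 76
print(b"\n" in unwrapped)    # False
print(b"+" in unwrapped, b"+" in url_safe)   # True False

# A strict decoder treats the line breaks as non-alphabet characters and fails.
try:
    base64.b64decode(mime_style, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False
print(strict_ok)             # False

# A lenient decoder silently discards them and recovers the data.
lenient = base64.b64decode(mime_style)
print(lenient == data)       # True
```

The same encoded text is accepted by one decoder and rejected by another, which is the kind of interoperability failure listed above.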
Value of RFC 4648
This specification resolves these issues by:
- Unified Alphabet - Clearly defines standard alphabets for Base64, Base32, Base16
- Clear Rules - Specifies how to handle line breaks, padding, and non-alphabet characters
- Provides Variants - Defines URL-safe Base64 variant
- Interoperability - Ensures compatibility between different implementations
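As a rough illustration of the alphabets listed above, the sketch below encodes the same input with each one. It assumes Python's standard `base64` module; the input "foobar" is taken from the RFC 4648 test vectors, and the two-byte input for the URL-safe comparison is chosen only because it produces the characters that differ.

```python
import base64

sample = b"foobar"  # input used in the RFC 4648 test vectors

b64 = base64.b64encode(sample).decode("ascii")
b32 = base64.b32encode(sample).decode("ascii")
b16 = base64.b16encode(sample).decode("ascii")

print(b64)   # Zm9vYmFy
print(b32)   # MZXW6YTBOI======
print(b16)   # 666F6F626172

# The URL-safe variant differs from standard Base64 only in two characters.
tricky = b"\xfb\xff"  # illustrative input that hits alphabet positions 62 and 63
standard = base64.b64encode(tricky).decode("ascii")
urlsafe = base64.urlsafe_b64encode(tricky).decode("ascii")

print(standard)  # +/8=
print(urlsafe)   # -_8=
```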
Applicable Scenarios
Typical Applications of Base64
1. Email Attachments (MIME)
Content-Transfer-Encoding: base64
2. Data URIs
...
3. JSON Web Tokens (JWT), which use the URL-safe alphabet without padding
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
4. HTTP Basic Authentication
Authorization: Basic dXNlcjpwYXNz
5. Embedding Binary Data in XML/JSON
{"avatar": "SGVsbG8gV29ybGQ="}
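The sample strings in the list above can be decoded to see what they carry. This is a minimal sketch assuming Python's standard `base64` and `json` modules; the variable names are illustrative, and the JWT segment is only the visible header portion of the truncated token shown above.

```python
import base64
import json

# HTTP Basic authentication: the token is Base64 of "username:password".
basic_token = "dXNlcjpwYXNz"
credentials = base64.b64decode(basic_token).decode("utf-8")
print(credentials)   # user:pass

# JWT header segment: URL-safe Base64 with the padding stripped.
segment = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
padded = segment + "=" * (-len(segment) % 4)   # restore padding before decoding
jwt_header = json.loads(base64.urlsafe_b64decode(padded))
print(jwt_header)    # {'alg': 'HS256', 'typ': 'JWT'}

# Binary payload embedded in JSON as a Base64 string.
document = json.loads('{"avatar": "SGVsbG8gV29ybGQ="}')
avatar_bytes = base64.b64decode(document["avatar"])
print(avatar_bytes)  # b'Hello World'
```

Note that Base64 is an encoding, not encryption: the Basic authentication credentials are recovered without any key.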
Why Not Transmit Binary Directly?
Reasons:
1. Protocol restrictions - SMTP (without extensions) and HTTP header fields are designed for ASCII text
2. Text safety - Avoid issues caused by control characters
3. Editability - Can view and modify with text editors
4. Compatibility - More reliable cross-platform, cross-system transmission
Goals of This Specification
RFC 4648 aims to:
✅ Eliminate Ambiguity - Provide clear, unambiguous encoding definitions
✅ Improve Interoperability - Ensure compatibility between different implementations
✅ Provide Choices - Offer appropriate encoding variants for different scenarios
✅ Security Considerations - Clearly specify security-related implementation requirements
Next Steps
The following sections will detail:
- Section 2: Conventions for using RFC 2119 keywords
- Section 3: Implementation discrepancies and recommended behaviors
- Sections 4-8: Detailed specifications for various Base encodings
- Sections 9-10: Examples and test vectors
- Section 11: ISO C99 implementation of Base64
- Section 12: Security considerations