1. Introduction

Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII [1] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.

In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. Multipurpose Internet Mail Extensions (MIME) [4] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability.

Why Base Encoding is Needed

Core Problem

Many legacy systems and protocols can only handle text data (US-ASCII) and cannot directly transmit binary data:

Problem Scenarios:
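❌ Email systems (SMTP) - Classic SMTP carries only 7-bit US-ASCII text in length-limited lines (8-bit extensions such as 8BITMIME are not universally available)
❌ URL parameters - Characters such as "/", "+", "=" and "&" have special meanings
❌ JSON/XML - Strings hold text, not raw bytes, so binary data cannot be embedded directly
❌ Text editors - Opening and saving binary content in a text editor can corrupt it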

Base Encoding Solution

Binary Data → Base Encoding → Text Data
(Non-printable)   (Convert)     (Printable, Transmittable)

Example:
Raw data: [0x48, 0x65, 0x6C, 0x6C, 0x6F]  (binary)
Base64:   "SGVsbG8="                      (text)
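To make the conversion concrete, here is a from-scratch sketch that encodes the example bytes by hand. It packs each 3-byte group into 24 bits, splits those bits into four 6-bit values, and maps each value onto the standard alphabet. The names `ALPHABET` and `b64_encode_sketch` are hypothetical and exist only for illustration. Real code would normally call a library such as Python's `base64` module.

```python
# Standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode_sketch(data: bytes) -> str:
    """Encode bytes as padded Base64, 3 input bytes (24 bits) at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling missing bytes.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 chars + "==", 2 bytes -> 3 chars + "=", 3 bytes -> 4 chars.
        used = len(chunk) + 1
        out.append("".join(chars[:used]) + "=" * (4 - used))
    return "".join(out)


raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])  # the ASCII bytes of "Hello"
print(b64_encode_sketch(raw))  # SGVsbG8=
```

The final group holds only two bytes ("lo"), so it produces three characters plus one "=" pad.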

Historical Background and Interoperability Issues

Problems Caused by Implementation Differences

Before RFC 4648 (and its predecessor, RFC 3548) collected these definitions in one place, specifications and implementations varied:

Implementation       Line Length Limit   Padding    Alphabet
MIME                 76 characters       Required   Standard
PEM                  64 characters       Required   Standard
Some URL encodings   No limit            Optional   URL-safe
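Python's standard `base64` module can reproduce the conventions in the table. Doing so makes the incompatibility visible: the same bytes yield differently shaped text. This is a sketch only. The 100-byte payload is arbitrary, and the PEM-style wrapping uses plain string slicing rather than a PEM library.

```python
import base64

payload = bytes(range(100))  # 100 arbitrary bytes of binary data

# Plain Base64: one unbroken line, padded with "=".
plain = base64.b64encode(payload).decode("ascii")

# MIME style: base64.encodebytes() breaks lines at 76 characters and adds a
# trailing newline (MIME itself uses CRLF line breaks on the wire).
mime = base64.encodebytes(payload).decode("ascii")

# PEM style: 64 characters per line, wrapped here by simple slicing.
pem = "\n".join(plain[i:i + 64] for i in range(0, len(plain), 64))

# URL-safe style: "-" and "_" replace "+" and "/", and padding is stripped.
url = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")

for name, text in [("plain", plain), ("MIME", mime), ("PEM", pem), ("URL", url)]:
    lines = text.splitlines()
    longest = max(len(line) for line in lines)
    padded = text.rstrip().endswith("=")
    print(f"{name:5} lines={len(lines)} longest={longest} padded={padded}")
```

A decoder built for one shape may reject another. For example, `base64.b64decode(mime, validate=True)` rejects MIME's line breaks as non-alphabet characters, and `base64.b64decode(url)` fails because the padding is missing.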

These differences led to:

❌ Cross-system data exchange failures
❌ Decoding errors
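❌ Security vulnerabilities (for example, a decoder that silently skips non-alphabet characters creates a covert channel for leaking data)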
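One concrete source of both decoding errors and security problems is how a decoder treats characters outside the alphabet. RFC 4648 requires rejecting such input unless a referring specification explicitly says otherwise. Python's `base64.b64decode` is lenient by default and strict only when called with `validate=True`. The tampered string in this sketch is an arbitrary illustration.

```python
import base64
import binascii

# A valid encoding of b"Hello" with a non-alphabet character slipped in.
tampered = b"SGVs!bG8="

# Lenient decoding (the default, validate=False) silently discards
# characters outside the alphabet, so the extra data goes unnoticed.
print(base64.b64decode(tampered))  # b'Hello'

# Strict decoding rejects the whole input instead.
try:
    base64.b64decode(tampered, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```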

Value of RFC 4648

This specification resolves these issues by:

Unified Alphabets - Clearly defines the standard alphabets for Base64, Base32, and Base16
Clear Rules - Specifies how to handle line breaks, padding, and non-alphabet characters
Provides Variants - Defines the URL- and filename-safe Base64 variant (base64url) and the Base32 Extended Hex Alphabet (base32hex)
Interoperability - Ensures compatibility between different implementations
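The standard alphabets can be checked against the test vectors that RFC 4648 itself publishes (Section 10), using Python's standard `base64` module. In this sketch, the `VECTORS` table simply transcribes those vectors. The last lines show the only difference between the standard and URL-safe Base64 alphabets.

```python
import base64

# RFC 4648 test vectors: input -> (Base64, Base32, Base16)
VECTORS = {
    b"":       ("",         "",                 ""),
    b"f":      ("Zg==",     "MY======",         "66"),
    b"fo":     ("Zm8=",     "MZXQ====",         "666F"),
    b"foo":    ("Zm9v",     "MZXW6===",         "666F6F"),
    b"foob":   ("Zm9vYg==", "MZXW6YQ=",         "666F6F62"),
    b"fooba":  ("Zm9vYmE=", "MZXW6YTB",         "666F6F6261"),
    b"foobar": ("Zm9vYmFy", "MZXW6YTBOI======", "666F6F626172"),
}

for raw, (b64, b32, b16) in VECTORS.items():
    assert base64.b64encode(raw).decode("ascii") == b64
    assert base64.b32encode(raw).decode("ascii") == b32
    assert base64.b16encode(raw).decode("ascii") == b16
print("all test vectors match")

# base64url differs from standard Base64 only in two symbols:
# "-" replaces "+" and "_" replaces "/".
sample = b"\xfb\xff"
print(base64.b64encode(sample))          # b'+/8='
print(base64.urlsafe_b64encode(sample))  # b'-_8='
```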

Applicable Scenarios

Typical Applications of Base64

1. Email Attachments (MIME)
   Content-Transfer-Encoding: base64
   
2. Data URIs
   data:image/png;base64,iVBORw0KGgo...
   
3. JWT Tokens (base64url, without padding)
   eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
   
4. HTTP Basic Authentication
   Authorization: Basic dXNlcjpwYXNz
   
5. Embedding Binary Data in XML/JSON
   {"avatar": "SGVsbG8gV29ybGQ="}
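Python's standard `base64` and `json` modules can reproduce the encoded values in the list above. This is a sketch with two simplifications. First, the data URI line encodes only the 8-byte PNG file signature, which is why its output matches the start of the data URI shown above. Second, restoring "=" padding before decoding the JWT segment is one common approach, not the only one.

```python
import base64
import json

# HTTP Basic authentication: Base64 of "user-id:password".
credentials = base64.b64encode(b"user:pass").decode("ascii")
print("Authorization: Basic " + credentials)  # Authorization: Basic dXNlcjpwYXNz

# JSON embedding: binary data carried as a Base64 string field.
doc = json.dumps({"avatar": base64.b64encode(b"Hello World").decode("ascii")})
print(doc)  # {"avatar": "SGVsbG8gV29ybGQ="}

# Data URI: media type, ";base64," marker, then the encoded bytes.
png_signature = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file
print("data:image/png;base64," + base64.b64encode(png_signature).decode("ascii"))
# data:image/png;base64,iVBORw0KGgo=

# JWT segments use base64url without padding, so restore "=" before decoding.
segment = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
header = base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))
print(json.loads(header))  # {'alg': 'HS256', 'typ': 'JWT'}
```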

Why Not Transmit Binary Directly?

Reasons:
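Protocol restrictions - SMTP (without extensions) is limited to 7-bit US-ASCII, and HTTP header fields are restricted in practice to US-ASCII text
Text safety - Raw bytes can include control characters (NUL, CR, LF) that text-oriented systems strip, translate, or misinterpret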
Editability - Can view and modify with text editors
Compatibility - More reliable cross-platform, cross-system transmission
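A short check with Python's `base64` module illustrates both the benefit and the cost. Awkward bytes such as NUL, CR, LF, DEL, and 0xFF come out as plain alphabet characters. The price is that the output grows by four characters for every three input bytes. The sample bytes are arbitrary.

```python
import base64
import string

# Bytes that text-oriented channels often mangle: NUL, CR, LF, DEL, and high bytes.
raw = b"\x00\r\n\x7f\xff\xfe"
encoded = base64.b64encode(raw).decode("ascii")

# Every output character comes from the 64-character alphabet plus "=".
allowed = set(string.ascii_letters + string.digits + "+/=")
print(encoded, all(c in allowed for c in encoded))  # AA0Kf//+ True

# The trade-off: 4 output characters per 3 input bytes (about 33% larger).
for n in (1, 3, 100, 1000):
    print(n, "bytes ->", len(base64.b64encode(bytes(n))), "characters")
```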

Goals of This Specification

RFC 4648 aims to:

✅ Eliminate Ambiguity - Provide clear, unambiguous encoding definitions
✅ Improve Interoperability - Ensure compatibility between different implementations
✅ Provide Choices - Offer appropriate encoding variants for different scenarios
✅ Security Considerations - Clearly specify security-related implementation requirements

Next Steps

The following sections will detail:

Section 2: Conventions for using RFC 2119 keywords
Section 3: Implementation discrepancies and recommended behaviors
Sections 4-8: Detailed specifications for various Base encodings
Sections 9-10: Examples and test vectors
Section 12: Security considerations
