
1. Introduction

Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII [1] data. Base encoding can also be used in new applications that do not have legacy restrictions, simply because it makes it possible to manipulate objects with text editors.

In the past, different applications have had different requirements and thus sometimes implemented base encodings in slightly different ways. Today, protocol specifications sometimes use base encodings in general, and "base64" in particular, without a precise description or reference. Multipurpose Internet Mail Extensions (MIME) [4] is often used as a reference for base64 without considering the consequences for line-wrapping or non-alphabet characters. The purpose of this specification is to establish common alphabet and encoding considerations. This will hopefully reduce ambiguity in other documents, leading to better interoperability.


Why Base Encoding is Needed

Core Problem

Many legacy systems and protocols can only handle text data (US-ASCII) and cannot directly transmit binary data:

Problem Scenarios:
❌ Email systems (SMTP) - Originally limited to 7-bit ASCII
❌ URL parameters - Characters such as "+", "/", and "=" have special meanings
❌ JSON/XML - Cannot directly embed raw binary data
❌ Text editors - Cannot reliably display or edit binary files

Base Encoding Solution

Binary Data → Base Encoding → Text Data
(Non-printable) (Convert) (Printable, Transmittable)

Example:
Raw data: [0x48, 0x65, 0x6C, 0x6C, 0x6F] (binary)
Base64: "SGVsbG8=" (text)
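The example above can be reproduced with a minimal sketch, assuming Python's standard `base64` module. The variable names are illustrative only.

```python
import base64

# The five bytes from the example above (ASCII for "Hello").
raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F])

# Encode to Base64 text, then decode back to the original bytes.
encoded = base64.b64encode(raw).decode("ascii")
decoded = base64.b64decode(encoded)

print(encoded)         # SGVsbG8=
print(decoded == raw)  # True
```

Five input bytes do not divide evenly into 3-byte groups, which is why the output ends with a single "=" padding character.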

Historical Background and Interoperability Issues

Problems Caused by Implementation Differences

Before a common specification existed, Base64 implementations differed in line length, padding, and alphabet:

Implementation      Line Length Limit   Padding Rule   Alphabet
MIME                76 characters       Required       Standard
PEM                 64 characters       Required       Standard
Some URL encoding   No limit            Optional       URL-safe

These differences led to:

  • ❌ Cross-system data exchange failures
  • ❌ Decoding errors
  • ❌ Security vulnerabilities
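One way such a mismatch can surface is sketched below, assuming Python's standard `base64` module. `base64.encodebytes` produces MIME-style output with a newline after every 76 encoded characters, while `base64.b64decode` with `validate=True` stands in here for a strict decoder that rejects any non-alphabet character. The sample payload is arbitrary.

```python
import base64
import binascii

payload = bytes(range(100))  # arbitrary sample data

# MIME-style output: a newline after every 76 encoded characters.
wrapped = base64.encodebytes(payload)

line_lengths = [len(line) for line in wrapped.splitlines()]
print(line_lengths)  # [76, 60]

# A lenient decoder silently skips the newlines...
lenient = base64.b64decode(wrapped)
print(lenient == payload)  # True

# ...while a strict decoder treats them as illegal characters.
try:
    base64.b64decode(wrapped, validate=True)
    strict_accepted = True
except binascii.Error:
    strict_accepted = False
print(strict_accepted)  # False
```

The same encoded text is accepted by one decoder and rejected by the other, which is the kind of cross-system failure listed above.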

Value of RFC 4648

This specification resolves these issues by:

  1. Unified Alphabet - Clearly defines standard alphabets for Base64, Base32, Base16
  2. Clear Rules - Specifies how to handle line breaks, padding, illegal characters
  3. Provides Variants - Defines the URL- and filename-safe Base64 variant and the extended-hex Base32 variant
  4. Interoperability - Ensures compatibility between different implementations
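The alphabets mentioned in this list can be compared side by side with a short sketch, assuming Python's standard `base64` module. The two sample bytes are chosen only because they force the Base64 output to use the last two alphabet positions (62 and 63), which is where the standard and URL-safe alphabets differ.

```python
import base64

# 0xFB 0xFF encodes to alphabet indexes 62 and 63 in Base64.
sample = bytes([0xFB, 0xFF])

standard = base64.b64encode(sample).decode("ascii")
url_safe = base64.urlsafe_b64encode(sample).decode("ascii")
base32 = base64.b32encode(sample).decode("ascii")
base16 = base64.b16encode(sample).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
print(base32)    # 7P7Q====
print(base16)    # FBFF
```

The URL-safe variant replaces "+" and "/" with "-" and "_", so a decoder has to know which alphabet the sender used.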

Applicable Scenarios

Typical Applications of Base64

1. Email Attachments (MIME)
Content-Transfer-Encoding: base64

2. Data URIs
...

3. JWT Tokens
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

4. HTTP Basic Authentication
Authorization: Basic dXNlcjpwYXNz

5. Embedding Binary Data in XML/JSON
{"avatar": "SGVsbG8gV29ybGQ="}
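Two of the applications above can be sketched with Python's standard library. The helper names `basic_auth_header` and `decode_jwt_segment` are hypothetical, introduced only for illustration. The credentials `user` and `pass` are what the Basic example value above decodes to, and the JWT input is just the visible header segment from the example, not a complete token.

```python
import base64
import json


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (hypothetical helper)."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_jwt_segment(segment: str) -> dict:
    """Decode one base64url JWT segment into a dict (hypothetical helper)."""
    # JWT segments omit "=" padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


header_value = basic_auth_header("user", "pass")
jwt_header = decode_jwt_segment("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9")

print(header_value)  # Basic dXNlcjpwYXNz
print(jwt_header)    # {'alg': 'HS256', 'typ': 'JWT'}
```

Neither helper provides any confidentiality: Base64 is an encoding, not encryption, so anyone who sees the header or token segment can decode it.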

Why Not Transmit Binary Directly?

Reasons:
1. Protocol restrictions - SMTP and HTTP header fields are built around ASCII text
2. Text safety - Avoid issues caused by control characters
3. Editability - Can view and modify with text editors
4. Compatibility - More reliable cross-platform, cross-system transmission

Goals of This Specification

RFC 4648 aims to:

  • Eliminate Ambiguity - Provide clear, unambiguous encoding definitions
  • Improve Interoperability - Ensure compatibility between different implementations
  • Provide Choices - Offer appropriate encoding variants for different scenarios
  • Security Considerations - Clearly specify security-related implementation requirements


Next Steps

The following sections will detail:

  • Section 2: Conventions for using RFC 2119 keywords
  • Section 3: Implementation discrepancies and recommended behaviors
  • Sections 4-8: Detailed specifications for various Base encodings
  • Sections 9-10: Examples and test vectors
  • Section 12: Security considerations