RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
Status: Internet Standard (STD 66)
Updates: RFC 1738
Obsoletes: RFC 2732, 2396, 1808
Authors: T. Berners-Lee (W3C/MIT), R. Fielding (Day Software), L. Masinter (Adobe Systems)
Date: January 2005
Abstract
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.
This specification defines:
- The generic URI syntax
- A process for resolving URI references that might be in relative form
- Guidelines and security considerations for the use of URIs on the Internet
Key Features:
- The URI syntax defines a grammar that is a superset of all valid URIs
- This allows implementations to parse the common components of a URI reference without knowing the scheme-specific requirements
- The specification does not define a generative grammar for URIs; that task is performed by the individual URI scheme specifications
Importance
RFC 3986 is core to Web infrastructure:
- 🌐 Defines generic syntax for URLs and URNs
- 🔗 Foundation for all resource location on the Web
- 📋 Basis for the URI schemes used by protocols such as HTTP, HTTPS, and FTP
- 🎯 Specifies the relative URI resolution algorithm
- 🔒 Documents URI security considerations
Table of Contents
1. Introduction
- 1.1 Overview of URIs
  - 1.1.1 Generic Syntax
  - 1.1.2 Examples
  - 1.1.3 URI, URL, and URN
- 1.2 Design Considerations
  - 1.2.1 Transcription
  - 1.2.2 Separating Identification from Interaction
  - 1.2.3 Hierarchical Identifiers
- 1.3 Syntax Notation
2. Characters
- 2.1 Percent-Encoding
- 2.2 Reserved Characters
- 2.3 Unreserved Characters
- 2.4 When to Encode or Decode
- 2.5 Identifying Data
3. Syntax Components
- 3.1 Scheme
- 3.2 Authority
  - 3.2.1 User Information
  - 3.2.2 Host
  - 3.2.3 Port
- 3.3 Path
- 3.4 Query
- 3.5 Fragment
4. Usage
- 4.1 URI Reference
- 4.2 Relative Reference
- 4.3 Absolute URI
- 4.4 Same-Document Reference
- 4.5 Suffix Reference
5. Reference Resolution
- 5.1 Establishing a Base URI
- 5.2 Relative Resolution
- 5.3 Component Recomposition
- 5.4 Reference Resolution Examples
6. Normalization and Comparison
- 6.1 Equivalence
- 6.2 Comparison Ladder
7. Security Considerations
- 7.1 Reliability and Consistency
- 7.2 Malicious Construction
- 7.3 Back-End Transcoding
- 7.4 Rare IP Address Formats
- 7.5 Sensitive Information
- 7.6 Semantic Attacks
8. IANA Considerations
9. Acknowledgements
10. References
Appendices
- Appendix A. Collected ABNF for URI
- Appendix B. Parsing a URI Reference with a Regular Expression
- Appendix C. Delimiting a URI in Context
- Appendix D. Changes from RFC 2396
Quick Reference
URI Generic Syntax
URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty
URI Component Example
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
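The component breakdown above can be reproduced with the regular expression given in Appendix B of the RFC. The following is a minimal sketch, not a validating parser; the function name `split_uri` and the dictionary keys are illustrative choices, not part of the specification.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits any URI
# reference into its five generic components but does not validate them.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref: str) -> dict:
    """Split a URI reference into generic components (hypothetical helper)."""
    m = URI_RE.match(ref)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),     # None when the component is absent
        "authority": m.group(4),
        "path": m.group(5),       # the path is always defined, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri("foo://example.com:8042/over/there?name=ferret#nose")
for name, value in parts.items():
    print(f"{name:10} {value}")
```

A component that is absent comes back as `None`, while an empty but present component comes back as an empty string, which mirrors the distinction the RFC draws between undefined and empty components.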
Common URI Schemes
| Scheme | Purpose | Example |
|---|---|---|
| http | HTTP Protocol | http://www.example.com/ |
| https | Secure HTTP | https://www.example.com/ |
| ftp | File Transfer | ftp://ftp.example.com/file.txt |
| mailto | Email Address | mailto:[email protected] |
| file | Local File | file:///path/to/file |
| data | Inline Data | data:text/plain;base64,SGVsbG8= |
| tel | Telephone | tel:+1-800-555-1212 |
Reserved Characters
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
           / "*" / "+" / "," / ";" / "="
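A reserved character must be percent-encoded when it appears as data rather than as a delimiter. The sketch below uses Python's standard `urllib.parse` module to show this for a query value; the value `a&b=c` and the parameter name `q` are made up for illustration.

```python
from urllib.parse import quote, urlencode

# A data value that happens to contain the sub-delims "&" and "=".
value = "a&b=c"

# safe="" tells quote() to encode every reserved character.
encoded_value = quote(value, safe="")

# urlencode() applies the same idea when building a whole query string,
# so the "&" and "=" inside the value cannot be mistaken for delimiters.
query = urlencode({"q": value})

print(encoded_value)
print(query)
```

Without the encoding step, a consumer that splits the query on `&` and `=` would see two parameters instead of one.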
Unreserved Characters
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Percent-Encoding
pct-encoded = "%" HEXDIG HEXDIG
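As a rough illustration of the `unreserved` and `pct-encoded` rules, the following sketch encodes every UTF-8 byte that is not an unreserved character. The helper name `pct_encode` is hypothetical, and real code has to decide per component which reserved characters may stay unencoded; this version encodes all of them.

```python
import string
from urllib.parse import unquote

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def pct_encode(text: str) -> str:
    """Percent-encode every byte outside the unreserved set (hypothetical helper)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" HEXDIG HEXDIG, using uppercase hex digits
            out.append(f"%{byte:02X}")
    return "".join(out)


print(pct_encode(" "))
print(pct_encode("你"))
print(unquote(pct_encode("你")))  # decoding restores the original text
```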
Examples:
Space → %20
"你" (Chinese) → %E4%BD%A0
URI vs URL vs URN
Relationship Diagram
        URI (Uniform Resource Identifier)
           /                    \
         URL                    URN
(Uniform Resource Locator)  (Uniform Resource Name)
   (How to access)             (What it is)
Comparison
| Concept | Focus | Persistence | Example |
|---|---|---|---|
| URI | Identification | Not guaranteed | All URIs |
| URL | Location | Location-dependent | http://example.com/page |
| URN | Name | Persistent | urn:isbn:0-486-27557-4 |
Key: All URLs are URIs, all URNs are URIs, but not all URIs are URLs or URNs.
Implementation Requirements
MUST Implement
- ✅ Basic URI syntax parsing
- ✅ Correct percent-encoding handling
- ✅ Relative URI resolution algorithm
- ✅ Case-insensitive handling of the scheme and host components
- ✅ Dot segment removal in paths
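Two of the items above, dot segment removal and relative resolution, can be sketched briefly. The function `remove_dot_segments` below follows the steps of Section 5.2.4 of the RFC as this editor reads them; its name and structure are illustrative. For full reference resolution the sketch relies on `urljoin` from Python's standard library rather than reimplementing Section 5.2.2, and the base URI and expected results are taken from the examples in Section 5.4.

```python
from urllib.parse import urljoin


def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path (sketch of Section 5.2.4)."""
    output = []
    inp = path
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]            # "/./" becomes "/"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]            # "/../" becomes "/"
            if output:
                output.pop()         # drop the previous segment
        elif inp == "/..":
            inp = "/"
            if output:
                output.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if inp.startswith("/") else 0
            idx = inp.find("/", start)
            if idx == -1:
                segment, inp = inp, ""
            else:
                segment, inp = inp[:idx], inp[idx:]
            output.append(segment)
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))

BASE = "http://a/b/c/d;p?q"
for ref in ("g", "./g", "../g", "../../g", "?y"):
    print(f"{ref:8} -> {urljoin(BASE, ref)}")
```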
SHOULD Implement
- ✅ URI normalization
- ✅ IRI support (Internationalized Resource Identifiers)
- ✅ IPv6 address support
- ✅ Secure user information handling
MAY Implement
- Scheme-specific validation
- URI equivalence comparison
- Automatic normalization
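Normalization and equivalence comparison can be approximated with a small amount of code. The sketch below applies only case normalization and percent-encoding normalization (lowercase scheme and host, uppercase hex digits, decoding of percent-encoded unreserved characters). It deliberately omits dot segment removal and scheme-based normalization, and the names `normalize` and `_fix_pct` are hypothetical.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_pct(text: str) -> str:
    """Decode percent-encoded unreserved characters; uppercase the other triplets."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else "%" + match.group(1).upper()
    return _PCT.sub(fix, text)


def normalize(uri: str) -> str:
    """Apply case and percent-encoding normalization only (partial sketch)."""
    parts = urlsplit(uri)
    # Only the host (and port) is case-insensitive; userinfo is left untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((
        parts.scheme.lower(),
        _fix_pct(netloc),
        _fix_pct(parts.path),
        _fix_pct(parts.query),
        _fix_pct(parts.fragment),
    ))


a = normalize("example://a/b/c/%7Bfoo%7D")
b = normalize("eXAMPLE://a/b/%63/%7bfoo%7d")
print(a, b, a == b)
```

Comparing the normalized strings is one rung on the comparison ladder: it reduces false negatives compared with a simple string comparison, but two URIs that still differ after this step may nevertheless be equivalent.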
Related RFCs
- RFC 1738: URL Specification (updated)
- RFC 2396: URI Generic Syntax (obsoleted)
- RFC 2732: IPv6 Address Format (obsoleted)
- RFC 3987: IRI (Internationalized Resource Identifiers)
- RFC 6874: IPv6 Zone Identifiers
- RFC 7230: HTTP/1.1 Message Syntax and Routing (since obsoleted by RFC 9110 and RFC 9112)
- RFC 8820: URI Design and Ownership