1. Introduction
This chapter introduces the basic concepts of URIs, design considerations, and syntax notation.
Overview
A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource.
Historical Background:
- Derived from concepts introduced by the World Wide Web global information initiative
- Use of these identifiers dates from 1990
- This document obsoletes RFC 2396, which had merged the URL (RFC 1738) and relative URL (RFC 1808) specifications
This Specification:
- Defines a single, generic syntax for all URIs
- Incorporates the IPv6 address syntax introduced by RFC 2732, which this document also obsoletes
- Excludes specific syntax of individual URI schemes (updated in separate documents)
1.1. Overview of URIs
Characteristics of URIs
URI characteristics can be summarized by three words: Uniform, Resource, Identifier
Uniform
Benefits of Uniformity:
- Context-Independence: Allows different types of resource identifiers to be used in the same context, even when access mechanisms differ
- Semantic Consistency: Allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers
- Extensibility: Allows introduction of new types of resource identifiers without interfering with existing identifiers
- Reusability: Allows identifiers to be reused in many different contexts, permitting new applications or protocols to leverage a pre-existing, large, and widely used set of resource identifiers
Example:
http://example.com/page
ftp://example.com/file
mailto:John.Doe@example.com
All use the same generic syntax structure
Resource
Scope of Resources: This specification does not limit what might be a resource; the term "resource" is used in a general sense for whatever might be identified by a URI.
Familiar Examples:
- 📄 Electronic documents
- 🖼️ Images
- 📊 Information sources with consistent purpose (e.g., "today's weather report for Los Angeles")
- 🔧 Services (e.g., HTTP-to-SMS gateway)
- 📚 Collections of other resources
Broader Resources:
- 🧑 Human beings
- 🏢 Corporations
- 📖 Bound books in a library
- 🔢 Abstract concepts (mathematical operators, relationship types, numeric values)
Key Point: Resources are not necessarily accessible via the Internet
Identifier
Definition: An identifier embodies the information required to distinguish what is being identified from all other things within its scope of identification.
Meaning of "Identify":
- Refers to the purpose of distinguishing one resource from all others
- Does not consider how that purpose is accomplished (e.g., by name, address, or context)
- Should not be misinterpreted as the identifier defining or embodying the identity of referenced content
- Should not assume that systems using the URI will access the identified resource
URI Definition: A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3.
Global Nature of URIs
Global Scope: URIs have global scope and are interpreted consistently regardless of context
Example:
http://localhost/
- Has the same interpretation for every user employing this reference
- The network interface corresponding to "localhost" may differ for each end user
- Interpretation is independent of access
Context Relativity:
- Actions based on the reference will be relative to the end-user's context
- Operations intended to reference globally unique things must use URIs that distinguish the resource from all others
Local Context URIs:
file:///etc/hosts
- Used only when the context itself is a defining aspect of the resource
- E.g., online help manuals referencing files on the end-user's file system
1.1.1. Generic Syntax
Scheme Names
Each URI begins with a scheme name (defined in Section 3.1) that refers to a specification for assigning identifiers within that scheme.
Federated Naming System: URI syntax is a federated and extensible naming system where specifications of each scheme may further restrict syntax and semantics of identifiers using that scheme.
Generic Elements
This specification defines those elements of the URI syntax that are:
- ✅ Required by all URI schemes
- ✅ Common to many URI schemes
Benefits:
- Scheme-Independent Parsing: Defines syntax and semantics needed to implement scheme-independent URI reference parsing
- Deferred Processing: Allows scheme-dependent processing of a URI to be deferred until needed
- Protocol Independence: Protocols and data formats using URI references can refer to this specification as defining allowed syntax range
- Forward Compatibility: The allowed syntax range covers all URIs, including those of schemes that have yet to be defined
Decoupled Evolution: Decouples evolution of identification schemes from evolution of protocols, data formats, and implementations using URIs.
Generic Parser
Parsing Capability: A parser for the generic URI syntax can parse any URI reference into its major components
Two-Stage Parsing:
- Generic Parsing: Determine scheme and major components
- Scheme-Specific Parsing: Perform further scheme-specific parsing on components
Superset Relationship: Generic URI syntax is a superset of the syntax of all URI schemes
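The two-stage approach above can be sketched in Python. Stage 1 uses the regular expression given in RFC 3986, Appendix B. Stage 2 is a hypothetical, deliberately simplified `parse_mailto_path` helper, included only to show where scheme-specific logic would plug in; it is not a complete mailto parser, and the function names are assumptions made for this illustration.

```python
import re

# Regular expression from RFC 3986, Appendix B, which splits any URI
# reference into its five generic components.
GENERIC_URI = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def parse_generic(uri_reference):
    """Stage 1: scheme-independent split into major components.

    Components that are absent from the reference are returned as None.
    """
    m = GENERIC_URI.match(uri_reference)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def parse_mailto_path(path):
    """Stage 2 (hypothetical, simplified): split a mailto path at the last '@'."""
    local, _, domain = path.rpartition("@")
    return {"local": local, "domain": domain}


parts = parse_generic("mailto:John.Doe@example.com")
# Scheme-dependent processing is deferred until the scheme is known.
if parts["scheme"] is not None and parts["scheme"].lower() == "mailto":
    mailbox = parse_mailto_path(parts["path"])
```

Because stage 1 never inspects the scheme, the same `parse_generic` function would also accept URIs from schemes defined after the code was written.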
1.1.2. Examples
URI Examples
The following example URIs illustrate several URI schemes and variations in their common syntax components:
ftp://ftp.is.co.za/rfc/rfc1808.txt
- Scheme: ftp
- Authority: ftp.is.co.za
- Path: /rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
- Scheme: http
- Authority: www.ietf.org
- Path: /rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
- Scheme: ldap
- Authority: [2001:db8::7] (IPv6 address)
- Path: /c=GB
- Query: objectClass?one
mailto:John.Doe@example.com
- Scheme: mailto
- Path: John.Doe@example.com
news:comp.infosystems.www.servers.unix
- Scheme: news
- Path: comp.infosystems.www.servers.unix
tel:+1-816-555-1212
- Scheme: tel
- Path: +1-816-555-1212
telnet://192.0.2.16:80/
- Scheme: telnet
- Authority: 192.0.2.16:80
- Path: /
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
- Scheme: urn
- Path: oasis:names:specification:docbook:dtd:xml:4.1.2
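As a rough cross-check of the breakdowns above, the example URIs can be fed to `urlsplit` from Python's standard library. This is only a sketch: `urlsplit` is a general-purpose splitter rather than a strict RFC 3986 validator, and it reports absent components as empty strings, so it cannot distinguish a missing authority from an empty one. The `components` helper is a name introduced here for illustration.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "news:comp.infosystems.www.servers.unix",
    "tel:+1-816-555-1212",
    "telnet://192.0.2.16:80/",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]


def components(uri):
    """Return (scheme, authority, path, query) as reported by urlsplit."""
    s = urlsplit(uri)
    return (s.scheme, s.netloc, s.path, s.query)


for uri in EXAMPLES:
    print(uri, "->", components(uri))
```

Note that only the URIs containing `//` yield a non-empty authority; for the others, everything after the scheme is reported as the path.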
1.1.3. URI, URL, and URN
Conceptual Distinction
URI Classification: A URI can be further classified as a locator, a name, or both
URL (Uniform Resource Locator)
Definition: The subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location")
Characteristics:
- Provides access method
- Describes network location
- May change if resource moves
Examples:
http://www.example.com/index.html
ftp://ftp.example.com/file.txt
URN (Uniform Resource Name)
Definition: Historically used to refer both to URIs under the "urn" scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name
Characteristics:
- Persistent identifier
- Location-independent
- Remains valid even if resource disappears
Examples:
urn:isbn:0-486-27557-4
urn:ietf:rfc:3986
Relationship Diagram
        URI (Uniform Resource Identifier)
           /                    \
         URL                    URN
(How to find resource)    (Persistent name)
(Location-dependent)      (Location-independent)
Contemporary Usage
Terminology Simplification: Currently, it is best to view the terms "URL" and "URN" as mnemonics within the URI space
Practical Guidance:
- Use "URI" rather than "URL" or "URN"
- All URLs are URIs
- All URNs are URIs
- But not all URIs are URLs or URNs
1.2. Design Considerations
URI design must balance multiple, sometimes conflicting, goals.
1.2.1. Transcription
Goal: URIs should be transcribable by humans using various techniques and media
Constraints:
- Should be brief
- Should be memorable
- Should be easy to enter
Conflicts with Other Goals:
Brevity vs Readability
http://x.co/a vs http://example.com/article
Memorability vs Global Uniqueness
http://blog vs http://username.blog.example.com
Practical Considerations:
- Limited character set (ASCII)
- Case-insensitive systems
- Special character handling
Transcription Errors:
Common mistakes:
- Confusing 0 (zero) and O (letter)
- Confusing 1 (one) and l (lowercase L)
- Confusing - (hyphen) and _ (underscore)
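One way to act on the mistakes listed above is to flag easily confused characters before a URI is printed or read aloud. The sketch below is an illustration only: the `CONFUSABLE` table covers just the three pairs named above, and `transcription_risks` is a hypothetical helper, not part of any standard.

```python
# Character pairs from the list above that are easily confused in transcription.
CONFUSABLE = {
    "0": "O", "O": "0",   # zero vs. capital letter O
    "1": "l", "l": "1",   # one vs. lowercase L
    "-": "_", "_": "-",   # hyphen vs. underscore
}


def transcription_risks(uri):
    """Return (index, character, look-alike) for each risky character in uri."""
    return [
        (i, ch, CONFUSABLE[ch])
        for i, ch in enumerate(uri)
        if ch in CONFUSABLE
    ]


for index, ch, lookalike in transcription_risks("http://example.com/item_01"):
    print(f"position {index}: {ch!r} may be misread as {lookalike!r}")
```

A real checker would likely weigh the font and medium in use; this version simply reports every occurrence.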
1.2.2. Separating Identification from Interaction
Principle: A URI identifies a resource independently of how that resource is accessed
Benefits:
- Identification Persistence: Identifiers can remain unchanged when access methods change
- Protocol Independence: Same resource can be accessed via multiple methods
- Reference Integrity: Can reference inaccessible resources
Example:
Identification: urn:isbn:0-486-27557-4
Access 1: http://amazon.com/dp/0486275574
Access 2: http://barnesandnoble.com/...
Access 3: Local library
Non-Access Uses:
- 📝 Document references
- 🏷️ Metadata tags
- 🔗 Link relationships
- 📊 Data identification
1.2.3. Hierarchical Identifiers
Hierarchical Structure: URI syntax supports hierarchical namespaces
Organization Form:
http://example.com/products/electronics/phones/model-x
       └Authority┘└──────────Path Hierarchy──────────┘
Benefits:
- Delegated Management: Allows delegation of naming authority
- Relative References: Supports relative URI references
- Logical Organization: Reflects logical organization of resources
Hierarchy Example:
/products/
  /electronics/
    /phones/
      /model-x
      /model-y
    /laptops/
  /clothing/
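The link between hierarchy and relative references can be illustrated with `urljoin` from Python's standard library, which resolves a relative reference against a base URI. The sketch assumes the layout in the hierarchy example, with `laptops` sitting beside `phones` under `electronics` and `clothing` directly under `products`; the variable names are illustrative.

```python
from urllib.parse import urljoin

# Base URI taken from the organization example above.
BASE = "http://example.com/products/electronics/phones/model-x"

# A sibling in the same "directory": only the last path segment is replaced.
sibling = urljoin(BASE, "model-y")

# ".." climbs one level in the hierarchy before descending again.
other_category = urljoin(BASE, "../laptops/")

# A reference starting with "/" replaces the whole path but keeps the authority.
clothing = urljoin(BASE, "/products/clothing/")

print(sibling)
print(other_category)
print(clothing)
```

None of the relative references repeats the scheme or authority, which is what lets a tree of documents be moved as a unit without rewriting its internal links.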
1.3. Syntax Notation
This specification uses Augmented Backus-Naur Form (ABNF) [RFC2234] to define URI syntax rules.
ABNF Basics
Rule Format:
rulename = elements
Basic Elements:
ALPHA  = %x41-5A / %x61-7A   ; A-Z / a-z
DIGIT  = %x30-39             ; 0-9
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
Operators:
- /: Alternation (or)
- *: Repetition (zero or more)
- [ ]: Optional
- ( ): Grouping
Example:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
Interpretation:
- Starts with a letter
- Followed by zero or more (letter/digit/+/-/.)
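To make the interpretation concrete, the `scheme` rule can be translated into a Python regular expression. This is a hand translation for illustration, not something the specification provides, and `is_valid_scheme` is a hypothetical helper name.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA            -> [A-Za-z]
# *( ... )         -> zero or more repetitions of the character class
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if name matches the ABNF scheme rule in full."""
    return SCHEME.fullmatch(name) is not None


for candidate in ["http", "svn+ssh", "1http", "my_scheme", ""]:
    print(repr(candidate), is_valid_scheme(candidate))
```

Using `fullmatch` rather than `match` matters here: the rule must account for the entire string, so a trailing character outside the allowed set makes the name invalid.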
Key Concepts Summary
Three Characteristics of URIs
- Uniform: Consistent syntax and semantics
- Resource: Can identify anything
- Identifier: Ability to distinguish and identify
URI vs URL vs URN
| Concept | Focus | Persistence | Example |
|---|---|---|---|
| URI | Identification | Not guaranteed | All URIs |
| URL | Location | Location-dependent | http://example.com/page |
| URN | Name | Persistent | urn:isbn:0-486-27557-4 |
Design Principles
- Transcribability: Humans can easily enter and remember
- Separation of Concerns: Identification independent of access
- Hierarchy: Supports delegation and organization
Next Chapter: 2. Characters - Character handling and encoding in URIs