1. Introduction

This chapter introduces the basic concepts of URIs, design considerations, and syntax notation.


Overview

A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource.

Historical Background:

  • Derived from concepts introduced by the World Wide Web global information initiative
  • Use of these identifiers dates from 1990
  • This document obsoletes RFC 2396, which merged the URL [RFC1738] and relative URL [RFC1808] specifications

This Specification:

  • Defines a single, generic syntax for all URIs
  • Incorporates the IPv6 address syntax introduced by RFC 2732, which it also obsoletes
  • Excludes specific syntax of individual URI schemes (updated in separate documents)

1.1. Overview of URIs

Characteristics of URIs

URI characteristics can be summarized by three words: Uniform, Resource, Identifier

Uniform

Benefits of Uniformity:

  1. Context-Independence: Allows different types of resource identifiers to be used in the same context, even when access mechanisms differ

  2. Semantic Consistency: Allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers

  3. Extensibility: Allows introduction of new types of resource identifiers without interfering with existing identifiers

  4. Reusability: Allows identifiers to be reused in many different contexts, permitting new applications or protocols to leverage a pre-existing, large, and widely used set of resource identifiers

Example:

http://example.com/page
ftp://example.com/file
mailto:[email protected]

All use the same generic syntax structure

Resource

Scope of Resources: This specification does not limit what might be a resource; the term "resource" is used in a general sense for whatever might be identified by a URI.

Familiar Examples:

  • 📄 Electronic documents
  • 🖼️ Images
  • 📊 Information sources with consistent purpose (e.g., "today's weather report for Los Angeles")
  • 🔧 Services (e.g., HTTP-to-SMS gateway)
  • 📚 Collections of other resources

Broader Resources:

  • 🧑 Human beings
  • 🏢 Corporations
  • 📖 Bound books in a library
  • 🔢 Abstract concepts (mathematical operators, relationship types, numeric values)

Key Point: Resources are not necessarily accessible via the Internet

Identifier

Definition: An identifier embodies the information required to distinguish what is being identified from all other things within its scope of identification.

Meaning of "Identify":

  • Refers to the purpose of distinguishing one resource from all others
  • Does not consider how that purpose is accomplished (e.g., by name, address, or context)
  • Should not be misinterpreted as the identifier defining or embodying the identity of referenced content
  • Should not assume that systems using the URI will access the identified resource

URI Definition: A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3.

Global Nature of URIs

Global Scope: URIs have global scope and are interpreted consistently regardless of context

Example:

http://localhost/

This URI has the same interpretation for every user of the reference, even though the network interface corresponding to "localhost" may differ for each end-user: interpretation is independent of access.

Context Relativity:

  • Actions based on the reference will be relative to the end-user's context
  • Operations intended to reference globally unique things must use URIs that distinguish the resource from all others

Local Context URIs:

file:///etc/hosts

Such URIs should be used only when the context itself is a defining aspect of the resource, e.g., when an online help manual refers to a file on the end-user's file system.

1.1.1. Generic Syntax

Scheme Names

Each URI begins with a scheme name (defined in Section 3.1) that refers to a specification for assigning identifiers within that scheme.

Federated Naming System: URI syntax is a federated and extensible naming system where specifications of each scheme may further restrict syntax and semantics of identifiers using that scheme.

Generic Elements

This specification defines those elements of the URI syntax that are:

  • ✅ Required by all URI schemes
  • ✅ Common to many URI schemes

Benefits:

  1. Scheme-Independent Parsing: Defines syntax and semantics needed to implement scheme-independent URI reference parsing
  2. Deferred Processing: Allows scheme-dependent processing of a URI to be deferred until needed
  3. Protocol Independence: Protocols and data formats that use URI references can cite this specification as the definition of the range of syntax allowed for all URIs
  4. Forward Compatibility: That range includes schemes that have yet to be defined

Decoupled Evolution: Decouples evolution of identification schemes from evolution of protocols, data formats, and implementations using URIs.

Generic Parser

Parsing Capability: A parser for the generic URI syntax can parse any URI reference into its major components

Two-Stage Parsing:

  1. Generic Parsing: Determine scheme and major components
  2. Scheme-Specific Parsing: Perform further scheme-specific parsing on components

Superset Relationship: Generic URI syntax is a superset of the syntax of all URI schemes
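
The two-stage approach can be sketched in Python. The regular expression below is the component-splitting expression given in Appendix B of RFC 3986. Everything else is an illustrative assumption: the names `Components`, `split_generic`, and `parse_urn_path` are hypothetical, and the urn handler simply splits the path at its first colon into a namespace identifier (NID) and a namespace-specific string (NSS).

```python
import re
from typing import NamedTuple, Optional

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]


def split_generic(reference: str) -> Components:
    """Stage 1: scheme-independent split into the major components."""
    m = URI_REFERENCE.match(reference)
    # Every group in the pattern is optional, so the match always succeeds.
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def parse_urn_path(path: str) -> dict:
    """Stage 2 (hypothetical handler): split a urn path into NID and NSS."""
    nid, _, nss = path.partition(":")
    return {"nid": nid, "nss": nss}


# Stage 1 runs without any knowledge of the scheme.
parts = split_generic("urn:oasis:names:specification:docbook:dtd:xml:4.1.2")

# Stage 2 is deferred until the scheme is known and actually needed.
if parts.scheme is not None and parts.scheme.lower() == "urn":
    details = parse_urn_path(parts.path)
```

A component that is absent comes back as `None`, which keeps "no query" distinct from "empty query"; only the path is always present, possibly as an empty string.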


1.1.2. Examples

URI Examples

The following example URIs illustrate several URI schemes and variations in their common syntax components:

ftp://ftp.is.co.za/rfc/rfc1808.txt
  • Scheme: ftp
  • Authority: ftp.is.co.za
  • Path: /rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
  • Scheme: http
  • Authority: www.ietf.org
  • Path: /rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
  • Scheme: ldap
  • Authority: [2001:db8::7] (IPv6 address)
  • Path: /c=GB
  • Query: objectClass?one
news:comp.infosystems.www.servers.unix
  • Scheme: news
  • Path: comp.infosystems.www.servers.unix
tel:+1-816-555-1212
  • Scheme: tel
  • Path: +1-816-555-1212
telnet://192.0.2.16:80/
  • Scheme: telnet
  • Authority: 192.0.2.16:80
  • Path: /
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
  • Scheme: urn
  • Path: oasis:names:specification:docbook:dtd:xml:4.1.2
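
The breakdowns above can be checked with a general-purpose parser. The following is a minimal sketch that uses `urlsplit` from Python's standard `urllib.parse` module; the helper name `describe` is hypothetical. Note that `urlsplit` reports a missing component as an empty string, so it does not distinguish an absent authority from an empty one.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "http://www.ietf.org/rfc/rfc2396.txt",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "news:comp.infosystems.www.servers.unix",
    "tel:+1-816-555-1212",
    "telnet://192.0.2.16:80/",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]


def describe(uri: str) -> dict:
    """Return the major components of a URI as a plain dictionary."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urlsplit calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
    }


for uri in EXAMPLES:
    print(uri, describe(uri))
```

The ldap example shows that only the first "?" separates the path from the query; the second "?" is part of the query itself.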

1.1.3. URI, URL, and URN

Conceptual Distinction

URI Classification: A URI can be further classified as a locator, a name, or both

URL (Uniform Resource Locator)

Definition: The subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location")

Characteristics:

  • Provides access method
  • Describes network location
  • May change if resource moves

Examples:

http://www.example.com/index.html
ftp://ftp.example.com/file.txt

URN (Uniform Resource Name)

Definition: Historically used to refer both to URIs under the "urn" scheme [RFC2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name

Characteristics:

  • Persistent identifier
  • Location-independent
  • Remains valid even if resource disappears

Examples:

urn:isbn:0-486-27557-4
urn:ietf:rfc:3986

Relationship Diagram

          URI (Uniform Resource Identifier)
                   /           \
                 URL           URN
  (How to find resource)   (Persistent name)
  (Location-dependent)     (Location-independent)

Contemporary Usage

Terminology Simplification: Currently, it is best to view the terms "URL" and "URN" as mnemonics within the URI space

Practical Guidance:

  • Use "URI" rather than "URL" or "URN"
  • All URLs are URIs
  • All URNs are URIs
  • But not all URIs are URLs or URNs

1.2. Design Considerations

URI design must balance multiple, sometimes conflicting, goals.

1.2.1. Transcription

Goal: URIs should be transcribable by humans using various techniques and media

Constraints:

  • Should be brief
  • Should be memorable
  • Should be easy to enter

Conflicts with Other Goals:

Brevity vs Readability
http://x.co/a vs http://example.com/article

Memorability vs Global Uniqueness
http://blog vs http://username.blog.example.com

Practical Considerations:

  • Limited character set (ASCII)
  • Case-insensitive systems
  • Special character handling

Transcription Errors:

Common mistakes:
- Confusing 0 (zero) and O (letter)
- Confusing 1 (one) and l (lowercase L)
- Confusing - (hyphen) and _ (underscore)

1.2.2. Separating Identification from Interaction

Principle: URI identifies a resource independently of how that resource is accessed

Benefits:

  1. Identification Persistence: Identifiers can remain unchanged when access methods change
  2. Protocol Independence: Same resource can be accessed via multiple methods
  3. Reference Integrity: Can reference inaccessible resources

Example:

Identification: urn:isbn:0-486-27557-4
Access 1: http://amazon.com/dp/0486275574
Access 2: http://barnesandnoble.com/...
Access 3: Local library

Non-Access Uses:

  • 📝 Document references
  • 🏷️ Metadata tags
  • 🔗 Link relationships
  • 📊 Data identification

1.2.3. Hierarchical Identifiers

Hierarchical Structure: URI syntax supports hierarchical namespaces

Organization Form:

http://example.com/products/electronics/phones/model-x
       └Authority┘└──────────Path Hierarchy──────────┘

Benefits:

  1. Delegated Management: Allows delegation of naming authority
  2. Relative References: Supports relative URI references
  3. Logical Organization: Reflects logical organization of resources

Hierarchy Example:

/products/
  /electronics/
    /phones/
      /model-x
      /model-y
    /laptops/
  /clothing/
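
Relative references depend on this hierarchy: a short reference is resolved against a base URI to reach a sibling or parent node. The sketch below uses `urljoin` from Python's standard `urllib.parse` module. The base URI and the paths are taken from the hierarchy example above, and the helper name `resolve` is hypothetical.

```python
from urllib.parse import urljoin

BASE = "http://example.com/products/electronics/phones/model-x"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative reference against a base URI."""
    return urljoin(base, reference)


sibling = resolve("model-y")          # same directory as model-x
cousin = resolve("../laptops/")       # up one level, then into laptops/
distant = resolve("../../clothing/")  # up two levels, then into clothing/

print(sibling)  # http://example.com/products/electronics/phones/model-y
print(cousin)   # http://example.com/products/electronics/laptops/
print(distant)  # http://example.com/products/clothing/
```

The relative references never mention the scheme or the authority, so the whole tree could move to another host without changing them.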

1.3. Syntax Notation

This specification uses Augmented Backus-Naur Form (ABNF) [RFC2234] to define URI syntax rules.

ABNF Basics

Rule Format:

rulename = elements

Basic Elements:

ALPHA  = %x41-5A / %x61-7A   ; A-Z / a-z
DIGIT  = %x30-39             ; 0-9
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

Operators:

  • / : Alternation (or)
  • * : Repetition (zero or more)
  • [ ] : Optional
  • ( ) : Grouping

Example:

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Interpretation:
- Starts with a letter
- Followed by zero or more (letter/digit/+/-/.)
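
As an illustration of how an ABNF rule maps onto code, the scheme rule can be rendered as a regular expression. This is a minimal sketch; the names `SCHEME` and `is_valid_scheme` are hypothetical and not part of the specification.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# One leading letter, then zero or more letters, digits, "+", "-", or ".".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if the whole string matches the scheme rule."""
    return SCHEME.fullmatch(name) is not None


for candidate in ["http", "svn+ssh", "1http", "my_scheme", ""]:
    print(repr(candidate), is_valid_scheme(candidate))
```

`fullmatch` is used rather than `match` so that trailing characters outside the rule (such as the underscore in "my_scheme") cause a rejection instead of a partial match.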

Key Concepts Summary

Three Characteristics of URIs

  1. Uniform: Consistent syntax and semantics
  2. Resource: Can identify anything
  3. Identifier: Ability to distinguish and identify

URI vs URL vs URN

Concept  Focus           Persistence         Example
URI      Identification  Not guaranteed      All URIs
URL      Location        Location-dependent  http://example.com/page
URN      Name            Persistent          urn:isbn:0-486-27557-4

Design Principles

  1. Transcribability: Humans can easily enter and remember
  2. Separation of Concerns: Identification independent of access
  3. Hierarchy: Supports delegation and organization
