3. Syntax Components
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
Component Structure Example:
      foo://example.com:8042/over/there?name=ferret#nose
      \_/   \______________/\_________/ \_________/ \__/
       |           |            |            |        |
    scheme     authority       path        query   fragment
       |   _____________________|__
      / \ /                        \
      urn:example:animal:ferret:nose
The following sections provide detailed descriptions of each component.
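As a minimal sketch of how the five components can be separated in practice, the following uses the non-validating regular expression given in RFC 3986, Appendix B. The function name `split_uri` and the dictionary layout are illustrative choices, not part of the specification.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits a URI reference
# into its components but does not validate the characters in each one.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri: str) -> dict:
    """Return the five generic components; absent components are None."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),       # always a string, possibly empty
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
print(split_uri("urn:example:animal:ferret:nose"))
```

Note that the `urn:` example has no authority, so everything after the first colon is treated as the path.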
3.1. Scheme
Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-").
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
Examples:
http, https, ftp, mailto, file, data, tel
Scheme names are case-insensitive, so implementations should accept "HTTP" as equivalent to "http". The canonical form is lowercase: producers and normalizers SHOULD emit scheme names in lowercase only.
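The scheme rule translates directly into a regular expression. The sketch below validates a candidate scheme against that grammar and returns its lowercase form; `normalize_scheme` is a hypothetical helper name chosen for this example.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def normalize_scheme(scheme: str) -> str:
    """Validate a scheme name and return it in canonical lowercase form."""
    if not SCHEME_RE.fullmatch(scheme):
        raise ValueError(f"invalid scheme: {scheme!r}")
    return scheme.lower()

print(normalize_scheme("HTTP"))     # uppercase input is accepted
print(normalize_scheme("git+ssh"))  # "+" is allowed after the first letter
```

A name such as `1http` is rejected because the first character must be a letter.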
3.2. Authority
The authority component consists of an optional user information subcomponent, a host subcomponent, and an optional port subcomponent, preceded by two slashes ("//").
authority = [ userinfo "@" ] host [ ":" port ]
Complete Example:
    user:password@www.example.com:8080
    \___________/ \_____________/ \__/
      userinfo         host       port
3.2.1. User Information
The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource.
userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
Examples:
- user@host
- user:password@host (deprecated, security risk)
- anonymous@host
Security Warning: The "user:password" form of userinfo is deprecated. A password embedded in a URI is stored in clear text and can leak through server logs, browser history, and Referer headers.
3.2.2. Host
The host subcomponent can be a registered name (including but not limited to a hostname) or an IP address. IPv6 addresses MUST be enclosed in square brackets.
host = IP-literal / IPv4address / reg-name
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
reg-name = *( unreserved / pct-encoded / sub-delims )
Host Type Examples:
1. Registered Name
www.example.com
example.org
localhost
2. IPv4 Address
192.0.2.1
127.0.0.1
3. IPv6 Address
[2001:db8::1]
[::1]
[fe80::1]
4. Future IP Version
[v9.abc:def]
Host Normalization: Hostnames are case-insensitive and SHOULD be normalized to lowercase.
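To illustrate the four host forms listed above, here is a rough classifier built on Python's standard `ipaddress` module. The function name `classify_host` and the returned labels are assumptions made for this sketch, and the IPvFuture branch only checks the leading "v" rather than the full grammar.

```python
import ipaddress

def classify_host(host: str) -> str:
    """Classify a host subcomponent as IPv6, IPvFuture, IPv4, or reg-name."""
    if host.startswith("[") and host.endswith("]"):
        literal = host[1:-1]
        if literal[:1] in ("v", "V"):
            return "IPvFuture"           # simplified check, not full validation
        ipaddress.IPv6Address(literal)   # raises ValueError if malformed
        return "IPv6"
    try:
        ipaddress.IPv4Address(host)
        return "IPv4"
    except ValueError:
        # Anything else falls back to a registered name.
        return "reg-name"

for h in ("www.example.com", "192.0.2.1", "[2001:db8::1]", "[v9.abc:def]"):
    print(h, "->", classify_host(h))
```

A registered name would additionally be lowercased during normalization, for example with `host.lower()`.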
3.2.3. Port
The port subcomponent is optional and represented in decimal digits.
port = *DIGIT
Examples:
- http://example.com:80/ (HTTP default port)
- https://example.com:443/ (HTTPS default port)
- http://example.com:8080/ (custom port)
Default Port: If the port is empty or not given, it is assumed to be the default port for the scheme.
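The authority grammar and the default-port rule can be sketched together as follows. The helper names `split_authority` and `effective_port`, and the small `DEFAULT_PORTS` table, are illustrative assumptions; a real implementation would cover more schemes and validate each part.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}  # illustrative subset only

def split_authority(authority: str):
    """Split an authority into (userinfo, host, port); missing parts are None."""
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    if hostport.startswith("["):
        # IP-literal: the port, if any, follows the closing bracket.
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated IP-literal")
        host, rest = hostport[: end + 1], hostport[end + 1 :]
        port = rest[1:] if rest.startswith(":") else None
    else:
        host, colon, port = hostport.partition(":")
        if not colon:
            port = None
    return userinfo, host, port

def effective_port(scheme: str, port):
    """Return the explicit port, or the scheme default if it is empty or absent."""
    if port:
        return int(port)
    return DEFAULT_PORTS.get(scheme.lower())

print(split_authority("user:password@host:8080"))
print(split_authority("[::1]:443"))
print(effective_port("http", None))
```

The bracket check matters because an IPv6 literal contains colons of its own, so splitting on the first ":" would cut the address in half.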
3.3. Path
The path component contains data that, along with data in the query component, serves to identify a resource within the scope of the URI's scheme and naming authority. The path is composed of a sequence of path segments separated by a slash ("/") character.
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
segment = *pchar
segment-nz = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
Path Type Examples:
1. Absolute Path
/path/to/resource
/index.html
/
2. Relative Path
path/to/resource
../parent/resource
resource
3. Empty Path
http://example.com
Path Segments: A path consists of segments separated by slashes. Special segments include:
- . (current directory)
- .. (parent directory)
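The special segments are interpreted when a path is normalized. As a sketch, the function below follows the remove_dot_segments procedure described in RFC 3986, Section 5.2.4; the Python structure and variable names are this example's own.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments, following RFC 3986 Section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]            # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]            # "/../" becomes "/"
            if output:
                output.pop()           # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                idx = len(path)
            output.append(path[:idx])
            path = path[idx:]
    return "".join(output)

print(remove_dot_segments("/a/b/c/./../../g"))
```

In the specification this step is applied after a relative reference has been merged with its base URI, which is why a leading "../" with nothing left to climb out of is simply dropped.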
3.4. Query
The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource. The query component is indicated by a question mark ("?").
query = *( pchar / "/" / "?" )
Query String Examples:
?key=value
?name=John&age=30
?search=hello+world
?filter[]=a&filter[]=b
?q=%E4%BD%A0%E5%A5%BD
Common Format: The URI specification does not define the internal structure of the query string, but key=value pairs separated by "&" have become a de facto standard:
?key1=value1&key2=value2&key3=value3
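For the de facto key=value format, Python's standard `urllib.parse` module offers ready-made helpers. The sketch below parses several of the example query strings (without the leading "?") and builds one from a dictionary. Treating "+" as a space is a convention of HTML form encoding rather than of the generic URI syntax.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Ordered list of (key, value) pairs.
pairs = parse_qsl("name=John&age=30")
print(pairs)

# Repeated keys are collected into lists.
multi = parse_qs("filter[]=a&filter[]=b")
print(multi)

# "+" and percent-encoded UTF-8 bytes are decoded.
decoded = parse_qs("search=hello+world&q=%E4%BD%A0%E5%A5%BD")
print(decoded)

# Building a query string from a mapping.
encoded = urlencode({"key1": "value1", "key2": "a b"})
print(encoded)
```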
3.5. Fragment
The fragment component allows indirect identification of a secondary resource within a representation of the resource identified by the URI. The fragment identifier is indicated by a hash sign ("#").
fragment = *( pchar / "/" / "?" )
Fragment Examples:
#section1
#top
#chapter-3
#line-42
Important Characteristics:
- Client-Side Processing: Fragment identifiers are processed by the client and are not sent to the server
- In-Document Navigation: Commonly used for anchors in HTML documents
- Secondary Resource: Identifies a part of the primary resource
Example:
http://example.com/page.html#section2
- Server receives: http://example.com/page.html
- Client navigates to: #section2
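The client-side split can be illustrated with `urllib.parse.urldefrag` from Python's standard library, which separates the part of the URI that is sent in a request from the fragment that stays with the client. The variable names are illustrative.

```python
from urllib.parse import urldefrag

# Split the URI into the part that is requested and the fragment.
url, fragment = urldefrag("http://example.com/page.html#section2")

print(url)       # the URI used for the request
print(fragment)  # kept by the client for in-document navigation
```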
URI Components Summary
| Component | Prefix | Required | Example | Description |
|---|---|---|---|---|
| scheme | - | Yes | http | Protocol/scheme |
| authority | // | No | user@host:port | Authority information |
| path | - | Yes* | /path/to/resource | Resource path |
| query | ? | No | key=value | Query parameters |
| fragment | # | No | section1 | Fragment identifier |
*Path may be empty
Complete URI Example Breakdown
Example 1: HTTP URL
https://user:pass@www.example.com:8080/path/to/page?key=value#section
| Component | Value |
|---|---|
| scheme | https |
| userinfo | user:pass |
| host | www.example.com |
| port | 8080 |
| path | /path/to/page |
| query | key=value |
| fragment | section |
Example 2: Mailto URI
mailto:[email protected]?subject=Hello
| Component | Value |
|---|---|
| scheme | mailto |
| path | [email protected] |
| query | subject=Hello |
Example 3: File URI
file:///home/user/document.txt
| Component | Value |
|---|---|
| scheme | file |
| authority | (empty) |
| path | /home/user/document.txt |
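As a rough cross-check of the breakdowns above, Python's standard `urllib.parse.urlsplit` can be applied to the same kinds of URI. In this sketch the first URI is assembled from the component values listed in the Example 1 table, and the second is the file URI from Example 3.

```python
from urllib.parse import urlsplit

http_parts = urlsplit(
    "https://user:pass@www.example.com:8080/path/to/page?key=value#section"
)
print(http_parts.scheme, http_parts.username, http_parts.password)
print(http_parts.hostname, http_parts.port)
print(http_parts.path, http_parts.query, http_parts.fragment)

file_parts = urlsplit("file:///home/user/document.txt")
# The authority is present but empty, so netloc is the empty string.
print(repr(file_parts.netloc), file_parts.path)
```

`urlsplit` exposes the authority as `netloc` and derives `username`, `password`, `hostname`, and `port` from it.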
Component Encoding Rules
Different components allow different character sets:
| Component | Allowed Special Characters | Must Be Encoded |
|---|---|---|
| scheme | + - . | None (percent-encoding is not permitted; other characters are invalid) |
| userinfo | - . _ ~ : ! $ & ' ( ) * + , ; = | All others |
| host | - . _ ~ ! $ & ' ( ) * + , ; = (reg-name) | All others |
| port | None (digits 0-9 only) | None (percent-encoding is not permitted) |
| path | - . _ ~ : @ ! $ & ' ( ) * + , ; = | All others |
| query | - . _ ~ : @ / ? ! $ & ' ( ) * + , ; = | All others |
| fragment | - . _ ~ : @ / ? ! $ & ' ( ) * + , ; = | All others |
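The per-component differences can be sketched with `urllib.parse.quote`, whose `safe` argument lists the characters to leave unencoded (letters, digits, and `_ . - ~` are never encoded). The helper names `encode_segment` and `encode_query` and the constants are assumptions made for this example.

```python
from urllib.parse import quote, unquote

SUB_DELIMS = "!$&'()*+,;="
PCHAR_SAFE = SUB_DELIMS + ":@"   # extra characters allowed in a path segment

def encode_segment(text: str) -> str:
    """Percent-encode one path segment; "/" is encoded because it is a delimiter."""
    return quote(text, safe=PCHAR_SAFE)

def encode_query(text: str) -> str:
    """Percent-encode a query or fragment, which also allow "/" and "?"."""
    return quote(text, safe=PCHAR_SAFE + "/?")

print(encode_segment("a/b c"))
print(encode_query("a/b?c d"))
print(unquote(encode_segment("a/b c")))
```

When the query uses the key=value convention, "&" and "=" inside a key or value still need encoding even though the generic grammar permits them, because they act as delimiters in that format.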