Skip to main content

2. URN Syntax

As discussed above, the syntax for URNs in this specification allows significantly more functionality than was the case in the earlier specifications, most recently [RFC2141]. It is also harmonized with the general URI syntax [RFC3986] (which, it must be noted, was completed after the earlier URN specifications).

However, this specification does not extend the URN syntax to allow direct use of characters outside the ASCII range [RFC20]. That restriction implies that any such characters need to be percent-encoded as described in Section 2.1 of the URI specification [RFC3986].

The basic syntax for a URN is defined using the Augmented Backus-Naur Form (ABNF) as specified in [RFC5234]. Rules not defined here (specifically: alphanum, fragment, and pchar) are defined as part of the URI syntax [RFC3986] and used here to point out the syntactic relationship with the terms used there. The definitions of some of the terms used below are not comprehensive; additional restrictions are imposed by the prose that can be found in sections of this document that are specific to those terms (especially r-component in Section 2.3.1 and q-component in Section 2.3.2).

namestring    = assigned-name
[ rq-components ]
[ "#" f-component ]
assigned-name = "urn" ":" NID ":" NSS
NID = (alphanum) 0*30(ldh) (alphanum)
ldh = alphanum / "-"
NSS = pchar *(pchar / "/")
rq-components = [ "?+" r-component ]
[ "?=" q-component ]
r-component = pchar *( pchar / "/" / "?" )
q-component = pchar *( pchar / "/" / "?" )
f-component = fragment

The question mark character "?" can be used without percent-encoding inside r-components, q-components, and f-components. Other than inside those components, a "?" that is not immediately followed by "=" or "+" is not defined for URNs and SHOULD be treated as a syntax error by URN-specific parsers and other processors.

The following sections provide additional information about the syntactic elements of URNs.