2. URN Syntax
As discussed above, the syntax for URNs in this specification allows significantly more functionality than was the case in the earlier specifications, most recently [RFC2141]. It is also harmonized with the general URI syntax [RFC3986] (which, it must be noted, was completed after the earlier URN specifications).
However, this specification does not extend the URN syntax to allow direct use of characters outside the ASCII range [RFC20]. That restriction implies that any such characters need to be percent-encoded as described in Section 2.1 of the URI specification [RFC3986].
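As an illustration of that restriction, the following minimal sketch percent-encodes text destined for an NSS. It assumes UTF-8 as the character encoding and assumes the input has not already been percent-encoded (a literal "%" would be encoded again). The names NSS_SAFE and encode_nss are hypothetical and are not defined by this specification.

```python
from urllib.parse import quote

# Characters that pchar permits beyond the unreserved set (sub-delims,
# ":" and "@"), plus "/", which the NSS rule allows after the first
# character. Hypothetical helper constant, not part of any specification.
NSS_SAFE = "!$&'()*+,;=:@/"


def encode_nss(text: str) -> str:
    """Percent-encode the UTF-8 octets of every character not left literal."""
    return quote(text, safe=NSS_SAFE, encoding="utf-8")


print("urn:example:" + encode_nss("café/münchen"))
# prints urn:example:caf%C3%A9/m%C3%BCnchen
```

Unreserved characters (letters, digits, "-", ".", "_", "~") are left unchanged by quote(), so only the listed delimiters need to be named as safe.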
The basic syntax for a URN is defined using the Augmented Backus-Naur Form (ABNF) as specified in [RFC5234]. Rules not defined here (specifically: alphanum, fragment, and pchar) are defined as part of the URI syntax [RFC3986] and used here to point out the syntactic relationship with the terms used there. The definitions of some of the terms used below are not comprehensive; additional restrictions are imposed by the prose that can be found in sections of this document that are specific to those terms (especially r-component in Section 2.3.1 and q-component in Section 2.3.2).
   namestring    = assigned-name
                   [ rq-components ]
                   [ "#" f-component ]
   assigned-name = "urn" ":" NID ":" NSS
   NID           = (alphanum) 0*30(ldh) (alphanum)
   ldh           = alphanum / "-"
   NSS           = pchar *(pchar / "/")
   rq-components = [ "?+" r-component ]
                   [ "?=" q-component ]
   r-component   = pchar *( pchar / "/" / "?" )
   q-component   = pchar *( pchar / "/" / "?" )
   f-component   = fragment
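The grammar can be approximated with a regular expression. The sketch below is one possible transcription, not a reference implementation. It assumes that alphanum means ALPHA / DIGIT, that pchar and fragment expand as in the URI syntax [RFC3986], and that an r-component ends at the first "?=" (the ABNF alone is ambiguous on this point, since both "?" and "=" are permitted inside an r-component). The names Urn and parse_urn are hypothetical, and none of the additional restrictions imposed by prose elsewhere in this document are checked.

```python
import re
from typing import NamedTuple, Optional

# alphanum is taken to mean ALPHA / DIGIT (assumption).
ALPHANUM = r"[A-Za-z0-9]"
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
# Tail of an r-component; the lookahead stops it at the first "?=",
# which is assumed to begin the q-component.
R_TAIL = rf"(?:(?!\?=)(?:{PCHAR}|[/?]))*"

URN_RE = re.compile(
    rf"[Uu][Rr][Nn]:(?P<nid>{ALPHANUM}[A-Za-z0-9-]{{0,30}}{ALPHANUM})"
    rf":(?P<nss>{PCHAR}(?:{PCHAR}|/)*)"
    rf"(?:\?\+(?P<r>{PCHAR}{R_TAIL}))?"
    rf"(?:\?=(?P<q>{PCHAR}(?:{PCHAR}|[/?])*))?"
    rf"(?:#(?P<f>(?:{PCHAR}|[/?])*))?"
)


class Urn(NamedTuple):
    nid: str
    nss: str
    r_component: Optional[str]
    q_component: Optional[str]
    f_component: Optional[str]


def parse_urn(text: str) -> Optional[Urn]:
    """Split a namestring into its parts, or return None if it does not match."""
    m = URN_RE.fullmatch(text)
    if m is None:
        return None
    # Optional groups that did not participate in the match are None.
    return Urn(m["nid"], m["nss"], m["r"], m["q"], m["f"])


print(parse_urn("urn:example:a123,z456?+abc?=xyz#frag"))
print(parse_urn("urn:example:foo?bar"))  # None: "?" not followed by "+" or "="
```

The character classes are spelled out explicitly rather than relying on a case-insensitive flag, so that only ASCII letters match.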
The question mark character "?" can be used without percent-encoding inside r-components, q-components, and f-components. Other than inside those components, a "?" that is not immediately followed by "=" or "+" is not defined for URNs and SHOULD be treated as a syntax error by URN-specific parsers and other processors.
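The rule above can be checked on its own, for example to report the position of the offending character. The following is a minimal sketch under the assumption that the first "#" begins the f-component; the function name find_undefined_question_mark is hypothetical. It tests only this one rule and does not otherwise validate the URN.

```python
from typing import Optional


def find_undefined_question_mark(urn: str) -> Optional[int]:
    """Return the index of a "?" that is undefined for URNs, or None."""
    # Everything after the first "#" is the f-component, where "?" is allowed.
    body = urn.split("#", 1)[0]
    first = body.find("?")
    if first == -1:
        return None
    # The first "?" must open an r-component ("?+") or a q-component ("?=");
    # every later "?" then falls inside one of those components.
    if body[first + 1 : first + 2] in ("+", "="):
        return None
    return first


print(find_undefined_question_mark("urn:example:foo?bar"))      # 15
print(find_undefined_question_mark("urn:example:foo?+r?x?=q"))  # None
```

A processor that follows the SHOULD in the preceding paragraph could reject the input whenever this function returns an index.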
The following sections provide additional information about the syntactic elements of URNs.