2.2. Namespace Specific String (NSS)

The NSS is a string, unique within a URN namespace, that is assigned and managed in a consistent way and that conforms to the definition of the relevant URN namespace. The combination of the NID (unique across the entire "urn" scheme) and the NSS (unique within the URN namespace) ensures that the resulting URN is globally unique.

The NSS as specified in this document allows several characters not permitted by earlier specifications (see Appendix B). In particular, the "/" character, which is now allowed, effectively makes it possible to encapsulate hierarchical names from non-URN identifier systems. For instance, consider the hypothetical example of a hierarchical identifier system in which the names take the form of a sequence of numbers separated by the "/" character, such as "1/406/47452/2". If the authority for such names were to use URNs, it would be natural to place the existing name in the NSS, resulting in URNs such as "urn:example:1/406/47452/2".
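
As a non-normative illustration, the following Python sketch checks a candidate NSS and builds the URN for the hierarchical example above. It assumes the grammar NSS = pchar *(pchar / "/") from the syntax given earlier in this specification, with "pchar" as defined in RFC 3986. The names is_valid_nss and make_urn are hypothetical helpers introduced only for this example, and the NID is not validated here.

```python
import re

# pchar per RFC 3986: unreserved / pct-encoded / sub-delims / ":" / "@"
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"

# Assumed grammar: NSS = pchar *(pchar / "/")
# The first character must be a pchar, so an NSS cannot begin with "/".
_NSS_RE = re.compile(_PCHAR + r"(?:" + _PCHAR + r"|/)*")


def is_valid_nss(nss: str) -> bool:
    """Return True if the string matches the assumed NSS grammar."""
    return _NSS_RE.fullmatch(nss) is not None


def make_urn(nid: str, nss: str) -> str:
    """Build a URN from an NID and an already-encoded NSS (hypothetical helper)."""
    if not is_valid_nss(nss):
        raise ValueError("not a syntactically valid NSS: %r" % nss)
    return "urn:" + nid + ":" + nss


if __name__ == "__main__":
    print(make_urn("example", "1/406/47452/2"))
```

This check covers syntax only. Whether a given NSS is meaningful is still determined by the definition of the relevant URN namespace.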

Those changes to the syntax for the NSS do not modify the encoding rules for URN namespaces that were defined in accordance with [RFC2141]. If any such URN namespace whose names are used outside of the URN context (i.e., in a non-URN identifier system) also allows the use of "/", "~", or "&" in the native form within that identifier system, then the encoding rules for that URN namespace are not changed by this specification.

Depending on the rules governing a non-URN identifier system and its associated URN namespace, names that are valid in that identifier system might contain characters that are not allowed by the "pchar" production referenced above (e.g., characters outside the ASCII range or, consistent with the restrictions in RFC 3986, the characters "/", "?", "#", "[", and "]"). While such a name might be valid within the non-URN identifier system, it is not a valid URN until it has been translated into an NSS that conforms to the rules of that particular URN namespace. In the case of URNs that are formed from names that exist separately in a non-URN identifier system, translation of a name from its "native" format to a URN format is accomplished by using the canonicalization and encoding methods defined for URNs in general or specific rules for that URN namespace. Software that is not aware of namespace-specific canonicalization and encoding rules MUST NOT construct URNs from the name in the non-URN identifier system.
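
The translation step described above can be sketched, non-normatively, as a lookup of per-namespace encoding rules that refuses to build a URN when no rule is known. Everything namespace-specific here is an assumption: the registry _ENCODERS, the helper urn_from_native, and the rule for the "example" namespace (percent-encode every character outside "pchar" and "/") are hypothetical and are not defined by this specification.

```python
from urllib.parse import quote

# Characters left unencoded in addition to the RFC 3986 unreserved set:
# sub-delims, ":", "@", and "/".
_NSS_SAFE = "!$&'()*+,;=:@/"


def _encode_example(name: str) -> str:
    # Hypothetical rule for the "example" namespace: UTF-8 encode, then
    # percent-encode anything that is not unreserved or in _NSS_SAFE.
    return quote(name, safe=_NSS_SAFE, encoding="utf-8")


# Hypothetical registry mapping a lowercase NID to its encoding rule.
_ENCODERS = {"example": _encode_example}


def urn_from_native(nid: str, name: str) -> str:
    """Translate a native name into a URN, or refuse if the rules are unknown."""
    encoder = _ENCODERS.get(nid.lower())
    if encoder is None:
        # Without namespace-specific rules, no URN is constructed.
        raise LookupError("no encoding rules known for namespace %r" % nid)
    nss = encoder(name)
    if not nss or nss.startswith("/"):
        raise ValueError("name cannot be mapped to a valid NSS: %r" % name)
    return "urn:" + nid + ":" + nss


if __name__ == "__main__":
    print(urn_from_native("example", "a?b#c[1]"))
```

Raising an error for an unknown namespace, rather than falling back to a generic encoding, mirrors the requirement that software unaware of the namespace-specific rules not construct URNs from native names.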

In particular, with regard to characters outside the ASCII range, URNs that appear in protocols or that are passed between systems MUST use only Unicode characters encoded in UTF-8 and further encoded as required by RFC 3986. To the extent feasible and consistent with the requirements of names defined and standardized elsewhere, as well as the principles discussed in Section 1.2, the characters used to represent names SHOULD be restricted to either ASCII letters and digits or to the characters and syntax of some widely used models such as those of Internationalizing Domain Names in Applications (IDNA) [RFC5890], Preparation, Enforcement, and Comparison of Internationalized Strings (PRECIS) [RFC7613], or the Unicode Identifier and Pattern Syntax specification [UAX31].
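
As a non-normative sketch of the encoding requirement above, the following Python function encodes each character outside the ASCII range as UTF-8 and then percent-encodes every resulting octet, leaving ASCII characters untouched. The function name pct_encode_non_ascii is hypothetical. The sketch deliberately does nothing about ASCII characters that might also need encoding under RFC 3986 or under the rules of a particular URN namespace.

```python
def pct_encode_non_ascii(text: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # ASCII characters are passed through unchanged in this sketch.
            out.append(ch)
        else:
            # One "%HH" triplet per UTF-8 octet, with uppercase hex digits.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(pct_encode_non_ascii("café"))
```

The result contains only ASCII characters, so it can be carried in protocols that are not prepared to handle other characters.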

In order to make URNs as stable and persistent as possible when protocols evolve and the environment around them changes, URN namespaces SHOULD NOT allow characters outside the ASCII range [RFC20] unless the nature of the particular URN namespace makes such characters necessary.