5. Mapping I-Regexp to Regexp Dialects
The material in this section is not normative; it is provided as guidance to developers who want to use I-Regexps in the context of other regular expression dialects.
5.1 Multi-Character Escapes
I-Regexp does not support common multi-character escapes (MCEs) and character classes built around them. These can usually be replaced as shown by the examples in Table 1.
| MCE/class: | Replace with: |
|---|---|
| \S | [^ \t\n\r] |
| [\S ] | [^\t\n\r] |
| \d | [0-9] |
Table 1: Example Substitutes for Multi-Character Escapes
Note that the semantics of \d in XSD regular expressions is that of \p{Nd}; however, this would include all Unicode characters that are digits in various writing systems, which is almost certainly not what is required in IETF publications.
The construct \p{IsBasicLatin} is essentially a reference to legacy ASCII; it can be replaced by the character class [\u0000-\u007f].
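The difference between \d and [0-9] noted above can be observed in any regexp engine whose \d follows Unicode category Nd. The following is a minimal sketch using Python's re module, which is not one of the dialects discussed in this section but treats \d on text patterns as "any Unicode decimal digit"; the variable names are illustrative only.

```python
import re

# ARABIC-INDIC DIGIT THREE, a character in Unicode category Nd.
ARABIC_INDIC_THREE = "\u0663"

# With str patterns, Python's \d matches any Unicode decimal digit,
# which is comparable to the \p{Nd} semantics described for XSD.
unicode_digit_match = re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None

# The substitute from Table 1 only accepts the ten ASCII digits.
ascii_class_match = re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE) is not None

print(unicode_digit_match)  # True
print(ascii_class_match)    # False
```

The [0-9] form therefore states the narrower intent explicitly, independent of how a given engine defines \d.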
5.2 XSD Regexps
Any I-Regexp is also an XSD regexp [XSD-2], so the mapping is an identity function.
Note that a few errata for [XSD-2] have been fixed in [XSD-1.1-2]; therefore, it is also included in the Normative References (Section 9.1). XSD 1.1 is less widely implemented than XSD 1.0, and implementations of XSD 1.0 are likely to include these bugfixes; for the intents and purposes of this specification, an implementation of XSD 1.0 regexps is equivalent to an implementation of XSD 1.1 regexps.
5.3 ECMAScript Regexps
Perform the following steps on an I-Regexp to obtain an ECMAScript regexp [ECMA-262]:
- For any unescaped dots (.) outside character classes (first alternative of charClass production), replace the dot with [^\n\r].
- Envelop the result in ^(?: and )$.
The ECMAScript regexp is to be interpreted as a Unicode pattern ("u" flag; see Section 21.2.2 "Pattern Semantics" of [ECMA-262]).
Note that where a regexp literal is required, the actual regexp needs to be enclosed in /.
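The two steps of this subsection can be sketched as a small string transformation. This is an illustration only, not part of the specification: the function names map_dots and to_ecmascript_pattern are hypothetical, and the sketch assumes its input is already a valid I-Regexp (so every "]" inside a bracket expression is escaped and brackets do not nest).

```python
def map_dots(iregexp: str) -> str:
    """Replace each unescaped dot outside a character class with [^\\n\\r].

    Assumes the input is a valid I-Regexp.
    """
    out = []
    in_class = False  # True while between an unescaped "[" and "]"
    i = 0
    while i < len(iregexp):
        ch = iregexp[i]
        if ch == "\\" and i + 1 < len(iregexp):
            # Copy an escape such as \. or \n unchanged (two characters).
            out.append(iregexp[i:i + 2])
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "." and not in_class:
            out.append(r"[^\n\r]")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def to_ecmascript_pattern(iregexp: str) -> str:
    """Return the pattern source to be compiled with the "u" flag."""
    return "^(?:" + map_dots(iregexp) + ")$"


print(to_ecmascript_pattern("a.b"))      # ^(?:a[^\n\r]b)$
print(to_ecmascript_pattern(r"a\.[.]"))  # ^(?:a\.[.])$
```

The returned string is the pattern source only; in ECMAScript it would be passed to the RegExp constructor together with the "u" flag, or written between slashes as a literal.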
5.4 PCRE, RE2, and Ruby Regexps
To obtain a valid regexp in Perl Compatible Regular Expressions (PCRE) [PCRE2], the Go programming language's RE2 regexp library [RE2], and the Ruby programming language, perform the same steps as in Section 5.3, except that the last step is:
- Enclose the regexp in \A(?: and )\z.
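The variant described here differs from Section 5.3 only in the enclosing anchors. A compact sketch follows, again assuming a valid I-Regexp as input; the function name to_pcre_style_pattern is hypothetical. The sketch only builds the pattern string and does not compile it, since the target engines (PCRE, RE2, Ruby) are outside Python.

```python
import re

# Tokens that matter for the dot replacement, tried in this order:
#   1. an escape (backslash plus one character),
#   2. a whole bracket expression,
#   3. a bare dot.
# Everything else is left untouched by re.sub.
_TOKEN = re.compile(r"\\.|\[(?:\\.|[^\]\\])*\]|\.", re.DOTALL)


def to_pcre_style_pattern(iregexp: str) -> str:
    """Map an I-Regexp to a pattern enclosed in \\A(?: and )\\z."""
    def replace(match: re.Match) -> str:
        token = match.group()
        # Only a bare dot is rewritten; escapes and classes pass through.
        return r"[^\n\r]" if token == "." else token

    return r"\A(?:" + _TOKEN.sub(replace, iregexp) + r")\z"


print(to_pcre_style_pattern("a.b"))    # \A(?:a[^\n\r]b)\z
print(to_pcre_style_pattern("[.]+x"))  # \A(?:[.]+x)\z
```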