6. Motivation and Background
While regular expressions originally were intended to describe a formal language to support a Boolean matching function, they have been enhanced with parsing functions that support the extraction and replacement of arbitrary portions of the matched text. With this accretion of features, parsing-regexp libraries have become more susceptible to bugs and surprising performance degradations that can be exploited in denial-of-service attacks by an attacker who controls the regexp submitted for processing. I-Regexp is designed to offer interoperability and to be less vulnerable to such attacks, with the trade-off that its only function is to offer a Boolean response as to whether a character sequence is matched by a regexp.
6.1 Implementing I-Regexp
XSD regexps are relatively easy to implement or map to widely implemented parsing-regexp dialects, with these notable exceptions:
- Character class subtraction. This is a very useful feature in many specifications, but it is unfortunately mostly absent from parsing-regexp dialects. Thus, it is omitted from I-Regexp.
- Multi-character escapes. \d, \w, \s and their uppercase complement classes exhibit a large amount of variation between regexp flavors. Thus, they are omitted from I-Regexp.
- Not all regexp implementations support access to Unicode tables that enable executing constructs such as \p{Nd}, although the \p/\P feature in general is now quite widely available. While, in principle, it is possible to translate these into character-class matches, this also requires access to those tables. Thus, regexp libraries in severely constrained environments may not be able to support I-Regexp conformance.
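The translation of a construct such as \p{Nd} into a plain character class can be illustrated with a minimal Python sketch. It is not part of the specification. The helper names category_ranges and to_char_class are hypothetical, and the sketch assumes a runtime that ships Unicode tables (here, Python's standard unicodedata module), which is exactly the dependency the last item describes. Python's standard re module is used as the target dialect because it does not accept \p{...} itself.

```python
import re
import sys
import unicodedata


def category_ranges(category):
    """Return inclusive (first, last) code point ranges whose Unicode
    general category equals `category` (for example "Nd")."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp          # open a new run
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


def to_char_class(ranges):
    """Render ranges as a bracket expression using \\uXXXX and
    \\UXXXXXXXX escapes, which Python's re accepts in str patterns."""
    def esc(cp):
        return "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp

    parts = []
    for first, last in ranges:
        parts.append(esc(first) if first == last else esc(first) + "-" + esc(last))
    return "[" + "".join(parts) + "]"


# Hypothetical stand-in for \p{Nd}: an explicit class built from the tables.
nd_class = to_char_class(category_ranges("Nd"))
nd_pattern = re.compile(nd_class)
```

The resulting class depends on the Unicode version bundled with the runtime, so two environments may produce different ranges for the same category. A library without any such tables has no way to build the class at all, which is why severely constrained environments may be unable to conform.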