Software:Comparison of regular expression engines
From HandWiki
This is a comparison of regular expression engines.
Libraries
Name | Official website | Programming language | Software license | Used by |
---|---|---|---|---|
Boost.Regex[Note 1] | Boost C++ Libraries | C++ | Boost | Notepad++ >= 6.0.0, EmEditor |
Boost Xpressive | Boost C++ Libraries | C++ | Boost | |
DEELX | RegExLab | C++ | Proprietary | |
FREJ[Note 2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
GLib/GRegex[Note 3] | GLib reference manual | C | LGPL | |
GNU regex | Gnulib reference manual | C | LGPL | GNU libc, GNU programs |
GRETA | Microsoft Research | C++ | Proprietary | |
Gregex | Grovf Inc. | RTL, HLS | Proprietary | FPGA-accelerated regex engine (>100 Gbit/s) for the cybersecurity, financial and e-commerce industries |
Hyperscan | Intel | C, x86-specific assembly (SSSE3+[1]) | 3-clause BSD | Rspamd |
ICU | International Components for Unicode | C, C++[Note 4] | ICU | Foundation (Apple and Swift open-source versions) |
Jakarta Regexp | The Apache Jakarta Project | Java | Apache | |
java.util.regex | Java's User manual | Java | GNU GPLv2 with Classpath exception | jEdit |
JRegex | JRegex | Java | BSD | |
MATLAB | Regular Expressions | MATLAB Language | Proprietary | |
Oniguruma | Kosako | C | BSD | Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor and jq |
Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
PCRE | pcre.org | C, C++[Note 5] | BSD | Apache HTTP Server, Nginx, BBEdit, Edbrowse, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim, SWI-Prolog |
Qt/QRegExp | Digia | C++ | Qt GNU GPL v. 3.0, | Kate, Kile |
regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
RE2 | RE2 | C++ | BSD | Go, Google Sheets, Gmail, G Suite |
Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
RGX | RGX | C++ based component library | P6R | |
RXP | Titan IC | RTL | Proprietary | Hardware-accelerated regex search, available for ASIC, FPGA and cloud deployments, enabling massively parallel content processing at very high speed |
SubReg | Matt Bucknall | C | MIT | |
TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
TRE[Note 2] | Ville Laurikari | C | BSD | musl |
TRegExpr | TRegExpr, documentation | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | Total Commander |
Wolfram Language (Mathematica) | Wolfram Language Documentation Center | Wolfram Language | Proprietary | Mathematica, the Wolfram Development Platform |
XRegExp | XRegExp | JavaScript | MIT | |
Languages
Language | Official website | Software license | Remarks |
---|---|---|---|
ActionScript 3 | ActionScript Technology Center | Free | |
APL (APLX, Dyalog, GNU) | APL Wiki | Licensed by the respective implementation | ⎕SS (PCRE), ⎕R/⎕S (PCRE), ⎕SS (PCRE2), respectively |
C++11 (C++) | C++ standards website | Licensed by the respective implementation | Since ISO/IEC 14882:2011(E); the default grammar is similar to ECMAScript (Grammar Description) |
D | D | Boost Software License[Note 1] | |
Free Pascal (Object Pascal) | freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with Sorokin's TRegExpr and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr. |
Go | Golang.org | BSD-style | |
Haskell | Haskell.org | BSD3 | Omitted from the language report and from GHC's Hierarchical Libraries |
Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability. |
JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited, but REs are first-class citizens of the language, with a dedicated /pattern/flags literal syntax. |
Julia | JuliaLang.org | MIT License | REs are part of the core library, with PCRE built in; an optional wrapper for the ICU C library is available. |
Lua | Lua.org | MIT License | Uses simplified, limited dialect; can be bound to more powerful library, like PCRE or an alternative parser like LPeg. |
Mathematica | Wolfram | Proprietary | |
.NET | MSDN | MIT License[Note 2][Note 3] | |
Nim | nim-lang.org | MIT License | The standard library includes the PCRE-based re and nre modules, as well as various alternatives (e.g. strutils, pegs (Parsing Expression Grammar matching), strscans and parseutils). |
OCaml | Caml | LGPL | As of 2010, the standard Str module was generally regarded as deprecated;[2] commonly recommended libraries are pcre (with full PCRE support) and re (which is less complete but claims better performance and provides frontends for popular syntaxes: PCRE, Perl, POSIX, Emacs and shell globbing). |
Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient in both speed and functionality |
POSIX C (C) | POSIX.1 web publication | Licensed by the respective implementation | Supports POSIX BRE and ERE syntax |
Python | python.org | Python Software Foundation License | Python has two major implementations: the built-in re module and the third-party regex library. |
Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later use different engines; Ruby 1.9 integrates Oniguruma, and Ruby 2.0 and later integrate Onigmo, a fork of Oniguruma. |
Rust | docs.rs | MIT License | The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does. |
SAP ABAP | SAP.com | Proprietary | |
Tcl | tcl.tk | Tcl/Tk License (BSD-style) | The Tcl library doubles as a regular expression library. |
Wolfram Language | Wolfram Research | Proprietary: usable for free on a limited scale on the Wolfram Development platform | |
XML Schema | W3C | Licensed by the respective implementation | |
XPath 3/XQuery | W3C | Licensed by the respective implementation | |
- ↑ "STD.regex - D Programming Language - Digital Mars". http://www.digitalmars.com/d/2.0/phobos/std_regex.html.
- ↑ "Dotnet/Corefx". 16 February 2022. https://github.com/dotnet/corefx/blob/7116584186f8f3a886616aaf8cb5d4a982c60e27/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs#L2.
- ↑ "Dotnet/Corefx". 16 February 2022. https://github.com/dotnet/corefx#license.
Language features
NOTE: An application that uses a library for regular expression support does not necessarily support the library's full feature set; for example, GNU grep uses PCRE but supports no lookahead, even though PCRE does.
Part 1
Name | "+" quantifier | Negated character classes | Non-greedy quantifiers[Note 1] | Shy groups[Note 2] | Recursion | Look-ahead | Look-behind | Backreferences[Note 3] | >9 indexable captures |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes[Note 4] | Yes | Yes | Yes | Yes |
Boost.Xpressive | Yes | Yes | Yes | Yes | Yes[Note 5] | Yes | Yes | Yes | Yes |
CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
FREJ | No[Note 6] | No | Some[Note 6] | Yes | No | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
GNU grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | N/A |
Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | Yes[Note 7] | Yes | Yes |
JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Lua | Yes | Yes | Some[Note 8] | No | No | No | No | Yes | No |
.NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Python | Yes | Yes | Yes | Yes | Yes[Note 9] | Yes | Yes | Yes | Yes |
Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
RE2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
XML Schema | Yes | Yes | No | N/A | No | No | No | No | N/A |
XPath 3/XQuery | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
XRegExp | Yes | Yes | Yes | Yes | No | Yes | Yes[Note 7] | Yes | Yes |
- ↑ Non-greedy quantifiers match as few characters as possible, instead of as many as possible (the default). Note that many older, pre-POSIX engines were non-greedy and did not have greedy quantifiers at all.
- ↑ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching when the group's content does not need to be accessed later.
- ↑ Backreferences enable referring to previously matched groups in later parts of the regex and/or the replacement string (where applicable). For instance, ([ab]+)\1 matches the whole string "abab" but not "abaab".
- ↑ "Perl Regular Expression Syntax - 1.47.0". http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions.
- ↑ "User's Guide - 1.47.0". http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference.
- ↑ 6.0 6.1 FREJ has no repetition quantifiers, but has an "optional" element that behaves similarly to a simple "?" quantifier.
- ↑ 7.0 7.1 As of ES2018
- ↑ Lua's only non-greedy quantifier is "-", which is a non-greedy version of "*". It does not have non-greedy versions of "+" or "?"; in the former case, the non-greedy effect can be achieved by repeating the token followed by "-", but in the latter case there is no equivalent.
- ↑ Supported by the optional regex library only.
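The three notes on non-greedy quantifiers, shy groups and backreferences can be tried out with Python's built-in re module, used here only as one convenient engine from the table; the sample strings are invented for illustration. The last check shows that the backreference claim refers to matching the whole string: a search inside "abaab" still finds the substring "aa".

```python
import re

# Greedy vs. non-greedy: "<.+>" takes as much as it can, while
# "<.+?>" stops at the first ">" that lets the match succeed.
text = "<b>bold</b>"
greedy = re.search(r"<.+>", text).group()
lazy = re.search(r"<.+?>", text).group()

# Shy (non-capturing) group: "(?:ab)+" groups the repetition without
# creating a numbered capture, so only the outer group is numbered.
groups = re.fullmatch(r"((?:ab)+)c", "ababc").groups()

# Backreference: "\1" must repeat exactly the text captured by group 1.
double = re.compile(r"([ab]+)\1")
matches_abab = double.fullmatch("abab") is not None
# "abaab" has odd length, so it cannot be some X written twice.
matches_abaab = double.fullmatch("abaab") is not None
# A search (not anchored to the whole string) still finds "aa" inside it.
inner = double.search("abaab").group()

print(greedy)         # <b>bold</b>
print(lazy)           # <b>
print(groups)         # ('abab',)
print(matches_abab)   # True
print(matches_abaab)  # False
print(inner)          # aa
```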
Part 2
Name | Directives[Note 1] | Conditionals | Atomic groups[Note 2] | Named capture[Note 3] | Comments | Embedded code | Unicode property support[3] | Balancing groups[Note 4] | Variable-length look-behinds[Note 5] |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No |
Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No | No | No |
CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Some[Note 6] | No | No |
EmEditor | Yes | Yes | ? | ? | Yes | No | ? | No | No |
FREJ | No | No | Yes | Yes | Yes | No | ? | No | No |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No |
GNU grep | Yes | Yes | ? | Yes | Yes | No | No | No | No |
Haskell | ? | ? | ? | ? | ? | No | No | No | No |
RXP | Yes | Yes | No | Yes | Yes | No | No | No | No |
ICU Regex | Yes | No | Yes | Yes[Note 7] | Yes | No | Yes | No | No |
Java | Yes | No | Yes | Yes[Note 8] | Yes | No | Some[Note 6] | No | No |
JavaScript (ECMAScript) | No | No | No | Yes | No | No | Some[Note 6][Note 9][4] | No | Yes |
JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | Yes |
Lua | No | No | No | No | No | No | No | No | No |
.NET | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | Yes | Yes |
OCaml | No | No | No | No | No | No | No | No | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No[Note 10] |
PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
Python | Yes | Yes | Yes[Note 11] | Yes | Yes | No | Yes[Note 12] | No | Yes[Note 13] |
Qt/QRegExp | No | No | No | No | No | No | No | No | No |
RE2 | Yes | No | ? | Yes | No | No | Some[Note 6] | No | No |
Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | No | Some[Note 6] | No | No |
Tcl | Yes | No | Yes | No | Yes | No | Yes | No | No |
TRE | Yes | No | No | No | Yes | No | ? | No | No |
Vim | Yes | No | Yes | No | No | No | No | No | Yes |
RGX | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No |
XML Schema | No | No | No | No | No | No | Yes | No | No |
XPath 3/XQuery | No | No | No | No | No | No | Yes | No | No |
XRegExp | Leading only | No | No | Yes | Yes | No | Yes | No | Yes |
- ↑ Also known as flag modifiers, mode modifiers or option letters. Example pattern: "(?i:test)".
- ↑ Also called independent sub-expressions.
- ↑ Similar to backreferences, but with names instead of indices.
- ↑ A special feature that allows matching balanced constructs without recursion.
- ↑ Refers to the ability to use quantifiers inside look-behinds, which makes their match length variable.
- ↑ 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 Unicode property support may be incomplete, since products are continuously updated; every implementation is incomplete after a new Unicode revision is released, until it is updated to comply.
- ↑ Available as of ICU 55.
- ↑ Available as of JDK 7.
- ↑ The support and range of properties is dependent on implementation.
- ↑ Experimental support added in v5.29.9.
- ↑ Supported by Python 3.11 and later, and by the optional regex library.
- ↑ May only be available in the regex library when used with Python versions after 3.3.
- ↑ Supported by the optional regex library only.
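Several Part 2 features can be illustrated with Python's built-in re module. This is a minimal sketch, assuming Python 3.7 or later; atomic groups need Python 3.11, hence the version check. The patterns and sample strings are invented for illustration, and the (?P=name) back-reference syntax is Python's own (other flavors use different spellings). Balancing groups and embedded code are left out because the table lists Python as not supporting them.

```python
import re
import sys

# Directive (inline modifier): "(?i:...)" makes only the enclosed
# part case-insensitive.
directive = re.compile(r"(?i:test)ING")
print(directive.fullmatch("TeStING") is not None)  # True
print(directive.fullmatch("testing") is not None)  # False: "ing" is outside the group

# Named capture: read groups by name, and back-reference them by name
# inside the pattern with Python's (?P=name) syntax.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.fullmatch("2023-09")
print(m.group("year"), m["month"])  # 2023 09
print(re.fullmatch(r"(?P<ch>\w)(?P=ch)", "aa") is not None)  # True

# Comments: re.VERBOSE ignores whitespace and treats "#" as a comment;
# "(?#...)" is an inline comment in any mode.
phone = re.compile(r"""
    \d{3}   # exchange
    -       # separator
    \d{4}   # line number
""", re.VERBOSE)
print(phone.fullmatch("555-1234") is not None)  # True
print(re.fullmatch(r"a(?#ignored)b", "ab") is not None)  # True

# Atomic group: once (?>a+) has matched, the engine may not backtrack
# into it to give characters back, so the trailing "ab" cannot match.
if sys.version_info >= (3, 11):
    print(re.fullmatch(r"(?:a+)ab", "aaab") is not None)  # True
    print(re.fullmatch(r"(?>a+)ab", "aaab") is not None)  # False

# Look-behind: re accepts only fixed-width look-behinds, so a
# quantifier such as \s* inside one is rejected at compile time.
print(re.search(r"(?<=\$)\d+", "cost: $42").group())  # 42
try:
    re.compile(r"(?<=\$\s*)\d+")
except re.error as exc:
    print("rejected:", exc)
```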
API features
Name | Native UTF-16 support[Note 1] | Native UTF-8 support[Note 1] | Multi-line matching | Partial match[Note 2] |
---|---|---|---|---|
Boost.Regex | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | No | Yes |
ICU Regex | Yes | No | Yes | ? |
Java | Yes[Note 3] | Yes[Note 3] | Yes | Yes |
.NET | No[Note 4] | Yes | Yes | ? |
PCRE | Yes[Note 5] | Yes | Yes | Yes |
Qt/QRegExp | Yes | No | No | Yes[Note 6] |
Qt/QRegularExpression | Yes | Yes | Yes | Yes |
Tcl | Yes | Yes[Note 7] | Yes | ? |
TRE | Yes | Yes | Yes | ? |
RGX | No | No | Yes | ? |
wxWidgets::wxRegEx[Note 8] | Yes | Yes | Yes | ? |
XRegExp | Yes | Yes | Yes | No |
- ↑ 1.0 1.1 Means the format can be used internally without explicit conversion.
- ↑ Partial match of the whole regular expression. For example, the pattern ".*END$" matches any string partially, but fully matches only strings ending with END.[1]
- ↑ 3.0 3.1 Supports the Unicode 15.0 standard (as of 2023).[2]
- ↑ The implementation retains the original UCS-2 behavior, so it recognizes only 65,536 (64K) code points, versus the 1,112,064 code points encodable in UTF-16. A Microsoft developer representative answered a bug report on this as "will not fix" in 2010.[3]
- ↑ Since version 8.30.
- ↑ Partial matching is performed implicitly, requiring a separate call to matchedLength() if an exact match fails.
- ↑ Tcl includes facilities to convert to and from UTF-8.
- ↑ wxRegEx uses the system-supplied POSIX library if one is available; otherwise, and in Unicode mode, it uses Henry Spencer's library.
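The UTF-16 and UTF-8 notes above hinge on what an engine treats as one character. The sketch below uses Python's re module (an illustrative assumption; Python is not among the engines in this table) to view one character outside the Basic Multilingual Plane, U+1F600, in three ways: as a code point, as UTF-16 code units handled one at a time (the UCS-2 behavior described for .NET), and as UTF-8 bytes. The sample string is made up for illustration.

```python
import re
import struct

# "x", U+1F600 (outside the BMP), "y".
s = "x\U0001F600y"

# Python str patterns work on code points: "." consumes the emoji whole.
print(re.fullmatch(r"x.y", s) is not None)  # True

# UTF-16 stores U+1F600 as a surrogate pair. Rebuilding the string from
# individual 16-bit units mimics a UCS-2-style engine that sees each unit
# as a separate character, so two "." are needed to cross the emoji.
data16 = s.encode("utf-16-le")
units = struct.unpack(f"<{len(data16) // 2}H", data16)
ucs2_view = "".join(chr(u) for u in units)
print([hex(u) for u in units])  # ['0x78', '0xd83d', '0xde00', '0x79']
print(re.fullmatch(r"x.y", ucs2_view) is not None)   # False
print(re.fullmatch(r"x..y", ucs2_view) is not None)  # True

# A bytes pattern over UTF-8 data sees the emoji as 4 separate bytes.
b = s.encode("utf-8")
print(len(b))  # 6
print(re.fullmatch(rb"x.y", b) is not None)     # False
print(re.fullmatch(rb"x.{4}y", b) is not None)  # True
```

An engine with native support for a given encoding hides these unit boundaries from the pattern, which is what the UTF-16 and UTF-8 columns record.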
References
- ↑ "Getting Started – Hyperscan 5.4.0 documentation". https://intel.github.io/hyperscan/dev-reference/getting_started.html#requirements.
- ↑ "Regex - Regular Expressions in OCaml". https://stackoverflow.com/questions/3221067#comment3323649_3221067.
- ↑ "UTS #18: Unicode Regular Expressions". https://www.unicode.org/reports/tr18/.
- ↑ "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". https://www.ecma-international.org/ecma-262/9.0/#sec-runtime-semantics-unicodematchproperty-p. Retrieved 4 August 2020.
External links
- Regular Expression Flavor Comparison – Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary
- Online Regular Expression Testing – with support for Java, JavaScript, .NET, PHP, Python and Ruby
- Implementing Regular Expressions – series of articles by Russ Cox, author of RE2
- Regular Expression Engines
Original source: https://en.wikipedia.org/wiki/Comparison of regular expression engines.