What is the regular expression filtering option?



A regular expression is a sequence of characters that describes a search pattern, allowing powerful searching and matching. The expression is known as a pattern, and it either matches a given string or does not. Metacharacters modify the search to look for alternative or wildcard characters, or for certain patterns within the target string.

In NetoM@il, regular expression matching is case-insensitive. A pattern involving no metacharacters will behave like a simple substring search.
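The behaviour described above can be approximated in Python as a rough sketch. This assumes Python's re module as a stand-in, which is not necessarily the engine NetoM@il uses; the helper name matches and the sample strings are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text, ignoring case.

    Assumption: re.search with re.IGNORECASE mimics the case-insensitive,
    match-anywhere behaviour described in this article.
    """
    return re.search(pattern, text, re.IGNORECASE) is not None

# With no metacharacters, the pattern acts like a substring search.
print(matches("invoice", "Your INVOICE is attached"))  # True
print(matches("invoice", "Your receipt is attached"))  # False
```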

Metacharacter Description
. Matches any single character.
[ ]

A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].

Inside a bracket expression, the . character matches a literal dot. For example, outside a bracket expression a.c matches "abc", "azc", etc., but [a.c] matches only "a", ".", or "c".

[^ ]

Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.

Note that the bracket expression must still match one character; it cannot match nothing. [^a]bc will match "zbc" but not "bc".

^ Matches the starting position within the string.
$ Matches the ending position of the string or the position just before a string-ending newline.
( ) Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group.
\n Matches what the nth marked subexpression matched, where n is a digit from 1 to 9.
* Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". A few older regular expression implementations do not support this form.
? Matches the preceding element zero or one time. For example, ab?c matches only "ac" or "abc".
+ Matches the preceding element one or more times. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".
| The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, abc|def matches "abc" or "def".
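Several rows of the table above can be checked with a short script. This is only a sketch using Python's re module, which may differ in places from the engine NetoM@il uses (for example, Python allows backslash escapes inside brackets). The helper is_match and the CASES list are illustrative names, not part of NetoM@il.

```python
import re

def is_match(pattern: str, text: str) -> bool:
    """True if pattern matches anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# (pattern, text, expected) triples drawn from the table above.
CASES = [
    (r"a.c", "azc", True),          # . matches any single character
    (r"[a.c]", ".", True),          # inside brackets, . is a literal dot
    (r"[a.c]", "b", False),
    (r"[^a]bc", "zbc", True),       # [^a] must still consume one character
    (r"[^a]bc", "bc", False),
    (r"^ab*c$", "ac", True),        # * allows zero repetitions
    (r"^ab+c$", "ac", False),       # + requires at least one
    (r"^ab?c$", "abbc", False),     # ? allows at most one
    (r"^a{3,5}$", "aaaa", True),    # between three and five repetitions
    (r"^a{3,5}$", "aa", False),
    (r"^(abc|def)$", "def", True),  # alternation inside a group
]

for pattern, text, expected in CASES:
    assert is_match(pattern, text) == expected, (pattern, text)
print("all table checks passed")
```

The ^ and $ anchors are added in some cases so that the whole string has to match; without them, a pattern such as ab?c would still be found inside a longer string.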

Examples

  • .at matches any three-character string ending with "at", including "hat", "cat", and "bat".
  • [hc]at matches "hat" and "cat".
  • [^b]at matches all strings matched by .at except "bat".
  • [^hc]at matches all strings matched by .at other than "hat" and "cat".
  • ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line.
  • [hc]at$ matches "hat" and "cat", but only at the end of the string or line.
  • \[.\] matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]" and "[b]".
  • [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
  • [hc]?at matches "hat", "cat", and "at".
  • [hc]*at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", "at", and so on.
  • (cat|dog) matches "cat" or "dog".
  • colou?r matches both spellings of this word.
  • reali[sz]e matches both spellings of this word.
  • a(.)b\1c matches "abbbc", "aQbQc", etc.
  • ([^ ]+) +\1 will match double words in a piece of text such as "cat cat".
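The examples in the list above can be tried out with a few lines of Python. This sketch assumes Python's re module behaves like the NetoM@il engine for these patterns; re.fullmatch is used so that the whole word has to match, and the helper name full is hypothetical.

```python
import re

def full(pattern: str, text: str) -> bool:
    """True if pattern matches the entire text, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None

# Strings the list says should match.
assert full(r".at", "hat") and full(r".at", "bat")
assert full(r"[hc]+at", "chat") and full(r"[hc]+at", "cchchat")
assert full(r"[hc]?at", "at")
assert full(r"colou?r", "color") and full(r"colou?r", "colour")
assert full(r"reali[sz]e", "realise") and full(r"reali[sz]e", "realize")
assert full(r"\[.\]", "[a]")
assert full(r"a(.)b\1c", "abbbc") and full(r"a(.)b\1c", "aQbQc")

# Strings the list says should not match.
assert not full(r"[^b]at", "bat")
assert not full(r"[hc]+at", "at")

# The double-word pattern finds a repeated word inside longer text.
found = re.search(r"([^ ]+) +\1", "the cat cat sat")
print(found.group(0))  # cat cat
```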

A pattern counts as a match if it matches anywhere within the string. For our purposes, where only a yes-or-no result is needed, abc and .*abc.* are therefore equivalent. When using backreferences or replacements, the difference can become important.
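The difference between abc and .*abc.* can be seen in a small sketch, again assuming Python's re module rather than the NetoM@il engine itself. Both patterns agree on whether the string matches, but they differ in how much text they cover, which matters once a replacement is involved. The sample string is made up.

```python
import re

text = "xxabcyy"

short = re.search(r"abc", text)
padded = re.search(r".*abc.*", text)

# As a yes-or-no filter, the two patterns agree.
print(short is not None, padded is not None)  # True True

# The matched text differs: the padded pattern covers the whole string.
print(short.group(0))   # abc
print(padded.group(0))  # xxabcyy

# So a replacement gives different results.
print(re.sub(r"abc", "-", text))      # xx-yy
print(re.sub(r".*abc.*", "-", text))  # -
```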

There are more features of regular expressions that are beyond the scope of this article.


  • Last Modified: 18/11/2016