Regular Expressions

Regular expressions (regexps, REs) are a powerful tool for building templates (patterns) that can be used to search for and compare any characters and strings, even the most complex ones, in a text.

How is such a template composed? It is built from special characters, metacharacters and character classes (sets). A regular expression is a simple string, and any characters in it which are not special (reserved) characters are treated as themselves.

Special characters are subdivided into three groups:

characters of the first group match a certain character class (set): e.g. the expression "\w" matches any word character (a letter, a digit or an underscore)
characters of the second group, unlike those of the first, have no length (so-called zero-width characters): e.g. "^" matches the start of a line, "\b" matches the start of a word
the third group consists of operators. Operators are applied to metacharacters, ordinary characters and other operators.

Any expression can be grouped (by enclosing it in parentheses) so that an operator applies to the entire group.

The syntax of regular expressions in nnCron is the same as in Perl, with a few minor differences in certain extended operators.

Syntax

All regexps should be enclosed between forward slashes (/.../). Parameters may be placed after the ending slash:

/.../i - ignore case differences.

/.../x - ignore whitespace and newline characters inside the regular expression (so that it can be laid out for readability).

/.../s - treat the regular expression as a single line (regard the special character "." (period) as "any character, including newline character").

Examples:

\ matches only word 'Valery'
/Valery/
\ matches words  'VALERY', 'valery', 'Valery' etc.
/Valery/i 
\ matches  'foobar', 'foobar barfoo'
/foobar/
\ matches  'foobar', 'FOOBAR', 'foobar and two other foos'
/ FOO bar /ix
\ matches 'Valery%crlf%Kondakoff'
/Valery.*Kondakoff/s
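
For readers who want to try these parameters outside nnCron, the following is a minimal sketch in Python. It assumes that /.../i, /.../x and /.../s behave like Python's re.IGNORECASE, re.VERBOSE and re.DOTALL flags; that mapping is an assumption made for illustration and is not taken from nnCron itself.

```python
import re

# Assumed mapping of nnCron parameters to Python flags (illustration only):
#   /.../i -> re.IGNORECASE, /.../x -> re.VERBOSE, /.../s -> re.DOTALL

# Without the flag, case matters; with it, 'VALERY' matches.
case_sensitive = re.search(r"Valery", "VALERY") is not None
case_insensitive = re.search(r"Valery", "VALERY", re.IGNORECASE) is not None

# re.VERBOSE drops unescaped whitespace from the pattern,
# so " FOO bar " behaves like "FOObar".
spaced = re.search(r" FOO bar ", "foobar and two other foos",
                   re.IGNORECASE | re.VERBOSE) is not None

# "." does not cross a newline unless re.DOTALL is set.
text = "Valery\r\nKondakoff"
single_line_off = re.search(r"Valery.*Kondakoff", text) is not None
single_line_on = re.search(r"Valery.*Kondakoff", text, re.DOTALL) is not None

print(case_sensitive, case_insensitive, spaced, single_line_off, single_line_on)
```

Several parameters can be combined, as in / FOO bar /ix above; in the Python sketch this is done by joining flags with "|".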

All characters in a regular expression are consecutively (left to right) compared with the target string. All characters which are not special characters or operators listed below are treated as themselves and checked for a simple match.

Special Characters

^	Start of a line
$	End of a line
.	Any character except the newline character (unless the "/.../s" parameter is used)
[ ... ]	Any character of the set inside the brackets. Inside square brackets, other special characters won't work, but metacharacters can be used. Two characters with a hyphen between them designate a range of characters: e.g. [a-f] would match any of the following characters: a, b, c, d, e, f.
[^ ... ]	Any character not listed in the square brackets. Inside square brackets, other special characters won't work, but metacharacters can be used. A hyphen between two characters designates the inclusive range between them: e.g. [^0-9] would match any character except 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
\#	Matches the character (#) following the backslash literally (except for the characters a-z and 0-9). E.g. the sequence "\\" matches "\", "\." matches the character "." (full stop), "\$" matches "$", etc.
\b	Start of word
\B	End of a word
\xNN	A character with ASCII hex code NN (\x20 is a blank space, \x4A is J, \x6A is j, etc.)
\n	0x0A (lf)
\r	0x0D (cr)
\t	0x09 (tab)
\s	A whitespace character (tab, space, cr, lf)
\S	Not a whitespace character
\w	A word character (letters, digits, _)
\W	A non-word character
\d	A digit
\D	Not a digit
\u	An uppercase character
\l	A lowercase character

Examples:

\ matches word  'help' followed by a stop
/help\./
\ matches words 'cats', 'cars' etc.
/ca.s/
\ matches  'testing', 'tester', but not 'the test'
/^test/
\ matches 'see me', but not 'meter' or 'me and you'
/me$/
\ matches one of the vowels
/[aeiou]/
\ matches one lowercase letter or digit
/[a-z0-9]/
\ matches 'footer', 'footing', 'a foot', but not 'afoot'
/\bfoot/
\ matches 'afoot', 'foot.' (stop is not considered a part of the word)
\ doesn't  match 'footing'
/foot\B/
\ matches whole word 'foot' only
/\bfoot\B/
\ matches words 'q2w', 'r5t' etc.
/\D\d\D/
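
Most of the examples above can be checked in Python, as in the hedged sketch below. The helper name hits is hypothetical. Note one difference that matters here: in Python, \b marks either edge of a word and \B means "not a word boundary", so the sketch uses \b where the nnCron examples use \B for "end of a word".

```python
import re

def hits(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [s for s in candidates if re.search(pattern, s)]

# An escaped period matches a literal full stop.
print(hits(r"help\.", ["help.", "helps"]))
# "." stands for exactly one character.
print(hits(r"ca.s", ["cats", "cars", "cas"]))
# "^" and "$" anchor the match to the start and end of the line.
print(hits(r"^test", ["testing", "tester", "the test"]))
print(hits(r"me$", ["see me", "meter", "me and you"]))
# Start of a word.
print(hits(r"\bfoot", ["footer", "a foot", "afoot"]))
# End of a word: Python's \b stands in for nnCron's \B (assumption, see above).
print(hits(r"foot\b", ["afoot", "foot.", "footing"]))
# Non-digit, digit, non-digit.
print(hits(r"\D\d\D", ["q2w", "r5t", "123"]))
```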

Extended Special Characters

Unlike ordinary special characters, extended ones are not compatible with Perl:

\N	Reference to a subpattern within the same regular expression, where N is the number of the required subpattern. This operator has certain limitations: it will only work if the referenced subpattern does not contain any repetition operators.

Example:

\ matches phrases 'man to man', 'hand to hand', '100 to 100' etc.
/(\b\w+\B) to \1/
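
The idea of a subpattern reference can be tried in Python, which uses the same \1 notation. This is only a sketch of the general technique: Python's engine does not share the limitation on repetition operators described above, and \b is used at both word edges because Python's \B has a different meaning.

```python
import re

# \1 must repeat exactly what the first group (\w+) captured.
pattern = re.compile(r"\b(\w+) to \1\b")

phrases = ["man to man", "hand to hand", "100 to 100", "man to dog"]
results = {p: pattern.search(p) is not None for p in phrases}

for phrase, matched in results.items():
    print(phrase, matched)
```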

Operators

Operators cannot be used by themselves; they should be preceded by a character or metacharacter which they affect. If an operator is preceded by an expression enclosed in parentheses, it affects the entire contents of the parentheses.

( ... ) Group characters into a single pattern and remember it

| Preceding or following pattern (logical OR)

* Zero or more times

+ One or more times

? 0 or 1 times the preceding pattern

{n} To repeat n times

{n,} To repeat n or more times

{n,m} To repeat n to m times

Examples:

\ matches words 'cat' or 'mouse'
/(cat)|(mouse)/

\ matches words 'dogs', 'doggie'
/dog(s|gie)/
\ matches 'ma', 'maaa', 'maaaaaaa'
/ma+/
\ matches 'm', 'maaa'
/ma*/
\ matches 'yada yada yada'
/(yada ){2,}/
\ matches 'fooandbar', 'foobar'
/foo(and)?bar/
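
The operators above use the same notation in Python, so their behaviour can be explored with the short sketch below. The helper name full is hypothetical; it requires the whole string to match, which makes the effect of each repetition count easy to see.

```python
import re

def full(pattern, s):
    """True if the entire string s matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# Alternation inside a group.
print(full(r"dog(s|gie)", "dogs"), full(r"dog(s|gie)", "doggie"))
# "+" needs at least one 'a'; "*" accepts none.
print(full(r"ma+", "m"), full(r"ma*", "m"))
# {2,} needs the group at least twice.
print(full(r"(yada ){2,}", "yada "), full(r"(yada ){2,}", "yada yada "))
# {2,3} accepts two or three repetitions, not four.
print(full(r"a{2,3}", "aaa"), full(r"a{2,3}", "aaaa"))
# "?" makes the group optional.
print(full(r"foo(and)?bar", "foobar"), full(r"foo(and)?bar", "fooandbar"))
```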

Adding the character "?" after an operator turns a greedy operator into a non-greedy one. For example, a greedy "*" becomes non-greedy when replaced by "*?". Greedy operators try to match as much as possible, and non-greedy ones match as little as possible.
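
The difference between greedy and non-greedy matching is easiest to see on a concrete string. The sketch below uses Python, where "*?" has the same meaning; the sample text is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" runs to the last ">" in the text.
greedy = re.search(r"<.*>", text).group()
# Non-greedy: ".*?" stops at the first ">" it can.
lazy = re.search(r"<.*?>", text).group()

print(greedy)
print(lazy)
```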

Extended Operators

?#N Lookbehind. N is the number of characters to look behind.

?~N Negative lookbehind

?= Lookahead

?! Negative lookahead

Please note that although the last two operators exist in Perl, they are used there in this way: (?=foobar). In nnCron, this operator looks like this: (foobar)?=.

Examples:

\ matches any word followed by a tab character,
\ the tab character itself is not included in the matched characters
/\w+(\t)?=/
\ matches any instance of 'foo' which is not followed by 'bar'
/foo(bar)?!/
\ matches any instance of 'bar' not preceded by 'foo'
/(foo)?~3bar/
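
For comparison, the same four kinds of zero-width checks can be written in the Perl-style prefix notation, shown here in Python. The correspondence assumed in this sketch is (x)?= to (?=x), (x)?! to (?!x), (x)?#N to (?<=x) and (x)?~N to (?<!x); it is an illustration of the concepts, not a statement about nnCron's internals.

```python
import re

# Lookahead: a word followed by a tab; the tab is not part of the match.
word = re.search(r"\w+(?=\t)", "name\tvalue").group()

# Negative lookahead: 'foo' not followed by 'bar'.
foo_positions = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Lookbehind: 'bar' preceded by 'foo'.
after_foo = [m.start() for m in re.finditer(r"(?<=foo)bar", "foobar bazbar")]

# Negative lookbehind: 'bar' not preceded by 'foo'.
not_after_foo = [m.start() for m in re.finditer(r"(?<!foo)bar", "foobar bazbar")]

print(word, foo_positions, after_foo, not_after_foo)
```

In each case the text inside the lookahead or lookbehind is only tested; it is never included in the matched characters.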

More examples:

\ matches "foobar", "bar"
/(foo)?bar/
\ matches only "foobar"
/^foobar$/
\ matches "foobar", "for", "far"
/f[obar]+r/
\ matches any run of digits and periods, e.g. a number with a decimal point
/([\d\.])+/
\ matches "foofoofoobarfoobar", "bar"
/((foo)|(bar))+/
