Regular Expression Syntax
-------------------------
The syntax of a regular expression is as follows (this is adapted
from the manual page):
A regular expression is zero or more "branches", separated by `|'.
It matches anything that matches one of the branches.
A branch is zero or more "pieces", concatenated. It matches a match
for the first, followed by a match for the second, etc.
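The branch rule can be tried out with a short sketch. It uses Python's standard `re' module as a stand-in, which is an assumption: it is not the engine described here, although it treats `|' and concatenation the same way for simple patterns like this one. The helper name `matches_whole' is hypothetical.

```python
import re

# Two branches separated by `|'; each branch is a concatenation of
# single-character pieces.
pattern = re.compile(r"cat|dog")

def matches_whole(text):
    """Return True if the whole of `text` matches one of the branches."""
    return pattern.fullmatch(text) is not None

print(matches_whole("cat"))     # True  (first branch)
print(matches_whole("dog"))     # True  (second branch)
print(matches_whole("cog"))     # False (neither branch)
print(matches_whole("catdog"))  # False (one branch must match, not both)
```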
A piece is an "atom" possibly followed by `*', `+', or `?'. An atom
followed by `*' matches a sequence of 0 or more matches of the atom. An
atom followed by `+' matches a sequence of 1 or more matches of the
atom. An atom followed by `?' matches a match of the atom, or the null
string.
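As an illustration of the three operators, the following sketch applies each one to the atom `b'. It assumes Python's standard `re' module behaves like the syntax described here for these simple cases; the helper `whole' is a hypothetical name.

```python
import re

def whole(regexp, text):
    """Return True if `regexp` matches the whole of `text`."""
    return re.fullmatch(regexp, text) is not None

# `*': zero or more matches of the atom `b'.
print(whole(r"ab*", "a"))     # True
print(whole(r"ab*", "abbb"))  # True

# `+': one or more matches of the atom `b'.
print(whole(r"ab+", "a"))     # False
print(whole(r"ab+", "abbb"))  # True

# `?': one match of the atom `b', or the null string.
print(whole(r"ab?", "a"))     # True
print(whole(r"ab?", "ab"))    # True
print(whole(r"ab?", "abb"))   # False
```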
An atom is a regular expression in parentheses (matching a match for
the regular expression), a "range" (see below), `.' (matching any
single character), `^' (matching the null string at the beginning of
the input string), `$' (matching the null string at the end of the
input string), one of the strings `\s', `\S', `\w', `\W', `\d', `\D',
`\b', `\B', or a `\' followed by a single character (matching that
character), or a single character with no other significance (matching
that character).
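The kinds of atom listed above can be sketched as follows, again assuming that Python's standard `re' module is an acceptable stand-in for the engine described here. The inputs avoid newlines, where engines tend to differ in their treatment of `.' and `$'. The helpers `whole' and `found' are hypothetical names.

```python
import re

def whole(regexp, text):
    """Return True if `regexp` matches the whole of `text`."""
    return re.fullmatch(regexp, text) is not None

def found(regexp, text):
    """Return True if `regexp` matches somewhere in `text`."""
    return re.search(regexp, text) is not None

# A parenthesized regular expression is an atom, so `+' applies to
# the whole group.
print(whole(r"(ab)+", "ababab"))  # True

# `.' matches any single character.
print(whole(r"a.c", "a-c"))       # True

# `^' and `$' match the null string at the ends of the input.
print(found(r"^abc$", "abc"))     # True
print(found(r"^abc$", "xabc"))    # False

# `\' followed by a single character matches that character itself.
print(whole(r"a\.c", "a.c"))      # True
print(whole(r"a\.c", "a-c"))      # False
```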
A "range" is a sequence of characters enclosed in `[]'. It normally
matches any single character from the sequence. If the sequence begins
with `^', it matches any single character _not_ from the rest of the
sequence. If two characters in the sequence are separated by `-', this
is shorthand for the full list of ASCII characters between them (e.g.
`[0-9]' matches any decimal digit). To include a literal `]' in the
sequence, make it the first character (following a possible `^'). To
include a literal `-', make it the first or last character.
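The range rules, including the placement of a literal `]' or `-', can be checked with a small sketch. It assumes Python's standard `re' module, which follows the same conventions for these cases; `first_match' is a hypothetical helper.

```python
import re

def first_match(regexp, text):
    """Return the text of the first match of `regexp` in `text`, or None."""
    m = re.search(regexp, text)
    return m.group(0) if m else None

print(first_match(r"[0-9]", "abc7"))   # 7  (any decimal digit)
print(first_match(r"[^0-9]", "42x"))   # x  (anything but a digit)

# A literal `]' must come first, after a possible `^'.
print(first_match(r"[]x]", "a]b"))     # ]
print(first_match(r"[^]x]", "]xq"))    # q

# A literal `-' must come first or last.
print(first_match(r"[a-]", "12-3"))    # -
print(first_match(r"[-a]", "12-3"))    # -
```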
Also, any of the `*', `+' or `?' operators can be suffixed by a `?'
character (i.e. `*?', `+?', `??'). The meaning of the operator remains
the same but it becomes "non-greedy". This means that it will match the
_smallest_ number of characters satisfying the regular expression,
instead of the default behaviour which is to match the _largest_.
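The difference between the greedy and non-greedy forms is easiest to see side by side. This sketch assumes Python's standard `re' module, whose `*?', `+?' and `??' operators have the meaning described above; the variable names are illustrative only.

```python
import re

text = "<a><b>"

# Greedy `*' takes as many characters as it can.
greedy = re.search(r"<.*>", text).group(0)
# Non-greedy `*?' stops at the first place the rest can match.
lazy = re.search(r"<.*?>", text).group(0)
print(greedy)  # <a><b>
print(lazy)    # <a>

# `+' against `+?': the longest run against a single character.
plus_greedy = re.search(r"a+", "aaa").group(0)
plus_lazy = re.search(r"a+?", "aaa").group(0)
print(plus_greedy)  # aaa
print(plus_lazy)    # a

# `??' prefers the null string over one match of the atom.
opt_lazy = re.search(r"a??", "aaa").group(0)
print(repr(opt_lazy))  # ''
```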
The backslash-introduced atoms have the following meanings:
`\s'
Match any whitespace character.
`\S'
Match any non-whitespace character.
`\w'
Match any alphanumeric or underscore character.
`\W'
Match any non-(alphanumeric or underscore) character.
`\d'
Match any numeric character.
`\D'
Match any non-numeric character.
`\b'
Match the null string between a `\w' character and an adjacent
`\W' character (in either order), i.e. at a word boundary.
`\B'
Match the null string at any position that is not between a `\w'
character and an adjacent `\W' character.
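The backslash-introduced atoms can be exercised as in the sketch below. It assumes Python's standard `re' module as a stand-in; the `re.ASCII' flag is passed so that `\w', `\d' and `\s' are limited to ASCII characters, which is an assumption about what the engine described here does. The `\b' and `\B' cases stay away from the ends of the string, where engines may disagree.

```python
import re

A = re.ASCII  # keep \w, \d and \s limited to ASCII characters
text = "foo_1 bar"

words = re.findall(r"\w+", text, A)
digits = re.findall(r"\d", text, A)
spaces = re.findall(r"\s", text, A)
non_words = re.findall(r"\W", text, A)
non_spaces = re.findall(r"\S+", text, A)
non_digits = re.findall(r"\D+", "ab12cd", A)

print(words)       # ['foo_1', 'bar']
print(digits)      # ['1']
print(spaces)      # [' ']
print(non_words)   # [' ']
print(non_spaces)  # ['foo_1', 'bar']
print(non_digits)  # ['ab', 'cd']

# `\b' needs a \w character on one side and a \W character on the other.
bounded = re.search(r"\bbar\b", "foo bar baz", A) is not None
inside = re.search(r"\bbar", "foobar baz", A) is not None
# `\B' matches where `\b' does not, here between `o' and `b'.
not_bounded = re.search(r"\Bbar", "foobar baz", A) is not None

print(bounded)      # True
print(inside)       # False
print(not_bounded)  # True
```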
Some examples of legal regular expressions:
`ab*a+b'
Matches an `a' followed by zero or more `b' characters, followed by
one or more `a' characters, followed by a `b'. For example, it
matches `aaab' and `abbbab'.
`(one|two)_three'
Matches `one_three' or `two_three'.
`^cmd_[0-9]+'
`^cmd_\d+'
Both forms match `cmd_' followed by one or more digits; the match
must start at the beginning of the input string.
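The example expressions above can be checked mechanically. The sketch assumes Python's standard `re' module accepts them with the same meaning (the `re.ASCII' flag keeps `\d' limited to ASCII digits); the helpers `whole' and `prefix' and the sample inputs are hypothetical.

```python
import re

def whole(regexp, text):
    """Return True if `regexp` matches the whole of `text`."""
    return re.fullmatch(regexp, text, re.ASCII) is not None

def prefix(regexp, text):
    """Return the text matched by `regexp` in `text`, or None."""
    m = re.search(regexp, text, re.ASCII)
    return m.group(0) if m else None

print(whole(r"ab*a+b", "aaab"))    # True
print(whole(r"ab*a+b", "abbbab"))  # True
print(whole(r"ab*a+b", "abb"))     # False (no `a' before the final `b')

print(whole(r"(one|two)_three", "one_three"))    # True
print(whole(r"(one|two)_three", "two_three"))    # True
print(whole(r"(one|two)_three", "three_three"))  # False

# The two `cmd_' expressions are equivalent.
for regexp in (r"^cmd_[0-9]+", r"^cmd_\d+"):
    print(prefix(regexp, "cmd_42 now"))  # cmd_42
    print(prefix(regexp, "a cmd_42"))    # None (not at the beginning)
```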