Info Node: (librep.info)Regexp Syntax

Regular Expression Syntax
-------------------------

   The syntax of a regular expression is as follows (this is adapted
from the manual page):

   A regular expression is zero or more "branches", separated by `|'.
It matches anything that matches one of the branches.

   A branch is zero or more "pieces", concatenated. It matches a match
for the first, followed by a match for the second, etc.

   A piece is an "atom" possibly followed by `*', `+', or `?'. An atom
followed by `*' matches a sequence of 0 or more matches of the atom. An
atom followed by `+' matches a sequence of 1 or more matches of the
atom. An atom followed by `?' matches a match of the atom, or the null
string.
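   The three suffix operators and the `|' separator can be tried out
in a short sketch. It uses Python's `re' module purely as a stand-in
(an assumption; it is not librep's matcher, though it treats these
particular constructs the same way). The names `whole', `results' and
`either' are illustrative only.

```python
import re

# Python's re module stands in for librep's matcher here (an assumption).
star = re.compile(r"ab*c")   # `b*': zero or more `b'
plus = re.compile(r"ab+c")   # `b+': one or more `b'
opt = re.compile(r"ab?c")    # `b?': one `b' or the null string


def whole(pattern, text):
    """Return True when the entire text matches the pattern."""
    return pattern.fullmatch(text) is not None


# Each tuple is (matches with `*', matches with `+', matches with `?').
results = {t: (whole(star, t), whole(plus, t), whole(opt, t))
           for t in ("ac", "abc", "abbc")}

# Two branches separated by `|': the expression matches if either does.
branches = re.compile(r"cat|dog")
either = [whole(branches, t) for t in ("cat", "dog", "cow")]
```

   Only `abc' satisfies all three pieces: `ac' has too few `b'
characters for `+', and `abbc' has too many for `?'.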

   An atom is a regular expression in parentheses (matching a match for
the regular expression), a "range" (see below), `.' (matching any
single character), `^' (matching the null string at the beginning of
the input string), `$' (matching the null string at the end of the
input string), one of the strings `\s', `\S', `\w', `\W', `\d', `\D',
`\b', `\B', or a `\' followed by a single character (matching that
character), or a single character with no other significance (matching
that character).

   A "range" is a sequence of characters enclosed in `[]'. It normally
matches any single character from the sequence. If the sequence begins
with `^', it matches any single character _not_ from the rest of the
sequence. If two characters in the sequence are separated by `-', this
is shorthand for the full list of ASCII characters between them (e.g.
`[0-9]' matches any decimal digit). To include a literal `]' in the
sequence, make it the first character (following a possible `^'). To
include a literal `-', make it the first or last character.
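   The placement rules for `^', `]' and `-' inside a range are easy to
get wrong, so here is a small sketch of them. Python's `re' module is
assumed as a stand-in for librep's matcher; it follows the same
conventions for these cases. The helper `hit' is hypothetical.

```python
import re


def hit(pattern, char):
    """Return True when the single character is matched by the range."""
    return re.fullmatch(pattern, char) is not None


digit = r"[0-9]"           # shorthand for 0123456789
not_digit = r"[^0-9]"      # leading `^' negates the range
with_bracket = r"[]a]"     # literal `]' placed first
without_bracket = r"[^]a]" # literal `]' placed first after `^'
with_hyphen = r"[a-c-]"    # `a' to `c', plus a literal `-' placed last

checks = [
    hit(digit, "7"), hit(digit, "x"),
    hit(not_digit, "x"), hit(not_digit, "7"),
    hit(with_bracket, "]"), hit(with_bracket, "b"),
    hit(without_bracket, "]"), hit(without_bracket, "b"),
    hit(with_hyphen, "-"), hit(with_hyphen, "b"), hit(with_hyphen, "d"),
]
```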

   Also, any of the `*', `+' or `?' operators can be suffixed by a `?'
character (i.e. `*?', `+?', `??'). The meaning of the operator remains
the same but it becomes "non-greedy". This means that it will match the
_smallest_ number of characters satisfying the regular expression,
instead of the default behaviour which is to match the _largest_.
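   The difference between the greedy and non-greedy forms is clearest
on an input where both could succeed. The following sketch assumes
Python's `re' module as a stand-in for librep's matcher; its `*?',
`+?' and `??' operators have the meaning described above.

```python
import re

text = "<a><b>"

# Greedy `*' takes as much as it can while still allowing a match.
greedy = re.match(r"<.*>", text).group(0)

# Non-greedy `*?' stops at the first point where the match can succeed.
lazy = re.match(r"<.*?>", text).group(0)

# The same contrast for `+' versus `+?' and for `?' versus `??'.
plus_greedy = re.match(r"a+", "aaa").group(0)
plus_lazy = re.match(r"a+?", "aaa").group(0)
opt_greedy = re.match(r"a?", "a").group(0)
opt_lazy = re.match(r"a??", "a").group(0)
```

   Here `greedy' is the whole of `<a><b>' while `lazy' is only `<a>';
`a??' on its own prefers the null string.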

   The backslash-introduced atoms have the following meanings:

`\s'
     Match any whitespace character.

`\S'
     Match any non-whitespace character.

`\w'
     Match any alphanumeric or underscore character.

`\W'
     Match any non-(alphanumeric or underscore) character.

`\d'
     Match any numeric character.

`\D'
     Match any non-numeric character.

`\b'
     Match the null string between two adjacent `\w' and `\W'
     characters (in any order).

`\B'
     Match the null string that is not between two adjacent `\w' and
     `\W' characters.
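   As a rough illustration of these atoms, the sketch below applies
them to a short string. It assumes Python's `re' module as a stand-in
for librep's matcher, with the `re.ASCII' flag so that `\w' and `\d'
are limited to ASCII characters; whether librep treats non-ASCII
input the same way is not stated here.

```python
import re

text = "id_42 = x+1"

# Character-class atoms (re.ASCII keeps \w and \d to ASCII characters).
words = re.findall(r"\w+", text, re.ASCII)       # runs of word characters
non_space = re.findall(r"\S+", text, re.ASCII)   # runs of non-whitespace
digits = re.findall(r"\d+", text, re.ASCII)      # runs of digits
spaces = re.findall(r"\s", text, re.ASCII)       # single whitespace chars

# `\b' matches the null string at a word edge, `\B' everywhere else.
edge_hit = re.search(r"\bcat\b", "a cat sat") is not None
edge_miss = re.search(r"\bcat\b", "concatenate") is not None
inner_hit = re.search(r"\Bcat\B", "concatenate") is not None
```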

   Some examples of legal regular expressions:

`ab*a+b'
     Matches an `a' followed by zero or more `b' characters, followed by
     one or more `a' characters, followed by a `b'. For example, it
     matches `aaab' and `abbbab'.

`(one|two)_three'
     Matches `one_three' or `two_three'.

`^cmd_[0-9]+'
`^cmd_\d+'
     Matches `cmd_' followed by one or more digits. The match must start
     at the beginning of the input string.
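
   The example expressions can be checked mechanically. This sketch
assumes Python's `re' module as a stand-in for librep's matcher (the
constructs used are common to both); the variable names are
illustrative only.

```python
import re

# `ab*a+b': the whole string must fit the pattern.
first = [re.fullmatch(r"ab*a+b", s) is not None
         for s in ("aaab", "abbbab", "ab")]

# `(one|two)_three': either branch inside the parentheses.
second = [re.fullmatch(r"(one|two)_three", s) is not None
          for s in ("one_three", "two_three", "six_three")]

# `^cmd_[0-9]+' and `^cmd_\d+' are equivalent; `^' anchors the match
# to the start of the input, so a later `cmd_42' is not found.
anchored = re.search(r"^cmd_[0-9]+", "cmd_42 go").group(0)
same = re.search(r"^cmd_\d+", "cmd_42 go").group(0)
not_at_start = re.search(r"^cmd_\d+", "run cmd_42") is None
```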

