Copyright (C) 2000-2012
GNU Info (librep.info)Regexp Syntax

Regular Expression Syntax
-------------------------

The syntax of a regular expression is as follows (this is adapted from
the manual page):

A regular expression is zero or more "branches", separated by `|'. It
matches anything that matches one of the branches.

A branch is zero or more "pieces", concatenated. It matches a match for
the first, followed by a match for the second, etc.

A piece is an "atom" possibly followed by `*', `+', or `?'. An atom
followed by `*' matches a sequence of 0 or more matches of the atom. An
atom followed by `+' matches a sequence of 1 or more matches of the
atom. An atom followed by `?' matches a match of the atom, or the null
string.

An atom is a regular expression in parentheses (matching a match for
the regular expression), a "range" (see below), `.' (matching any
single character), `^' (matching the null string at the beginning of
the input string), `$' (matching the null string at the end of the
input string), one of the strings `\s', `\S', `\w', `\W', `\d', `\D',
`\b', `\B', or a `\' followed by a single character (matching that
character), or a single character with no other significance (matching
that character).

A "range" is a sequence of characters enclosed in `[]'. It normally
matches any single character from the sequence. If the sequence begins
with `^', it matches any single character _not_ from the rest of the
sequence. If two characters in the sequence are separated by `-', this
is shorthand for the full list of ASCII characters between them (e.g.
`[0-9]' matches any decimal digit). To include a literal `]' in the
sequence, make it the first character (following a possible `^'). To
include a literal `-', make it the first or last character.

Also, any of the `*', `+' or `?' operators can be suffixed by a `?'
character (i.e. `*?', `+?', `??'). The meaning of the operator remains
the same but it becomes "non-greedy".
This means that it will match the _smallest_ number of characters
satisfying the regular expression, instead of the default behaviour,
which is to match the _largest_.

The backslash-introduced atoms have the following meanings:

`\s'
     Match any whitespace character.

`\S'
     Match any non-whitespace character.

`\w'
     Match any alphanumeric or underscore character.

`\W'
     Match any non-(alphanumeric or underscore) character.

`\d'
     Match any numeric character.

`\D'
     Match any non-numeric character.

`\b'
     Match the null string between two adjacent `\w' and `\W'
     characters (in any order).

`\B'
     Match the null string that is not between two adjacent `\w' and
     `\W' characters.

Some examples of legal regular expressions:

`ab*a+b'
     Matches an `a' followed by zero or more `b' characters, followed
     by one or more `a' characters, followed by a `b'. For example,
     `aaab', `abbbab', etc.

`(one|two)_three'
     Matches `one_three' or `two_three'.

`^cmd_[0-9]+'
`^cmd_\d+'
     Matches `cmd_' followed by one or more digits; the match must
     start at the beginning of the line.
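
The example expressions above, together with the greedy and non-greedy
operators and the `\b' and `\B' atoms, can be tried out with a short
script. The sketch below uses Python's `re' module purely as a stand-in:
that choice is an assumption made for illustration, it is not librep's
own matcher, and the two engines may disagree in corner cases. All
variable names and test strings are hypothetical.

```python
import re

# Python's `re` stands in for librep's matcher here (an assumption made
# only for illustration); the two engines may differ in corner cases.

# `ab*a+b`: an `a`, zero or more `b`s, one or more `a`s, then a `b`.
piece_demo = re.compile(r"ab*a+b")
piece_hits = [s for s in ("aaab", "abbbab", "ab") if piece_demo.fullmatch(s)]

# `(one|two)_three`: two branches inside a parenthesised atom.
branch_demo = re.compile(r"(one|two)_three")
branch_hits = [s for s in ("one_three", "two_three", "three_three")
               if branch_demo.fullmatch(s)]

# `^cmd_[0-9]+` and `^cmd_\d+` give the same results on these ASCII inputs.
anchored = [bool(re.search(p, s))
            for p in (r"^cmd_[0-9]+", r"^cmd_\d+")
            for s in ("cmd_42 start", "run cmd_42")]

# Greedy `*` takes the longest match, non-greedy `*?` the shortest.
greedy = re.match(r"a.*b", "aXbXb").group()
lazy = re.match(r"a.*?b", "aXbXb").group()

# `\b` matches the null string at a word edge; `\B` matches elsewhere.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", "cat concat cat.")]
inner = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]

print(piece_hits, branch_hits, anchored, greedy, lazy, whole_words, inner)
```

In this sketch `ab' is rejected by `ab*a+b' because no `a' precedes the
final `b', the greedy match spans all of `aXbXb' while the non-greedy
one stops at `aXb', and `\bcat\b' skips the `cat' inside `concat'.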