GNU Info

Info Node: (elisp)Regexp Backslash

(elisp)Regexp Backslash


Prev: Char Classes Up: Syntax of Regexps
Enter node , (file) or (file)node

Backslash Constructs in Regular Expressions
...........................................

   For the most part, `\' followed by any character matches only that
character.  However, there are several exceptions: certain
two-character sequences starting with `\' that have special meanings.
(The character after the `\' in such a sequence is always ordinary when
used on its own.)  Here is a table of the special `\' constructs.

`\|'
     specifies an alternative.  Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

     `\|' applies to the largest possible surrounding expressions.
     Only a surrounding `\( ... \)' grouping can limit the grouping
     power of `\|'.

     Full backtracking capability exists to handle multiple uses of
     `\|', if you use the POSIX regular expression functions (Note:
     POSIX Regexps).

`\{M\}'
     is a postfix operator that repeats the previous pattern exactly M
     times.  Thus, `x\{5\}' matches the string `xxxxx' and nothing
     else.  `c[ad]\{3\}r' matches string such as `caaar', `cdddr',
     `cadar', and so on.

`\{M,N\}'
     is more general postfix operator that specifies repetition with a
     minimum of M repeats and a maximum of N repeats.  If M is omitted,
     the minimum is 0; if N is omitted, there is no maximum.

     For example, `c[ad]\{1,2\}r' matches the strings `car', `cdr',
     `caar', `cadr', `cdar', and `cddr', and nothing else.
     `\{0,1\}' or `\{,1\}' is equivalent to `?'.	
     `\{0,\}' or `\{,\}' is equivalent to `*'.	
     `\{1,\}' is equivalent to `+'.

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, the regular expression `\(foo\|bar\)x' matches either
          `foox' or `barx'.

       2. To enclose a complicated expression for the postfix operators
          `*', `+' and `?' to operate on.  Thus, `ba\(na\)*' matches
          `ba', `bana', `banana', `bananana', etc., with any number
          (zero or more) of `na' strings.

       3. To record a matched substring for future reference with
          `\DIGIT' (see below).

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature that was assigned
     as a second meaning to the same `\( ... \)' construct because, in
     pratice, there was usually no conflict between the two meanings.
     But occasionally there is a conflict, and that led to the
     introduction of shy groups.

`\(?: ... \)'
     is the "shy group" construct.  A shy group serves the first two
     purposes of an ordinary group (controlling the nesting of other
     operators), but it does not get a number, so you cannot refer back
     to its value with `\DIGIT'.

     Shy groups are particulary useful for mechanically-constructed
     regular expressions because they can be added automatically
     without altering the numbering of any ordinary, non-shy groups.

`\DIGIT'
     matches the same text that matched the DIGITth occurrence of a
     grouping (`\( ... \)') construct.

     In other words, after the end of a group, the matcher remembers the
     beginning and end of the text matched by that group.  Later on in
     the regular expression you can use `\' followed by DIGIT to match
     that same text, whatever it may have been.

     The strings matching the first nine grouping constructs appearing
     in the entire regular expression passed to a search or matching
     function are assigned numbers 1 through 9 in the order that the
     open parentheses appear in the regular expression.  So you can use
     `\1' through `\9' to refer to the text matched by the
     corresponding grouping constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves.  The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.

     If a particular grouping construct in the regular expression was
     never matched--for instance, if it appears inside of an
     alternative that wasn't used, or inside of a repetition that
     repeated zero times--then the corresponding `\DIGIT' construct
     never matches anything.  To use an artificial example,,
     `\(foo\(b*\)\|lose\)\2' cannot match `lose': the second
     alternative inside the larger group matches it, but then `\2' is
     undefined and can't match anything.  But it can match `foobb',
     because the first alternative matches `foob' and `\2' matches `b'.

`\w'
     matches any word-constituent character.  The editor syntax table
     determines which characters these are.  Note: Syntax Tables.

`\W'
     matches any character that is not a word constituent.

`\sCODE'
     matches any character whose syntax is CODE.  Here CODE is a
     character that represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc.
     To represent whitespace syntax, use either `-' or a space
     character.  Note: Syntax Class Table, for a list of syntax codes
     and the characters that stand for them.

`\SCODE'
     matches any character whose syntax is not CODE.

`\cC'
     matches any character whose category is C.  Here C is a character
     that represents a category: thus, `c' for Chinese characters or
     `g' for Greek characters in the standard category table.

`\CC'
     matches any character whose category is not C.

   The following regular expression constructs match the empty
string--that is, they don't use up any characters--but whether they
match depends on the context.

`\`'
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.

`\''
     matches the empty string, but only at the end of the buffer or
     string being matched against.

`\='
     matches the empty string, but only at point.  (This construct is
     not defined when matching against a string.)

`\b'
     matches the empty string, but only at the beginning or end of a
     word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
     separate word.  `\bballs?\b' matches `ball' or `balls' as a
     separate word.

     `\b' matches at the beginning or end of the buffer regardless of
     what text appears next to it.

`\B'
     matches the empty string, but _not_ at the beginning or end of a
     word.

`\<'
     matches the empty string, but only at the beginning of a word.
     `\<' matches at the beginning of the buffer only if a
     word-constituent character follows.

`\>'
     matches the empty string, but only at the end of a word.  `\>'
     matches at the end of the buffer only if the contents end with a
     word-constituent character.

   Not every string is a valid regular expression.  For example, a
string with unbalanced square brackets is invalid (with a few
exceptions, such as `[]]'), and so is a string that ends with a single
`\'.  If an invalid regular expression is passed to any of the search
functions, an `invalid-regexp' error is signaled.


automatically generated by info2www version 1.2.2.9