Copyright (C) 2000-2012 |
GNU Info (elisp)Regexp Backslash

Backslash Constructs in Regular Expressions
...........................................

For the most part, `\' followed by any character matches only that
character. However, there are several exceptions: certain
two-character sequences starting with `\' that have special meanings.
(The character after the `\' in such a sequence is always ordinary when
used on its own.) Here is a table of the special `\' constructs.

`\|'
     specifies an alternative. Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

     `\|' applies to the largest possible surrounding expressions. Only
     a surrounding `\( ... \)' grouping can limit the grouping power of
     `\|'.

     Full backtracking capability exists to handle multiple uses of
     `\|', if you use the POSIX regular expression functions (see POSIX
     Regexps).

`\{M\}'
     is a postfix operator that repeats the previous pattern exactly M
     times. Thus, `x\{5\}' matches the string `xxxxx' and nothing else.
     `c[ad]\{3\}r' matches strings such as `caaar', `cdddr', `cadar',
     and so on.

`\{M,N\}'
     is a more general postfix operator that specifies repetition with
     a minimum of M repeats and a maximum of N repeats. If M is
     omitted, the minimum is 0; if N is omitted, there is no maximum.

     For example, `c[ad]\{1,2\}r' matches the strings `car', `cdr',
     `caar', `cadr', `cdar', and `cddr', and nothing else.
     `\{0,1\}' or `\{,1\}' is equivalent to `?'.
     `\{0,\}' or `\{,\}' is equivalent to `*'.
     `\{1,\}' is equivalent to `+'.

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, the regular expression `\(foo\|bar\)x' matches either
          `foox' or `barx'.

       2. To enclose a complicated expression for the postfix operators
          `*', `+' and `?' to operate on.
          Thus, `ba\(na\)*' matches `ba', `bana', `banana', `bananana',
          etc., with any number (zero or more) of `na' strings.

       3. To record a matched substring for future reference with
          `\DIGIT' (see below).

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature that was assigned
     as a second meaning to the same `\( ... \)' construct because, in
     practice, there was usually no conflict between the two meanings.
     But occasionally there is a conflict, and that led to the
     introduction of shy groups.

`\(?: ... \)'
     is the "shy group" construct. A shy group serves the first two
     purposes of an ordinary group (controlling the nesting of other
     operators), but it does not get a number, so you cannot refer back
     to its value with `\DIGIT'.

     Shy groups are particularly useful for mechanically-constructed
     regular expressions because they can be added automatically
     without altering the numbering of any ordinary, non-shy groups.

`\DIGIT'
     matches the same text that matched the DIGITth occurrence of a
     grouping (`\( ... \)') construct.

     In other words, after the end of a group, the matcher remembers
     the beginning and end of the text matched by that group. Later on
     in the regular expression you can use `\' followed by DIGIT to
     match that same text, whatever it may have been.

     The strings matching the first nine grouping constructs appearing
     in the entire regular expression passed to a search or matching
     function are assigned numbers 1 through 9 in the order that the
     open parentheses appear in the regular expression. So you can use
     `\1' through `\9' to refer to the text matched by the
     corresponding grouping constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves. The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.
     If a particular grouping construct in the regular expression was
     never matched--for instance, if it appears inside of an
     alternative that wasn't used, or inside of a repetition that
     repeated zero times--then the corresponding `\DIGIT' construct
     never matches anything. To use an artificial example,
     `\(foo\(b*\)\|lose\)\2' cannot match `lose': the second
     alternative inside the larger group matches it, but then `\2' is
     undefined and can't match anything. But it can match `foobb',
     because the first alternative matches `foob' and `\2' matches `b'.

`\w'
     matches any word-constituent character. The editor syntax table
     determines which characters these are. See Syntax Tables.

`\W'
     matches any character that is not a word constituent.

`\sCODE'
     matches any character whose syntax is CODE. Here CODE is a
     character that represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc. To
     represent whitespace syntax, use either `-' or a space character.
     See Syntax Class Table, for a list of syntax codes and the
     characters that stand for them.

`\SCODE'
     matches any character whose syntax is not CODE.

`\cC'
     matches any character whose category is C. Here C is a character
     that represents a category: thus, `c' for Chinese characters or
     `g' for Greek characters in the standard category table.

`\CC'
     matches any character whose category is not C.

The following regular expression constructs match the empty
string--that is, they don't use up any characters--but whether they
match depends on the context.

`\`'
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.

`\''
     matches the empty string, but only at the end of the buffer or
     string being matched against.

`\='
     matches the empty string, but only at point. (This construct is
     not defined when matching against a string.)

`\b'
     matches the empty string, but only at the beginning or end of a
     word.
     Thus, `\bfoo\b' matches any occurrence of `foo' as a separate
     word. `\bballs?\b' matches `ball' or `balls' as a separate word.

     `\b' matches at the beginning or end of the buffer regardless of
     what text appears next to it.

`\B'
     matches the empty string, but _not_ at the beginning or end of a
     word.

`\<'
     matches the empty string, but only at the beginning of a word.
     `\<' matches at the beginning of the buffer only if a
     word-constituent character follows.

`\>'
     matches the empty string, but only at the end of a word. `\>'
     matches at the end of the buffer only if the contents end with a
     word-constituent character.

Not every string is a valid regular expression. For example, a string
with unbalanced square brackets is invalid (with a few exceptions, such
as `[]]'), and so is a string that ends with a single `\'. If an
invalid regular expression is passed to any of the search functions, an
`invalid-regexp' error is signaled.
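Several of the constructs described in this node can be tried out from
Lisp with `string-match'. The following is a minimal sketch, not a
definitive implementation. The helpers `my-full-match-p' and
`my-regexp-valid-p' are hypothetical names introduced only for this
example, and the standard syntax table is assumed so that the word
constructs behave predictably. Remember that in Lisp string syntax each
`\' of the regexp is written as `\\'.

```elisp
;; Sketch only: `my-full-match-p' and `my-regexp-valid-p' are hypothetical
;; helper names introduced for this example; they are not part of Emacs.

(defun my-full-match-p (regexp string)
  "Return t if REGEXP matches the whole of STRING, nil otherwise."
  ;; The shy group keeps a top-level \| inside the \` and \' anchors
  ;; without changing the numbering of the groups in REGEXP.
  ;; The standard syntax table is an assumption made for this sketch.
  (with-syntax-table (standard-syntax-table)
    (and (string-match (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(defun my-regexp-valid-p (regexp)
  "Return t if REGEXP is a valid regular expression, nil otherwise."
  (condition-case nil
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

;; Alternation and intervals.
(my-full-match-p "foo\\|bar" "bar")            ; => t
(my-full-match-p "foo\\|bar" "foobar")         ; => nil
(my-full-match-p "c[ad]\\{1,2\\}r" "cadr")     ; => t
(my-full-match-p "c[ad]\\{1,2\\}r" "caddr")    ; => nil

;; Back references: group 1 holds the repeated half.
(string-match "\\`\\(.*\\)\\1\\'" "abcabc")    ; => 0
(match-string 1 "abcabc")                      ; => "abc"

;; A group that never matched leaves \2 unable to match anything.
(my-full-match-p "\\(foo\\(b*\\)\\|lose\\)\\2" "lose")   ; => nil
(my-full-match-p "\\(foo\\(b*\\)\\|lose\\)\\2" "foobb")  ; => t

;; Word boundaries.
(with-syntax-table (standard-syntax-table)
  (list (string-match "\\bfoo\\b" "a foo b")   ; => 2
        (string-match "\\bfoo\\b" "food")))    ; => nil

;; Invalid regexps signal `invalid-regexp', caught here by the helper.
(my-regexp-valid-p "[a-z")   ; => nil
(my-regexp-valid-p "[]]")    ; => t
(my-regexp-valid-p "foo\\")  ; => nil
```

Wrapping the caller's regexp in `\(?: ... \)' rather than `\( ... \)'
is the design choice worth noting: it limits the reach of `\|' while
leaving `\1' through `\9' pointing at the same groups as before.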