Copyright (C) 2000-2012
Regular expressions
*******************

Regular expressions are patterns used in selecting text.  For example,
the `ed' command

     g/STRING/

prints all lines containing STRING.  Regular expressions are also used
by the `s' command for selecting old text to be replaced with new.

   In addition to specifying string literals, regular expressions can
represent classes of strings.  Strings thus represented are said to be
matched by the corresponding regular expression.  If it is possible for
a regular expression to match several strings in a line, then the
left-most longest match is the one selected: among the matches that
start earliest in the line, the longest one is chosen.

   The following symbols are used in constructing regular expressions:

`C'
     Any character C not listed below, including `{', `}', `(', `)',
     `<' and `>', matches itself.

`\C'
     Any backslash-escaped character C, other than `{', `}', `(', `)',
     `<', `>', `b', `B', `w', `W', `+' and `?', matches itself.  The
     backreferences `\1' through `\9' and the anchors `\`' and `\'',
     described below, are also exceptions.

`.'
     Matches any single character.

`[CHAR-CLASS]'
     Matches any single character in CHAR-CLASS.  To include a `]' in
     CHAR-CLASS, it must be the first character.  A range of characters
     may be specified by separating the end characters of the range with
     a `-', e.g., `a-z' specifies the lower case characters.  The
     following literal expressions can also be used in CHAR-CLASS to
     specify sets of characters:

            [:alnum:]  [:cntrl:]  [:lower:]  [:space:]
            [:alpha:]  [:digit:]  [:print:]  [:upper:]
            [:blank:]  [:graph:]  [:punct:]  [:xdigit:]

     If `-' appears as the first or last character of CHAR-CLASS, then
     it matches itself.  All other characters in CHAR-CLASS match
     themselves.

     Patterns in CHAR-CLASS of the form:

            [.COL-ELM.]
            [=COL-ELM=]

     where COL-ELM is a "collating element", are interpreted according
     to `locale (5)' (not currently supported).  See `regex (3)' for an
     explanation of these constructs.

`[^CHAR-CLASS]'
     Matches any single character, other than newline, not in
     CHAR-CLASS.  CHAR-CLASS is defined as above.
`^'
     If `^' is the first character of a regular expression, then it
     anchors the regular expression to the beginning of a line.
     Otherwise, it matches itself.

`$'
     If `$' is the last character of a regular expression, it anchors
     the regular expression to the end of a line.  Otherwise, it matches
     itself.

`\(RE\)'
     Defines a (possibly null) subexpression RE.  Subexpressions may be
     nested.  A subsequent backreference of the form `\N', where N is a
     number in the range [1,9], expands to the text matched by the Nth
     subexpression.  For example, the regular expression `\(a.c\)\1'
     matches the string `abcabc', but not `abcadc'.  Subexpressions are
     numbered in the order of their left delimiters: in `\(a\(b\)\)',
     `\1' refers to the text matched by `a\(b\)' and `\2' to the text
     matched by `b'.

`*'
     Matches the single character regular expression or subexpression
     immediately preceding it zero or more times.  If `*' is the first
     character of a regular expression or subexpression, then it matches
     itself.  The `*' operator sometimes yields unexpected results.  For
     example, the regular expression `b*' matches the beginning of the
     string `abbb', as opposed to the substring `bbb', since the only
     match starting at the left-most position is the null string.

`\{N,M\}'
`\{N,\}'
`\{N\}'
     Matches the single character regular expression or subexpression
     immediately preceding it at least N and at most M times.  If M is
     omitted, then it matches at least N times.  If the comma is also
     omitted, then it matches exactly N times.  If any of these forms
     occurs first in a regular expression or subexpression, then it is
     interpreted literally (i.e., the regular expression `\{2\}' matches
     the string `{2}', and so on).

`\<'
`\>'
     `\<' anchors the single character regular expression or
     subexpression immediately following it to the beginning of a
     "word", and `\>' anchors the one immediately preceding it to the
     end of a "word".  In ASCII, a word is a maximal string of
     alphanumeric characters, including the underscore (_).

   The following extended operators are preceded by a backslash `\' to
distinguish them from traditional `ed' syntax.
`\`'
`\''
     Unconditionally matches the beginning (`\`') or end (`\'') of a
     line, that is, regardless of where the operator appears in the
     regular expression, unlike `^' and `$'.

`\?'
     Optionally matches the single character regular expression or
     subexpression immediately preceding it.  For example, the regular
     expression `a[bd]\?c' matches the strings `abc', `adc' and `ac'.
     If `\?' occurs at the beginning of a regular expression or
     subexpression, then it matches a literal `?'.

`\+'
     Matches the single character regular expression or subexpression
     immediately preceding it one or more times.  So the regular
     expression `a\+' is shorthand for `aa*'.  If `\+' occurs at the
     beginning of a regular expression or subexpression, then it matches
     a literal `+'.

`\b'
     Matches the null string at the beginning or end of a word.  Thus
     the regular expression `\bhello\b' is equivalent to `\<hello\>'.
     However, `\b\b' is a valid regular expression whereas `\<\>' is
     not.

`\B'
     Matches the null string inside a word.

`\w'
     Matches any character that can appear in a word.

`\W'
     Matches any character that cannot appear in a word.
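The line selection done by `g/STRING/' can be imitated outside `ed'.
The C sketch below assumes that the POSIX `regcomp' and `regexec'
functions from `<regex.h>', called without `REG_EXTENDED' so that basic
syntax is used, treat these patterns the way `ed' does; `select_lines'
is a hypothetical helper, not part of `ed'.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: records in SELECTED the indices of the LINES that
   contain a match for PATTERN, roughly what `g/PATTERN/' selects in ed.
   Returns the number of selected lines, or -1 if PATTERN does not compile.
   SELECTED must have room for NLINES entries. */
static int select_lines(const char *pattern, const char *const lines[],
                        size_t nlines, size_t selected[])
{
    regex_t re;
    int count = 0;

    /* cflags without REG_EXTENDED selects basic (ed-style) syntax;
       REG_NOSUB because only "matches or not" is needed here. */
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < nlines; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            selected[count++] = i;
    regfree(&re);
    return count;
}
```

For a buffer holding the lines `alpha', `beta', `gamma' and `alphabet',
the pattern `alph' selects lines 0 and 3, and `ma$' selects only line 2.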
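The left-most longest rule, the null match of `b*' described under `*',
and the interval forms `\{N,M\}', `\{N,\}' and `\{N\}' can be observed
by asking for the span of the selected match.  This sketch again assumes
POSIX `regcomp'/`regexec' in basic mode as a stand-in for `ed';
`bre_span' is a hypothetical name.

```c
#include <regex.h>

/* Hypothetical helper: finds the match that POSIX regexec selects for
   PATTERN (basic syntax) in SUBJECT.  On a match, stores its start and
   end offsets in *START and *END and returns 1; returns 0 if there is no
   match and -1 if PATTERN does not compile. */
static int bre_span(const char *pattern, const char *subject,
                    int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];   /* m[0] describes the whole match */
    int rc;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m[0].rm_so;
    *end = (int)m[0].rm_eo;
    return 1;
}
```

With this helper, `bb*' in `abbcbbbb' spans offsets 1 to 3: the earlier
match wins even though `bbbb' later in the line is longer.  `b*' in
`abbb' spans 0 to 0, the null match, and `a\{2,3\}' in `caaaa' spans 1
to 4.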
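Because a `[CHAR-CLASS]' expression matches exactly one character, its
membership rules can be checked one character at a time.  This sketch
makes the same assumption about `regcomp' in basic mode;
`count_class_members' is hypothetical, and since it never calls
`setlocale', ranges such as `a-z' are interpreted in the C locale.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts how many characters of TEXT are matched by
   CLS, which is expected to be a single-character expression such as
   "[[:digit:]]".  Returns -1 if CLS does not compile. */
static int count_class_members(const char *cls, const char *text)
{
    regex_t re;
    int count = 0;

    if (regcomp(&re, cls, REG_NOSUB) != 0)
        return -1;
    for (const char *p = text; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };   /* test each character on its own */
        if (regexec(&re, one, 0, NULL, 0) == 0)
            count++;
    }
    regfree(&re);
    return count;
}
```

For example, `[]x]' has three members in `a]xb]' and `[a-]' has three in
`a-b-', because a leading `]' and a trailing `-' are taken literally.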
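Whether `^', `$' and `*' act as operators depends on where they appear in
the regular expression.  The following sketch, under the same assumption
that basic-mode `regcomp' behaves like `ed', only reports whether a
pattern matches somewhere in a string; `bre_matches' is a hypothetical
helper.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if PATTERN (basic syntax) matches
   somewhere in SUBJECT, 0 if it does not, and -1 if PATTERN does not
   compile. */
static int bre_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Here `^ab' matches `abc' but not `cab', while `a^b', whose `^' is not
the first character, matches the literal text `a^b'.  Likewise `a$b'
matches the literal text `a$b', and `*a', with a leading `*', matches
the literal text `*a'.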
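Subexpressions and backreferences can be inspected through the match
array that `regexec' fills in, where slot N describes subexpression N.
This is a sketch under the same basic-mode `regcomp' assumption;
`bre_group' is a hypothetical helper that copies out the text of one
subexpression.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: matches PATTERN (basic syntax) against SUBJECT and
   copies the text matched by subexpression N (0 means the whole match)
   into OUT, truncated to fit OUTSIZE bytes.  Returns 1 on success, 0 if
   there is no match or subexpression N took no part in it, and -1 on bad
   arguments or a pattern that does not compile. */
static int bre_group(const char *pattern, const char *subject, size_t n,
                     char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[10];   /* whole match plus subexpressions 1 to 9 */
    size_t len;
    int rc;

    if (n > 9 || outsize == 0 || regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, subject, 10, m, 0);
    regfree(&re);
    if (rc != 0 || m[n].rm_so == -1)   /* -1 marks an unused slot */
        return 0;
    len = (size_t)(m[n].rm_eo - m[n].rm_so);
    if (len >= outsize)
        len = outsize - 1;             /* leave room for the terminator */
    memcpy(out, subject + m[n].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

With it, `\(a.c\)\1' matches the whole of `abcabc' and nothing in
`abcadc'; for `\(a\(b\)\)c' against `abc', subexpression 1 is `ab' and
subexpression 2 is `b', following the order of the left delimiters.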
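The extended operators are GNU additions to the traditional syntax.
Assuming a C library whose basic-mode `regcomp' accepts them (GNU libc
does; other C libraries may reject some of them), the sketch below
reports the span of the selected match.  GNU libc treats `\`' and `\''
as the start and end of the whole string passed to `regexec', which for
a single line is the same thing.  `gnu_bre_span' is a hypothetical name.

```c
#include <regex.h>

/* Hypothetical helper: like a plain regexec call in basic mode, but
   relying on GNU libc to accept \? \+ \< \> \b \w \W \` and \'.
   Stores the match span in *START and *END and returns 1, returns 0 if
   there is no match, and -1 if PATTERN does not compile. */
static int gnu_bre_span(const char *pattern, const char *subject,
                        int *start, int *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m[0].rm_so;
    *end = (int)m[0].rm_eo;
    return 1;
}
```

Under that assumption, `^a[bd]\?c$' matches `ac', `abc' and `adc' but
not `abdc'; `a\+' and `aa*' both span offsets 1 to 4 in `baaa'; and
`\bhello\b' and `\<hello\>' both span 4 to 9 in `say hello there',
while `\<hello\>' finds nothing in `othello'.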