Info Node: (ed.info)Regular expressions

Regular expressions
*******************

   Regular expressions are patterns used in selecting text.  For
example, the `ed' command

     g/STRING/

prints all lines containing STRING.  Regular expressions are also used
by the `s' command for selecting old text to be replaced with new.

   In addition to specifying string literals, regular expressions can
represent classes of strings.  Strings thus represented are said to be
matched by the corresponding regular expression.  If it is possible for
a regular expression to match several strings in a line, then the
left-most longest match is the one selected.
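
   The left-most longest rule can be observed with a short sketch.  It
assumes that `sed' is available and uses its `s' command as a stand-in
for the `s' command of `ed', since both accept basic regular
expressions; the variable names are illustrative only.

```shell
# Assumption: sed's s command stands in for ed's s command here.

# Left-most: the lone X after `a' is selected, even though a longer
# run of X's occurs later in the line.
leftmost=$(printf '%s\n' 'aXbXX' | sed 's/XX*/_/')

# Longest: starting from the same position, the whole run of o's is
# consumed rather than stopping after the `f'.
longest=$(printf '%s\n' 'foooo bar' | sed 's/fo*/_/')

printf '%s\n%s\n' "$leftmost" "$longest"
```

   The first substitution yields `a_bXX' and the second `_ bar'.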

   The following symbols are used in constructing regular expressions:

`C'
     Any character C not listed below, including `{', `}', `(', `)',
     `<' and `>', matches itself.

`\C'
     Any backslash-escaped character C, other than `{', `}', `(', `)',
     `<', `>', `b', `B', `w', `W', `+' and `?', matches itself.

`.'
     Matches any single character.

`[CHAR-CLASS]'
     Matches any single character in CHAR-CLASS.  To include a `]' in
     CHAR-CLASS, it must be the first character.  A range of characters
     may be specified by separating the end characters of the range
     with a `-', e.g., `a-z' specifies the lower case characters.  The
     following literal expressions can also be used in CHAR-CLASS to
     specify sets of characters:

          [:alnum:] [:cntrl:] [:lower:] [:space:]
          [:alpha:] [:digit:] [:print:] [:upper:]
          [:blank:] [:graph:] [:punct:] [:xdigit:]

     If `-' appears as the first or last character of CHAR-CLASS, then
     it matches itself.  All other characters in CHAR-CLASS match
     themselves.

     Patterns in CHAR-CLASS of the form:
          [.COL-ELM.]
          [=COL-ELM=]

     where COL-ELM is a "collating element", are interpreted according
     to `locale (5)' (not currently supported).  See `regex (3)' for an
     explanation of these constructs.

`[^CHAR-CLASS]'
     Matches any single character, other than newline, not in
     CHAR-CLASS.  CHAR-CLASS is defined as above.

`^'
     If `^' is the first character of a regular expression, then it
     anchors the regular expression to the beginning of a line.
     Otherwise, it matches itself.

`$'
     If `$' is the last character of a regular expression, it anchors
     the regular expression to the end of a line.  Otherwise, it matches
     itself.

`\(RE\)'
     Defines a (possibly null) subexpression RE.  Subexpressions may be
     nested.  A subsequent backreference of the form `\N', where N is a
     number in the range [1,9], expands to the text matched by the Nth
     subexpression.  For example, the regular expression `\(a.c\)\1'
     matches the string `abcabc', but not `abcadc'.  Subexpressions are
     ordered relative to their left delimiter.

`*'
     Matches the single character regular expression or subexpression
     immediately preceding it zero or more times.  If `*' is the first
     character of a regular expression or subexpression, then it matches
     itself.  The `*' operator sometimes yields unexpected results.  For
     example, the regular expression `b*' matches the beginning of the
     string `abbb', as opposed to the substring `bbb', since a null
     match is the only left-most match.

`\{N,M\}'
`\{N,\}'
`\{N\}'
     Matches the single character regular expression or subexpression
     immediately preceding it at least N and at most M times.  If M is
     omitted, then it matches at least N times.  If the comma is also
     omitted, then it matches exactly N times.  If any of these forms
     occurs first in a regular expression or subexpression, then it is
     interpreted literally (i.e., the regular expression `\{2\}'
     matches the string `{2}', and so on).

`\<'
`\>'
     Anchors the single character regular expression or subexpression
     immediately following it to the beginning (in the case of `\<') or
     ending (in the case of `\>') of a "word", i.e., in ASCII, a
     maximal string of alphanumeric characters, including the
     underscore (_).
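
   Several of the symbols above can be tried outside of `ed' with the
following sketch.  It assumes that `sed' and `grep' are available and
treats them as stand-ins for `ed', since all three accept basic regular
expressions; the variable names are illustrative only.

```shell
# Assumption: sed and grep stand in for ed's g and s commands.

# [CHAR-CLASS] with a named class: replace every digit.
digits=$(printf '%s\n' 'a1 b22' | sed 's/[[:digit:]]/#/g')

# `]' placed first and `-' placed last both match themselves.
brackets=$(printf '%s\n' 'a]b-c' | sed 's/[]-]/_/g')

# ^ and $ anchor the match to the whole line; count the matching lines.
anchored=$(printf '%s\n' 'abc' 'xabc' 'abcx' | grep -c '^abc$')

# \(RE\) and \1: the backreference must repeat the text actually matched.
backref=$(printf '%s\n' 'abcabc' 'abcadc' | grep '\(a.c\)\1')

# \{N,M\}: two or three consecutive a's; the longest permitted run is taken.
interval=$(printf '%s\n' 'aaaa' | sed 's/a\{2,3\}/X/')

# b* matches the null string at the start of the line, so no b is replaced.
star=$(printf '%s\n' 'abbb' | sed 's/b*/X/')

printf '%s\n' "$digits" "$brackets" "$anchored" "$backref" "$interval" "$star"
```

   The last two results, `Xa' and `Xabbb', follow from the left-most
longest rule: the interval takes three of the four a's, and the null
match at the start of `abbb' precedes any match containing a `b'.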

   The following extended operators are preceded by a backslash `\' to
distinguish them from traditional `ed' syntax.

`\`'
`\''
     Unconditionally matches the beginning `\`' or ending `\'' of a
     line.

`\?'
     Optionally matches the single character regular expression or
     subexpression immediately preceding it.  For example, the regular
     expression `a[bd]\?c' matches the strings `abc', `adc' and `ac'.
     If `\?' occurs at the beginning of a regular expression or
     subexpression, then it matches a literal `?'.

`\+'
     Matches the single character regular expression or subexpression
     immediately preceding it one or more times.  So the regular
     expression `a\+' is shorthand for `aa*'.  If `\+' occurs at the
     beginning of a regular expression or subexpression, then it
     matches a literal `+'.

`\b'
     Matches the beginning or ending (null string) of a word.  Thus the
     regular expression `\bhello\b' is equivalent to `\<hello\>'.
     However, `\b\b' is a valid regular expression whereas `\<\>' is
     not.

`\B'
     Matches (a null string) inside a word.

`\w'
     Matches any character in a word.

`\W'
     Matches any character not in a word.
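
   The extended operators can be exercised with the sketch below.  It
assumes the GNU versions of `grep' and `sed', which accept the same
backslash extensions; other implementations may reject them or treat
them differently.  The variable names are illustrative only.

```shell
# Assumption: GNU grep and GNU sed are installed and stand in for ed.

# \? makes the bracket expression optional: abc, adc and ac match, aec does not.
optional=$(printf '%s\n' 'abc' 'adc' 'ac' 'aec' | grep -c '^a[bd]\?c$')

# \+ requires at least one occurrence, so `ct' is not matched.
plus=$(printf '%s\n' 'ct' 'cat' 'caat' | grep -c '^ca\+t$')

# \< and \> restrict the match to a whole word; \b does the same here.
word=$(printf '%s\n' 'say hello' 'othello' | grep -c '\<hello\>')
boundary=$(printf '%s\n' 'say hello' 'othello' | grep -c '\bhello\b')

# \w matches word characters (alphanumerics and the underscore).
masked=$(printf '%s\n' 'foo_1, bar!' | sed 's/\w/x/g')

printf '%s\n' "$optional" "$plus" "$word" "$boundary" "$masked"
```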

