GNU Info

Info Node: (gawk.info)GNU Regexp Operators


`gawk'-Specific Regexp Operators
================================

   GNU software that deals with regular expressions provides a number of
additional regexp operators.  These operators are described in this
minor node and are specific to `gawk'; they are not available in other
`awk' implementations.  Most of the additional operators deal with word
matching.  For our purposes, a "word" is a sequence of one or more
letters, digits, or underscores (`_'):

`\w'
     Matches any word-constituent character--that is, it matches any
     letter, digit, or underscore.  Think of it as shorthand for
     `[[:alnum:]_]'.

`\W'
     Matches any character that is not word-constituent.  Think of it
     as shorthand for `[^[:alnum:]_]'.

`\<'
     Matches the empty string at the beginning of a word.  For example,
     `/\<away/' matches `away' but not `stowaway'.

`\>'
     Matches the empty string at the end of a word.  For example,
     `/stow\>/' matches `stow' but not `stowaway'.

`\y'
     Matches the empty string at either the beginning or the end of a
     word (i.e., the word boundar*y*).  For example, `\yballs?\y'
     matches either `ball' or `balls', as a separate word.

`\B'
     Matches the empty string that occurs between two word-constituent
     characters. For example, `/\Brat\B/' matches `crate' but it does
     not match `dirty rat'.  `\B' is essentially the opposite of `\y'.

   There are two other operators that work on buffers.  In Emacs, a
"buffer" is, naturally, an Emacs buffer.  For other programs, `gawk''s
regexp library routines consider the entire string being matched to be
the buffer.

`\`'
     Matches the empty string at the beginning of a buffer (string).

`\''
     Matches the empty string at the end of a buffer (string).

   Because `^' and `$' always work in terms of the beginning and end of
strings, these operators don't add any new capabilities for `awk'.
They are provided for compatibility with other GNU software.

   In other GNU software, the word-boundary operator is `\b'. However,
that conflicts with the `awk' language's definition of `\b' as
backspace, so `gawk' uses a different letter.  An alternative method
would have been to require two backslashes in the GNU operators, but
this was deemed too confusing. The current method of using `\y' for the
GNU `\b' appears to be the lesser of two evils.

   The various command-line options (*note Command-Line Options:
Options.)  control how `gawk' interprets characters in regexps:

No options
     In the default case, `gawk' provides all the facilities of POSIX
     regexps and the GNU regexp operators described in the Regular
     Expression Operators node.  However, interval expressions are not
     supported.

`--posix'
     Only POSIX regexps are supported; the GNU operators are not special
     (e.g., `\w' matches a literal `w').  Interval expressions are
     allowed.

`--traditional'
     Traditional Unix `awk' regexps are matched. The GNU operators are
     not special, interval expressions are not available, nor are the
     POSIX character classes (`[[:alnum:]]' and so on).  Characters
     described by octal and hexadecimal escape sequences are treated
     literally, even if they represent regexp metacharacters.

`--re-interval'
     Allow interval expressions in regexps, even if `--traditional' has
     been provided.

