GNU Info

Info Node: (gawk.info)Case-sensitivity


Case Sensitivity in Matching
============================

   Case is normally significant in regular expressions, both when
matching ordinary characters (i.e., not metacharacters) and inside
character sets.  Thus, a `w' in a regular expression matches only a
lowercase `w' and not an uppercase `W'.

   The simplest way to do a case-independent match is to use a character
list--for example, `[Ww]'.  However, this can be cumbersome if you need
to use it often and it can make the regular expressions harder to read.
There are two alternatives that you might prefer.

   One way to perform a case-insensitive match at a particular point in
the program is to convert the data to a single case, using the
`tolower' or `toupper' built-in string functions (which we haven't
discussed yet; see String Manipulation Functions).
For example:

     tolower($1) ~ /foo/  { ... }

converts the first field to lowercase before matching against it.  This
works in any POSIX-compliant `awk'.
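
   As a rough illustration, the following sketch compares the
character-list approach with the `tolower' approach on the same input.
The sample data and the shell function names (`sample_fields',
`match_charlist', `match_tolower') are made up for this example; only
`tolower' itself comes from the text above.

```shell
# Hypothetical sample data: the first field varies in case.
sample_fields() { printf 'Foo 1\nFOO 2\nbar 3\nfoo 4\n'; }

# Character-list approach: every letter needs its own list.
match_charlist() { sample_fields | awk '$1 ~ /[Ff][Oo][Oo]/ { print $2 }'; }

# tolower approach: normalize the data and keep the regexp readable.
match_tolower() { sample_fields | awk 'tolower($1) ~ /foo/ { print $2 }'; }

match_charlist
match_tolower
```

   Both functions should select the same records; the `tolower' version
simply leaves the regexp easier to read.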

   Another method, specific to `gawk', is to set the variable
`IGNORECASE' to a nonzero value (see Built-in Variables).  When
`IGNORECASE' is not zero, _all_ regexp and string operations ignore
case.  Changing the value of `IGNORECASE' dynamically controls the case
sensitivity of the program as it runs.  Case is significant by default
because `IGNORECASE' (like most variables) is initialized to zero:

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail
     
     IGNORECASE = 1
     if (x ~ /ab/) ...   # now it will succeed

   In general, you cannot use `IGNORECASE' to make certain rules
case-insensitive and other rules case-sensitive, because there is no
straightforward way to set `IGNORECASE' just for the pattern of a
particular rule.(1) To do this, use either character lists or
`tolower'.  However, one thing that only `IGNORECASE' lets you do is
turn case-sensitivity on or off dynamically for all the rules at once.

   `IGNORECASE' can be set on the command line or in a `BEGIN' rule
(see Other Command-Line Arguments; see also Startup and Cleanup
Actions).  Setting `IGNORECASE'
from the command line is a way to make a program case-insensitive
without having to edit it.

   Prior to `gawk' 3.0, the value of `IGNORECASE' affected regexp
operations only. It did not affect string comparison with `==', `!=',
and so on.  Beginning with version 3.0, both regexp and string
comparison operations are affected by `IGNORECASE'.

   Beginning with `gawk' 3.0, the equivalences between upper- and
lowercase characters are based on the ISO-8859-1 (ISO Latin-1)
character set.  This character set is a superset of the traditional 128
ASCII characters and also provides a number of characters suitable
for use with European languages.

   The value of `IGNORECASE' has no effect if `gawk' is in
compatibility mode (see Command-Line Options).  Case is
always significant in compatibility mode.

   ---------- Footnotes ----------

   (1) Experienced C and C++ programmers will note that it is possible,
using something like `IGNORECASE = 1 && /foObAr/ { ... }' and
`IGNORECASE = 0 || /foobar/ { ... }'.  However, this is somewhat
obscure and we don't recommend it.

