Escape Sequences
================
Some characters cannot be included literally in string constants
(`"foo"') or regexp constants (`/foo/'). Instead, they should be
represented with "escape sequences", which are character sequences
beginning with a backslash (`\'). One use of an escape sequence is to
include a double quote character in a string constant. Because a plain
double quote ends the string, you must use `\"' to represent an actual
double quote character as a part of the string. For example:
     $ awk 'BEGIN { print "He said \"hi!\" to her." }'
     -| He said "hi!" to her.
The backslash character itself is another character that cannot be
included normally; you must write `\\' to put one backslash in the
string or regexp. Thus, the string whose contents are the two
characters `"' and `\' must be written `"\"\\"'.
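Here is a small sketch for checking this. It assumes a POSIX shell and some `awk' on your `PATH', and the variable `s' is only illustrative. The program sits inside shell single quotes, so the shell passes every backslash through untouched and `awk' does all of the decoding:

```shell
awk 'BEGIN {
    s = "\"\\"            # a double quote followed by a backslash
    print s, length(s)    # prints: "\ 2
}'
```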
Another use of backslash is to represent unprintable characters such
as tab or newline. While there is nothing to stop you from entering
most unprintable characters directly in a string constant or regexp
constant, such characters are invisible or look ugly on screen, which
makes the program hard to read.
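The following sketch builds a two-line, tab-separated table while keeping every character in the program source visible. It assumes a POSIX shell and any `awk', and the column names are made up:

```shell
# \t is a tab and \n is a newline; the program text stays readable.
awk 'BEGIN { print "name\tsize\nfoo\t42" }'
# Output, with <TAB> marking each tab character:
#   name<TAB>size
#   foo<TAB>42
```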
The following table lists all the escape sequences used in `awk' and
what they represent. Unless noted otherwise, all these escape sequences
apply to both string constants and regexp constants:
`\\'
     A literal backslash, `\'.
`\a'
     The "alert" character, `Ctrl-g', ASCII code 7 (BEL). (This
     usually makes some sort of audible noise.)
`\b'
     Backspace, `Ctrl-h', ASCII code 8 (BS).
`\f'
     Formfeed, `Ctrl-l', ASCII code 12 (FF).
`\n'
     Newline, `Ctrl-j', ASCII code 10 (LF).
`\r'
     Carriage return, `Ctrl-m', ASCII code 13 (CR).
`\t'
     Horizontal tab, `Ctrl-i', ASCII code 9 (HT).
`\v'
     Vertical tab, `Ctrl-k', ASCII code 11 (VT).
`\NNN'
     The octal value NNN, where NNN stands for 1 to 3 digits between
     `0' and `7'. For example, the code for the ASCII ESC (escape)
     character is `\033'.
`\xHH...'
     The hexadecimal value HH, where HH stands for a sequence of
     hexadecimal digits (`0' through `9', and either `A' through `F' or
     `a' through `f'). Like the same construct in ISO C, the escape
     sequence continues until the first non-hexadecimal digit is seen.
     However, using more than two hexadecimal digits produces undefined
     results. (The `\x' escape sequence is not allowed in POSIX `awk'.)
`\/'
     A literal slash (necessary for regexp constants only). This
     sequence is used when you want to write a regexp constant that
     contains a slash. Because the regexp is delimited by slashes, you
     need to escape the slash that is part of the pattern, in order to
     tell `awk' to keep processing the rest of the regexp.
`\"'
     A literal double quote (necessary for string constants only).
     This sequence is used when you want to write a string constant
     that contains a double quote. Because the string is delimited by
     double quotes, you need to escape the quote that is part of the
     string, in order to tell `awk' to keep processing the rest of the
     string.
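The sketch below exercises several entries of the table at once. It assumes a POSIX shell and an `awk' that implements the POSIX escapes. `\x' is deliberately left out because POSIX `awk' does not allow it, and the sample path `usr/bin' is arbitrary:

```shell
# Each escape sequence in the table stands for exactly one character.
awk 'BEGIN {
    s = "\a\b\f\n\r\t\v\\\""    # nine escape sequences
    print length(s)             # prints 9
    print length("\033")        # one ESC character: prints 1
    print "\101\102\103"        # octal 101, 102, 103 are A, B, C: prints ABC
}'

# \/ lets a regexp constant contain a slash.
echo "usr/bin" | awk '/usr\/bin/ { print "matched:", $0 }'
# prints: matched: usr/bin
```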
In `gawk', a number of additional two-character sequences that begin
with a backslash have special meaning in regexps. *Note `gawk'-Specific
Regexp Operators: GNU Regexp Operators.
In a regexp, a backslash before any character that is not in the
above table and not listed in *note `gawk'-Specific Regexp Operators:
GNU Regexp Operators, means that the next character should be taken
literally, even if it would normally be a regexp operator. For
example, `/a\+b/' matches the three characters `a+b'.
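To see the difference the backslash makes, this sketch applies the escaped and unescaped forms of the same regexp to two made-up input lines. It assumes a POSIX shell and any `awk':

```shell
# With the backslash, + is a literal plus sign;
# without it, + is the "one or more" operator.
printf 'a+b\naab\n' | awk '/a\+b/ { print "literal +: " $0 }
/a+b/ { print "operator +: " $0 }'
# prints:
#   literal +: a+b
#   operator +: aab
```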
For complete portability, do not use a backslash before any
character not shown in the table above.
To summarize:
* The escape sequences in the table above are always processed first,
for both string constants and regexp constants. This happens very
early, as soon as `awk' reads your program.
* `gawk' processes both regexp constants and dynamic regexps (*note
Using Dynamic Regexps::), for the special operators listed in *note
`gawk'-Specific Regexp Operators: GNU Regexp Operators.
* A backslash before any other character means to treat that
character literally.
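The first point of the summary matters most for dynamic regexps. A dynamic regexp is an ordinary string, so it goes through string escape processing before it is used as a regexp. In this sketch, the string `"a\\+b"' becomes `a\+b' and then matches a literal plus sign, just like the regexp constant `/a\+b/'. It assumes a POSIX shell and any `awk', and the input `a+b' is illustrative:

```shell
echo "a+b" | awk '{
    print "a\\+b"                       # the string value is: a\+b
    if ($0 ~ /a\+b/)  print "regexp constant: match"
    if ($0 ~ "a\\+b") print "dynamic regexp: match"
}'
# prints:
#   a\+b
#   regexp constant: match
#   dynamic regexp: match
```

With a single backslash, `"a\+b"' would fall into the undefined case described under "Backslash Before Regular Characters".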
Advanced Notes: Backslash Before Regular Characters
---------------------------------------------------
If you place a backslash in a string constant before something that
is not one of the characters listed above, POSIX `awk' purposely leaves
what happens as undefined. There are two choices:
Strip the backslash out
     This is what Unix `awk' and `gawk' both do. For example, `"a\qc"'
     is the same as `"aqc"'. (Because this is such an easy bug to both
     introduce and to miss, `gawk' warns you about it.) For example,
     setting `FS = "[ \t]+\|[ \t]+"' to use vertical bars surrounded by
     whitespace as the field separator does not work: the backslash is
     stripped, leaving `|' as the alternation operator. There should be
     two backslashes in the string: `FS = "[ \t]+\\|[ \t]+"'.
Leave the backslash alone
     Some other `awk' implementations do this. In such
     implementations, `"a\qc"' is the same as if you had typed
     `"a\\qc"'.
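The corrected field separator can be tried with a sketch like the following. It assumes a POSIX shell and any `awk', and the input line is invented:

```shell
# "[ \t]+\\|[ \t]+" becomes the regexp [ <TAB>]+\|[ <TAB>]+ :
# a literal vertical bar with blanks or tabs on either side.
echo 'red | green | blue' | awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" }
{ print NF ": " $2 }'
# prints: 3: green
```

With a single backslash, `gawk' strips it (and warns). The `|' then acts as alternation, so the line would be split at every run of blanks instead.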
Advanced Notes: Escape Sequences for Metacharacters
---------------------------------------------------
Suppose you use an octal or hexadecimal escape to represent a regexp
metacharacter (*note Regular Expression Operators::).
Does `awk' treat the character as a literal character or as a regexp
operator?
Historically, such characters were taken literally. (d.c.)
However, the POSIX standard indicates that they should be treated as
real metacharacters, which is what `gawk' does. In compatibility mode
(*note Command-Line Options::), `gawk' treats the characters
represented by octal and hexadecimal escape sequences literally when
used in regexp constants. Thus, `/a\52b/' is equivalent to `/a\*b/'.
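The answer depends on the implementation and, for `gawk', on the mode, so a quick probe can help. This is a sketch under a few assumptions: a POSIX shell, whatever `awk' comes first on your `PATH', and test strings chosen so that the two readings of `\52' (octal for `*') give different results:

```shell
# Anchors make the two readings distinguishable:
#   literal "*"   -> /^a\52b$/ matches "a*b" but not "aab"
#   "*" operator  -> /^a\52b$/ acts like /^a*b$/ and matches "aab"
awk 'BEGIN {
    if ("a*b" ~ /^a\52b$/)      print "literal"
    else if ("aab" ~ /^a\52b$/) print "metacharacter"
    else                        print "neither"
}'
```

Going by the description above, plain `gawk' should print `metacharacter' and `gawk --traditional' (compatibility mode) should print `literal'. Other implementations may give either answer.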