GNU Info

Info Node: (gawk.info)Escape Sequences

Escape Sequences
================

   Some characters cannot be included literally in string constants
(`"foo"') or regexp constants (`/foo/').  Instead, they should be
represented with "escape sequences", which are character sequences
beginning with a backslash (`\').  One use of an escape sequence is to
include a double quote character in a string constant.  Because a plain
double quote ends the string, you must use `\"' to represent an actual
double quote character as a part of the string.  For example:

     $ awk 'BEGIN { print "He said \"hi!\" to her." }'
     -| He said "hi!" to her.

   The backslash character itself is another character that cannot be
included normally; you must write `\\' to put one backslash in the
string or regexp.  Thus, the string whose contents are the two
characters `"' and `\' must be written `"\"\\"'.
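   As a quick check of the two-character string described above, the
following sketch prints the string's length and contents.  The shell
variable name `quote_backslash' is only illustrative, and a
POSIX-compatible `awk' on the search path is assumed.

```shell
# The awk string "\"\\" holds exactly two characters:
# a double quote followed by a backslash.
quote_backslash=$(awk 'BEGIN { s = "\"\\"; print length(s) ":" s }')

# printf is used instead of echo so the shell does not
# reinterpret the backslash in the result.
printf '%s\n' "$quote_backslash"
```

   The reported length is 2, confirming that each escape sequence
contributes a single character to the string.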

   Another use of backslash is to represent unprintable characters such
as tab or newline.  While there is nothing to stop you from entering
most unprintable characters directly in a string constant or regexp
constant, they make the program hard to read.

   The following table lists all the escape sequences used in `awk' and
what they represent. Unless noted otherwise, all these escape sequences
apply to both string constants and regexp constants:

`\\'
     A literal backslash, `\'.

`\a'
     The "alert" character, `Ctrl-g', ASCII code 7 (BEL).  (This
     usually makes some sort of audible noise.)

`\b'
     Backspace, `Ctrl-h', ASCII code 8 (BS).

`\f'
     Formfeed, `Ctrl-l', ASCII code 12 (FF).

`\n'
     Newline, `Ctrl-j', ASCII code 10 (LF).

`\r'
     Carriage return, `Ctrl-m', ASCII code 13 (CR).

`\t'
     Horizontal tab, `Ctrl-i', ASCII code 9 (HT).

`\v'
     Vertical tab, `Ctrl-k', ASCII code 11 (VT).

`\NNN'
     The octal value NNN, where NNN stands for 1 to 3 digits between
     `0' and `7'.  For example, the code for the ASCII ESC (escape)
     character is `\033'.

`\xHH...'
     The hexadecimal value HH, where HH stands for a sequence of
     hexadecimal digits (`0' through `9', and either `A' through `F' or
     `a' through `f').  Like the same construct in ISO C, the escape
     sequence continues until the first non-hexadecimal digit is seen.
     However, using more than two hexadecimal digits produces undefined
     results. (The `\x' escape sequence is not allowed in POSIX `awk'.)

`\/'
     A literal slash (necessary for regexp constants only).  This
     expression is used when you want to write a regexp constant that
     contains a slash. Because the regexp is delimited by slashes, you
     need to escape the slash that is part of the pattern, in order to
     tell `awk' to keep processing the rest of the regexp.

`\"'
     A literal double quote (necessary for string constants only).
     This expression is used when you want to write a string constant
     that contains a double quote. Because the string is delimited by
     double quotes, you need to escape the quote that is part of the
     string, in order to tell `awk' to keep processing the rest of the
     string.
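   A few of the sequences from the table can be tried out as follows.
This is a minimal sketch that assumes a POSIX-compatible `awk'; the
shell variable names and the sample input are made up for
illustration.  It avoids `\x' because that sequence is not part of
POSIX `awk'.

```shell
# Octal escapes: \101, \102 and \103 are the ASCII codes for A, B and C.
octal_out=$(awk 'BEGIN { print "\101\102\103" }')

# \t puts a tab in the string; splitting on a tab yields two pieces.
tab_fields=$(awk 'BEGIN { n = split("left\tright", parts, "\t"); print n }')

# \/ keeps a slash from ending the regexp constant early.
# Only the first input line contains "usr/bin".
slash_line=$(printf 'usr/bin\nusrbin\n' | awk '/usr\/bin/ { print NR }')

printf '%s %s %s\n' "$octal_out" "$tab_fields" "$slash_line"
```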

   In `gawk', a number of additional two-character sequences that begin
with a backslash have special meaning in regexps (see `gawk'-Specific
Regexp Operators).

   In a regexp, a backslash before any character that is not in the
above table and not listed in `gawk'-Specific Regexp Operators means
that the next character should be taken
literally, even if it would normally be a regexp operator.  For
example, `/a\+b/' matches the three characters `a+b'.
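   The `/a\+b/' example can be checked with two input lines, one that
contains a literal plus sign and one that would match only if `+' were
acting as a repetition operator.  This is a sketch with made-up input;
the variable name `plus_match' is illustrative.

```shell
# With the plus escaped, only the line containing the three
# characters a+b is printed; "aab" is not.
plus_match=$(printf 'a+b\naab\n' | awk '/a\+b/')

printf '%s\n' "$plus_match"
```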

   For complete portability, do not use a backslash before any
character not shown in the table above.

   To summarize:

   * The escape sequences in the table above are always processed first,
     for both string constants and regexp constants. This happens very
     early, as soon as `awk' reads your program.

   * `gawk' processes both regexp constants and dynamic regexps (see
     Using Dynamic Regexps) for the special operators listed in
     `gawk'-Specific Regexp Operators.

   * A backslash before any other character means to treat that
     character literally.

Advanced Notes: Backslash Before Regular Characters
---------------------------------------------------

   If you place a backslash in a string constant before something that
is not one of the characters listed above, POSIX `awk' purposely leaves
what happens as undefined.  There are two choices:

Strip the backslash out
     This is what Unix `awk' and `gawk' both do.  For example, `"a\qc"'
     is the same as `"aqc"'.  (Because this is such an easy bug both to
     introduce and to miss, `gawk' warns you about it.)  Consider `FS =
     "[ \t]+\|[ \t]+"', which is meant to use vertical bars surrounded
     by whitespace as the field separator.  Once the backslash is
     stripped, the string is `[ \t]+|[ \t]+', in which the bar is the
     regexp alternation operator rather than a literal character.
     There should be two backslashes in the string: `FS = "[ \t]+\\|[
     \t]+"'.

Leave the backslash alone
     Some other `awk' implementations do this.  In such
     implementations, `"a\qc"' is the same as if you had typed
     `"a\\qc"'.
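   The correctly escaped field separator from the discussion above can
be exercised like this.  It is a sketch only: the input line and the
variable name `second_field' are invented for the example, and a
POSIX-compatible `awk' is assumed.

```shell
# The input is "alpha", a space, a tab, a vertical bar, two spaces, "beta".
# With two backslashes in the string, the regexp sees \| and treats the
# bar as a literal character, so the record splits into two fields.
second_field=$(printf 'alpha \t|  beta\n' |
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }')

printf '%s\n' "$second_field"
```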

Advanced Notes: Escape Sequences for Metacharacters
---------------------------------------------------

   Suppose you use an octal or hexadecimal escape to represent a regexp
metacharacter (see Regular Expression Operators).
Does `awk' treat the character as a literal character or as a regexp
operator?

   Historically, such characters were taken literally.  (d.c.)
However, the POSIX standard indicates that they should be treated as
real metacharacters, which is what `gawk' does.  In compatibility mode
(see Command-Line Options), `gawk' treats the characters
represented by octal and hexadecimal escape sequences literally when
used in regexp constants.  Thus, `/a\52b/' is equivalent to `/a\*b/'
(octal 52 is the ASCII code for `*').
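   Because the treatment differs between implementations and modes,
it can be useful to probe what a particular `awk' does.  The following
sketch makes no claim about which answer you will see; it only reports
the behavior of whatever `awk' is first on the search path.  The
variable name `meta_behavior' is illustrative.

```shell
# The input "ab" matches /a*b/ (star acting as an operator) but
# does not match /a\*b/ (a literal star), so the result shows how
# this awk interprets the octal escape \52 inside a regexp constant.
meta_behavior=$(printf 'ab\n' |
    awk '{ if ($0 ~ /a\52b/) print "operator"; else print "literal" }')

printf '%s\n' "$meta_behavior"
```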

