
Field Splitting Summary
-----------------------

   The following table summarizes how fields are split, based on the
value of `FS'. (`==' means "is equal to.")

`FS == " "'
     Fields are separated by runs of whitespace (spaces, TABs, and
     newlines).  Leading and trailing whitespace is ignored.  This is
     the default.

`FS == ANY OTHER SINGLE CHARACTER'
     Fields are separated by each occurrence of the character.  Multiple
     successive occurrences delimit empty fields, as do leading and
     trailing occurrences.  The character can even be a regexp
     metacharacter; it does not need to be escaped.

`FS == REGEXP'
     Fields are separated by each match of REGEXP; a single match may
     span several characters.  Leading and trailing matches of REGEXP
     delimit empty fields.

`FS == ""'
     Each individual character in the record becomes a separate field.
     (This is a `gawk' extension; it is not specified by the POSIX
     standard.)
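The four cases in the table can be compared side by side with a small shell sketch. The helper name `show_fields' and the sample input lines are illustrative assumptions, not part of `gawk'. The script prefers `gawk' and falls back to whatever `awk' is installed, so the last case (`FS == ""'), being a `gawk' extension, may behave differently elsewhere.

```shell
#!/bin/sh
# Prefer gawk (FS == "" is a gawk extension); fall back to any installed awk.
AWK=$(command -v gawk || command -v awk)

# show_fields FS (hypothetical helper): split each input line using the
# given FS and print "NF: <f1> <f2> ...", bracketing every field so that
# empty fields are visible.
show_fields() {
    "$AWK" -v fs="$1" 'BEGIN { FS = fs }
    {
        out = NF ":"
        for (i = 1; i <= NF; i++)
            out = out " <" $i ">"
        print out
    }'
}

# FS == " " (default): runs of blanks/TABs separate fields, and leading
# and trailing whitespace is ignored.
printf '  alpha \t beta  \n' | show_fields ' '   # prints: 2: <alpha> <beta>

# FS == one other character: every occurrence separates, so doubled,
# leading, and trailing separators yield empty fields.
printf ',a,,b,\n' | show_fields ','              # prints: 5: <> <a> <> <b> <>
# "|" is a regexp metacharacter, but as a one-character FS it is literal.
printf 'x|y\n' | show_fields '|'                 # prints: 2: <x> <y>

# FS == regexp: each match separates; a leading match gives an empty field.
printf ':a;;b\n' | show_fields '[:;]+'           # prints: 3: <> <a> <b>

# FS == "" (gawk extension): every character becomes its own field.
printf 'abc\n' | show_fields ''                  # prints: 3: <a> <b> <c>
```

Passing the separator with `-v' and assigning it in a `BEGIN' rule keeps one helper for all four cases; `-F' would work equally well for the non-empty ones.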

Advanced Notes: Changing `FS' Does Not Affect the Fields
--------------------------------------------------------

   According to the POSIX standard, `awk' is supposed to behave as if
each record is split into fields at the time it is read.  In
particular, this means that if you change the value of `FS' after a
record is read, the value of the fields (i.e., how they were split)
should reflect the old value of `FS', not the new one.

   However, many implementations of `awk' do not work this way.
Instead, they defer splitting the fields until a field is actually
referenced.  The fields are split using the _current_ value of `FS'!
(d.c.)  This behavior can be difficult to diagnose. The following
example illustrates the difference between the two methods.  (The
`sed'(1) command prints just the first line of `/etc/passwd'.)

     sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'

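On an incorrect implementation of `awk', this usually prints:

     root

while `gawk' prints something like:

     root:nSijPlPhZZwgE:0:0:Root:/:

   `gawk' prints the whole line because the record was split into
fields, using the default value of `FS' (a single space), as soon as it
was read.  The sample line contains no whitespace, so `$1' is the
entire line.  The new value of `FS' takes effect only when the next
record is read.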
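Because the problem only arises when `FS' is changed after a record has already been read, a simple way to avoid it is to set `FS' before any input is read. The sketch below shows two such approaches, a `BEGIN' rule and the `-F' command-line option. The function names and the sample passwd-style line are made up for illustration; both functions should print `root' on `gawk' and on other POSIX-conforming awks.

```shell
#!/bin/sh
# Make the FIRST record split on ":" too, by assigning FS before any
# input is read.

# first_field_begin (hypothetical name): a BEGIN rule runs before the
# first record is read, so FS is already ":" when record 1 is split.
first_field_begin() {
    awk 'BEGIN { FS = ":" } { print $1 }'
}

# first_field_opt (hypothetical name): -F: sets FS on the command line,
# which also happens before any input is read.
first_field_opt() {
    awk -F: '{ print $1 }'
}

# A made-up passwd-style line stands in for "sed 1q /etc/passwd".
line='root:x:0:0:root:/root:/bin/sh'
printf '%s\n' "$line" | first_field_begin   # prints: root
printf '%s\n' "$line" | first_field_opt     # prints: root
```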

   ---------- Footnotes ----------

   (1) The `sed' utility is a "stream editor."  Its behavior is also
defined by the POSIX standard.
