GNU Info

Info Node: (gawk.info)Fields


Examining Fields
================

   When `awk' reads an input record, the interpreter automatically
separates, or "parses", the record into chunks called "fields".
By default, fields are separated by "whitespace", like words in a line.
Whitespace in `awk' means any string of one or more spaces, tabs, or
newlines;(1) other characters that other languages treat as whitespace,
such as formfeed and vertical tab, are _not_ considered whitespace by
`awk'.
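
   As a minimal sketch of the default splitting rule, the following
commands feed `awk' a made-up record whose words are separated by runs
of spaces and tabs.  The sample text and the shell variable name
`fields' are illustrative assumptions, not part of the original manual.

```shell
# Hypothetical sample record: leading and trailing spaces, plus runs of
# mixed spaces and tabs between the words.
fields=$(printf '  alpha \t beta\t\tgamma  \n' |
    awk '{ print NF ": " $1 "|" $2 "|" $3 }')

# Each run of whitespace acts as a single separator, and leading and
# trailing whitespace does not create empty fields.
echo "$fields"
```

Here `NF' is the number of fields in the record; with default
splitting the record above yields three fields.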

   The purpose of fields is to make it more convenient for you to refer
to these pieces of the record.  You don't have to use them--you can
operate on the whole record if you want--but fields are what make
simple `awk' programs so powerful.

   To refer to a field in an `awk' program, use a dollar sign (`$')
followed by the number of the field you want.  Thus, `$1' refers to the
first field, `$2' to the second, and so on.  (Unlike the Unix shells,
the field numbers are not limited to single digits.  `$127' is the one
hundred and twenty-seventh field in the record.)  For example, suppose
the following is a line of input:

     This seems like a pretty nice example.

Here the first field, or `$1', is `This', the second field, or `$2', is
`seems', and so on.  Note that the last field, `$7', is `example.'.
Because there is no space between the `e' and the `.', the period is
considered part of the seventh field.
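
   The field references described above can be tried directly on that
line of input.  This is only a sketch; the shell variable names
(`record', `first', `second', `seventh') are chosen here for
illustration.

```shell
# The sample line from the text, stored in a shell variable.
record='This seems like a pretty nice example.'

# Pull out individual fields by number.
first=$(printf '%s\n' "$record" | awk '{ print $1 }')
second=$(printf '%s\n' "$record" | awk '{ print $2 }')
seventh=$(printf '%s\n' "$record" | awk '{ print $7 }')

# The period stays attached to the seventh field.
echo "$first / $second / $seventh"
```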

   `NF' is a built-in variable whose value is the number of fields in
the current record.  `awk' automatically updates the value of `NF' each
time it reads a record.  No matter how many fields there are, the last
field in a record can be represented by `$NF'.  So, `$NF' is the same
as `$7', which is `example.'.  If you try to reference a field beyond
the last one (such as `$8' when the record has only seven fields), you
get the empty string.  (If used in a numeric operation, you get zero.)
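
   A short sketch of `NF', `$NF', and a reference past the last field,
again using the sample line from earlier in this node.  The shell
variable names and the square brackets used to make the empty string
visible are illustrative choices.

```shell
record='This seems like a pretty nice example.'

# Number of fields in the record.
count=$(printf '%s\n' "$record" | awk '{ print NF }')

# The last field, whatever its number happens to be.
last=$(printf '%s\n' "$record" | awk '{ print $NF }')

# A field beyond the last one is the empty string ...
beyond=$(printf '%s\n' "$record" | awk '{ print "[" $8 "]" }')

# ... and acts as zero in a numeric operation.
numeric=$(printf '%s\n' "$record" | awk '{ print $8 + 0 }')

echo "$count $last $beyond $numeric"
```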

   The use of `$0', which looks like a reference to the "zeroth" field,
is a special case: it represents the whole input record when you are
not interested in specific fields.  Here are some more examples:

     $ awk '$1 ~ /foo/ { print $0 }' BBS-list
     -| fooey        555-1234     2400/1200/300     B
     -| foot         555-6699     1200/300          B
     -| macfoo       555-6480     1200/300          A
     -| sabafoo      555-2127     1200/300          C

This example prints each record in the file `BBS-list' whose first
field contains the string `foo'.  The operator `~' is called a
"matching operator" (see How to Use Regular Expressions); it tests
whether a string (here, the field `$1') matches a given regular
expression.

   By contrast, the following example looks for `foo' in _the entire
record_ and prints the first field and the last field for each matching
input record:

     $ awk '/foo/ { print $1, $NF }' BBS-list
     -| fooey B
     -| foot B
     -| macfoo A
     -| sabafoo C
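
   In `BBS-list' the two programs happen to select the same records.
The difference shows up when `foo' occurs somewhere other than the
first field.  The sketch below uses three made-up records (they are not
taken from `BBS-list') and illustrative shell variable names.

```shell
# Hypothetical input: `foo' is in the first field of record 1, only in
# the second field of record 2, and absent from record 3.
data=$(printf 'fooey 555-1234 B\nalpha 555-foo1 A\nbeta 555-0000 C\n')

# Match against the first field only.
first_only=$(printf '%s\n' "$data" | awk '$1 ~ /foo/ { print $1, $NF }')

# Match against the entire record.
anywhere=$(printf '%s\n' "$data" | awk '/foo/ { print $1, $NF }')

echo "first field only:"; echo "$first_only"
echo "entire record:";    echo "$anywhere"
```

The first program selects only the `fooey' record, while the second
also selects the `alpha' record.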

   ---------- Footnotes ----------

   (1) In POSIX `awk', newlines are not considered whitespace for
separating fields.

