Reading Fixed-Width Data
========================
(This minor node discusses an advanced feature of `awk'. If you are
a novice `awk' user, you might want to skip it on the first reading.)
`gawk' version 2.13 introduced a facility for dealing with
fixed-width fields with no distinctive field separator. For example,
data of this nature arises in the input for old Fortran programs where
numbers are run together, or in the output of programs that did not
anticipate the use of their output as input for other programs.
An example of the latter is a table where all the columns are lined
up by the use of a variable number of spaces and _empty fields are just
spaces_. Clearly, `awk''s normal field splitting based on `FS' does
not work well in this case. Although a portable `awk' program can use
a series of `substr' calls on `$0' (*note String Manipulation
Functions::), this is awkward and inefficient for a
large number of fields.
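To make the portable alternative concrete, here is a minimal sketch that uses only POSIX `awk' features. Given a list of widths, it cuts each record into pieces with `substr'. The wrapper name `fixed_split' is hypothetical, used only for illustration. The pieces are printed separated by `|' so that trailing spaces stay visible.

```shell
# Portable sketch: emulate fixed-width splitting with substr() in POSIX awk.
# fixed_split is a hypothetical helper name; usage: fixed_split "W1 W2 ..." < file
fixed_split() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }          # w[1..n] holds the field widths
    {
        start = 1
        out = ""
        for (i = 1; i <= n; i++) {
            piece = substr($0, start, w[i])      # shorter (or empty) past end of line
            start += w[i]
            out = out (i > 1 ? "|" : "") piece   # join pieces with "|"
        }
        print out
    }'
}
```

For example, `printf 'abcd12 xyz\n' | fixed_split "4 3 5"' prints `abcd|12 |xyz'. Each field needs its own `substr' call and manual bookkeeping of the start column. That bookkeeping is what becomes tedious as the number of fields grows.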
The splitting of an input record into fixed-width fields is
specified by assigning a string containing space-separated numbers to
the built-in variable `FIELDWIDTHS'. Each number specifies the width
of the field, _including_ columns between fields. If you want to
ignore the columns between fields, you can specify the width as a
separate field that is subsequently ignored. It is a fatal error to
supply a field width that is not a positive number. The following data,
the output of the Unix `w' utility, illustrates the use of
`FIELDWIDTHS':

      10:06pm  up 21 days, 14:04,  23 users
     User     tty        login  idle   JCPU   PCPU  what
     hzuo     ttyV0     8:58pm            9      5  vi p24.tex
     hzang    ttyV3     6:37pm    50                -csh
     eklye    ttyV5     9:53pm            7      1  em thes.tex
     dportein ttyV6     8:17pm  1:47                -csh
     gierd    ttyD3    10:00pm     1                elm
     dave     ttyD4     9:47pm            4      4  w
     brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
     dave     ttyq4    26Jun9115days     46     46  wnewmail

The following program takes the above input, converts the idle time
to a number of seconds, and prints out the first two fields and the
calculated idle time.
*Note:* This program uses a number of `awk' features that haven't
been introduced yet.

     BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
     NR > 2 {                   # skip the two header lines
         idle = $4
         sub(/^ */, "", idle)   # strip leading spaces
         if (idle == "")
             idle = 0           # blank field: not idle
         if (idle ~ /:/) {      # treat "M:SS" as minutes:seconds
             split(idle, t, ":")
             idle = t[1] * 60 + t[2]
         }
         if (idle ~ /days/)     # "15days" converts numerically to 15
             idle *= 24 * 60 * 60

         print $1, $2, idle
     }

Running the program on the data produces the following results:

     hzuo      ttyV0  0
     hzang     ttyV3  50
     eklye     ttyV5  0
     dportein  ttyV6  107
     gierd     ttyD3  1
     dave      ttyD4  0
     brent     ttyp0  286
     dave      ttyq4  1296000

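In the `w' example, the spaces between columns are counted as part of each field, which is why `$1' and `$2' keep their trailing blanks in the output. The following sketch shows the other approach mentioned earlier: each separator column is given its own field, and the program simply never uses it. The record layout (`AB-123-XY') and the wrapper name `split_codes' are hypothetical. `FIELDWIDTHS' is a `gawk' extension, so this requires `gawk'.

```shell
# gawk-only sketch: FIELDWIDTHS is a gawk extension.
# Hypothetical records such as "AB-123-XY": the "-" columns get their own
# one-column fields ($2 and $4), which the program then ignores.
split_codes() {
    gawk 'BEGIN { FIELDWIDTHS = "2 1 3 1 2" }
          { print $1, $3, $5 }          # only the data fields; separators dropped'
}
```

For example, `printf 'AB-123-XY\n' | split_codes' prints `AB 123 XY'.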
Another (possibly more practical) example of fixed-width input data
is the input from a deck of balloting cards. In some parts of the
United States, voters mark their choices by punching holes in computer
cards. These cards are then processed to count the votes for any
particular candidate or on any particular issue. Because a voter may
choose not to vote on some issue, any column on the card may be empty.
An `awk' program for processing such data could use the `FIELDWIDTHS'
feature to simplify reading the data. (Of course, getting `gawk' to
run on a system with card readers is another story!)
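As a rough sketch of how such a ballot program might look, assume a purely hypothetical card layout. Columns 1-4 hold a card serial number. Columns 5, 6, and 7 each hold one issue, marked `Y', `N', or left blank when the voter skipped it. Real ballot cards are not described here, so the layout, the marks, and the wrapper name `count_ballots' are all illustrative. The sketch requires `gawk'.

```shell
# gawk-only sketch (FIELDWIDTHS is a gawk extension).
# Hypothetical card layout, for illustration only:
#   columns 1-4  card serial number
#   columns 5-7  one column per issue: "Y", "N", or blank if skipped
count_ballots() {
    gawk '
    BEGIN { FIELDWIDTHS = "4 1 1 1" }
    {
        for (i = 2; i <= NF; i++)
            if ($i != " ")              # blank column: no vote on this issue
                votes[i - 1, $i]++      # key is (issue number, mark)
    }
    END {
        for (issue = 1; issue <= 3; issue++)
            printf "issue %d: yes=%d no=%d\n", issue,
                votes[issue, "Y"], votes[issue, "N"]
    }'
}
```

Because every column has a fixed position, a blank column simply yields a one-space field. With `FS'-based splitting, that blank column would instead shift all later fields to the left.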
Assigning a value to `FS' causes `gawk' to return to using `FS' for
field splitting. Use `FS = FS' to make this happen, without having to
know the current value of `FS'. In order to tell which kind of field
splitting is in effect, use `PROCINFO["FS"]' (*note Built-in Variables
That Convey Information::). The value is `"FS"' if regular
field splitting is being used, or it is `"FIELDWIDTHS"' if fixed-width
field splitting is being used:

     if (PROCINFO["FS"] == "FS")
         print "regular field splitting is in effect"
     else
         print "fixed-width field splitting is in effect"

This information is useful when writing a function that needs to
temporarily change `FS' or `FIELDWIDTHS', read some records, and then
restore the original settings (*note Reading the User Database::,
for an example of such a function).
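The following minimal sketch shows that save-and-restore pattern. It is not the function from the user database section. The helper `count_colon_fields' and the wrapper `demo_restore' are hypothetical names. The helper first records, via `PROCINFO["FS"]', which kind of splitting is active. It then switches to `FS = ":"' to read a colon-separated file and count its fields. Finally, it reassigns whichever of `FIELDWIDTHS' or `FS' was in use. The helper is called from `BEGIN' because `getline < FILE' replaces `$0'. The sketch requires `gawk'.

```shell
# gawk-only sketch; count_colon_fields and demo_restore are hypothetical names.
# Usage: demo_restore COLON-FILE < fixed-width-input
demo_restore() {
    gawk -v db="$1" '
    function count_colon_fields(file,    using_fw, saved_fw, saved_fs, n) {
        using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")  # remember the current mode
        saved_fw = FIELDWIDTHS
        saved_fs = FS
        FS = ":"                         # switch to regular splitting on ":"
        n = 0
        while ((getline < file) > 0)     # each record read here is split on ":"
            n += NF
        close(file)
        if (using_fw)
            FIELDWIDTHS = saved_fw       # reassigning FIELDWIDTHS restores fixed-width mode
        else
            FS = saved_fs                # otherwise restore the previous FS
        return n
    }
    BEGIN {
        FIELDWIDTHS = "3 3"
        print "db fields:", count_colon_fields(db)
        print "mode:", PROCINFO["FS"]
    }
    { print $2 }                         # main input is still split by width
    '
}
```

Given a file containing `a:b:c' and `d:e' on separate lines, and the main input `abcdef', this prints `db fields: 5', then `mode: FIELDWIDTHS', then `def'.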