GNU Info

Info Node: (gawk.info)Nextfile Function


Implementing `nextfile' as a Function
-------------------------------------

   The `nextfile' statement, presented in the node "Using `gawk''s
`nextfile' Statement", is a `gawk'-specific extension; it is not
available in most other implementations of `awk'.  This minor node
shows two versions of a `nextfile' function that you can use to
simulate `gawk''s `nextfile' statement if you cannot use `gawk'.

   A first attempt at writing a `nextfile' function is as follows:

     # nextfile --- skip remaining records in current file
     # this should be read in before the "main" awk program
     
     function nextfile()    { _abandon_ = FILENAME; next }
     _abandon_ == FILENAME  { next }

   Because it supplies a rule that must be executed first, this file
should be included before the main program. This rule compares the
current data file's name (which is always in the `FILENAME' variable) to
a private variable named `_abandon_'.  If the file name matches, then
the action part of the rule executes a `next' statement to go on to the
next record.  (The use of `_' in the variable name is a convention.  It
is discussed more fully in the node "Naming Library Function Global
Variables".)

   The use of the `next' statement effectively creates a loop that reads
all the records from the current data file.  The end of the file is
eventually reached and a new data file is opened, changing the value of
`FILENAME'.  Once this happens, the comparison of `_abandon_' to
`FILENAME' fails and execution continues with the first rule of the
"real" program.

   The `nextfile' function itself simply sets the value of `_abandon_'
and then executes a `next' statement to start the loop.
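
   The following shell session is a minimal sketch of how the first
version behaves on two small data files.  Several things in it are
assumptions made for illustration, not part of the original library:
the function is renamed to the hypothetical `skipfile' because an `awk'
that already has a `nextfile' statement treats that name as reserved;
the caller issues `next' itself because some `awk' implementations
reject `next' inside a function body; and the file names and sample
records are invented.

```shell
#!/bin/sh
# Sketch: first version of the skip rule, tried on two sample files.
# Hypothetical names: skipfile, a.txt, b.txt, nextfile.awk, main.awk.
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"
printf 'b1\nb2\n'     > "$tmp/b.txt"

# Library part: must be read before the main program.
cat > "$tmp/nextfile.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }   # caller follows with next
_abandon_ == FILENAME  { next }                   # discard rest of the file
EOF

# Main program: print the first record of each file, then skip the rest.
cat > "$tmp/main.awk" <<'EOF'
{ print FILENAME ": " $0; skipfile(); next }
EOF

# Run inside the temporary directory so FILENAME is just the base name.
out=$(cd "$tmp" && awk -f nextfile.awk -f main.awk a.txt b.txt)
printf '%s\n' "$out"
rm -rf "$tmp"
```

   Only the first record of each file should be printed; every later
record is consumed by the library rule before the main program sees it.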

   This initial version has a subtle problem.  If the same data file is
listed _twice_ on the command line, one right after the other or even
with just a variable assignment between them, this code skips right
through the file a second time, even though it should stop when it
gets to the end of the first occurrence.  A second version of
`nextfile' that remedies this problem is shown here:

     # nextfile --- skip remaining records in current file
     # correctly handle successive occurrences of the same file
     # this should be read in before the "main" awk program
     
     function nextfile()   { _abandon_ = FILENAME; next }
     
     _abandon_ == FILENAME {
         if (FNR == 1)
             _abandon_ = ""    # new occurrence of the file: stop skipping
         else
             next              # still in the abandoned file: discard record
     }

   The `nextfile' function has not changed.  It makes `_abandon_' equal
to the current file name and then executes a `next' statement.  The
`next' statement reads the next record and increments `FNR' so that
`FNR' is guaranteed to have a value of at least two.  However, if
`nextfile' is called for the last record in the file, then `awk' closes
the current data file and moves on to the next one.  Upon doing so,
`FILENAME' is set to the name of the new file and `FNR' is reset to
one.  If this next file is the same as the previous one, `_abandon_' is
still equal to `FILENAME'.  However, `FNR' is equal to one, telling us
that this is a new occurrence of the file and not the one we were
reading when the `nextfile' function was executed.  In that case,
`_abandon_' is reset to the empty string, so that further executions of
this rule fail (until the next time that `nextfile' is called).

   If `FNR' is not one, then we are still in the original data file and
the program executes a `next' statement to skip through it.
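
   The difference between the two versions can be checked with a sketch
like the one below, which lists the same file twice on the command
line.  As assumptions for illustration only, the function is renamed to
the hypothetical `skipfile' (the name `nextfile' is reserved in any
`awk' that has the statement), the caller issues `next' itself (some
`awk' implementations reject `next' inside a function body), and the
file names and records are invented.

```shell
#!/bin/sh
# Sketch: the same file listed twice, run against both versions.
# Hypothetical names: skipfile, d.txt, v1.awk, v2.awk, main.awk.
tmp=$(mktemp -d)
printf 'd1\nd2\nd3\n' > "$tmp/d.txt"

# Version 1: compares file names only.
cat > "$tmp/v1.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }
_abandon_ == FILENAME  { next }
EOF

# Version 2: FNR == 1 marks a new occurrence of the same file.
cat > "$tmp/v2.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}
EOF

# Main program: print the first record of each file, then skip the rest.
cat > "$tmp/main.awk" <<'EOF'
{ print FILENAME ": " $0; skipfile(); next }
EOF

out_v1=$(cd "$tmp" && awk -f v1.awk -f main.awk d.txt d.txt)
out_v2=$(cd "$tmp" && awk -f v2.awk -f main.awk d.txt d.txt)
printf 'version 1:\n%s\nversion 2:\n%s\n' "$out_v1" "$out_v2"
rm -rf "$tmp"
```

   With the first version, the second occurrence of the file is
swallowed whole, so only one line appears.  With the second version,
the first record of the second occurrence reaches the main program
again.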

   An important question to ask at this point is: given that the
functionality of `nextfile' can be provided with a library file, why is
it built into `gawk'?  Adding features for little reason leads to
larger, slower programs that are harder to maintain.  The answer is
that building `nextfile' into `gawk' provides significant gains in
efficiency.  If the `nextfile' function is executed at the beginning of
a large data file, `awk' still has to scan the entire file, splitting
it up into records, just to skip over it.  The built-in `nextfile' can
simply close the file immediately and proceed to the next one, which
saves a lot of time.  This is particularly important in `awk', because
`awk' programs are generally I/O-bound (i.e., they spend most of their
time doing input and output, instead of performing computations).
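
   The cost described above can be made visible with a small sketch
that counts how many records the library rule has to read and throw
away.  The counter `_discarded_', the function name `skipfile', and the
generated 1000-record file are all hypothetical additions for
illustration; as in any portable variant, the caller issues `next'
itself because some `awk' implementations reject it inside a function
body.

```shell
#!/bin/sh
# Sketch: count the records the function-based approach must still read.
# Hypothetical names: skipfile, _discarded_, big.txt.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "record " i }' > "$tmp/big.txt"

# Second version of the library rule, plus a counter on the skip branch.
cat > "$tmp/nextfile.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else {
        _discarded_++      # this record was read only to be thrown away
        next
    }
}
EOF

# Main program: abandon the file at its very first record.
cat > "$tmp/main.awk" <<'EOF'
FNR == 1 { skipfile(); next }
END      { print _discarded_ + 0 }
EOF

discarded=$(cd "$tmp" && awk -f nextfile.awk -f main.awk big.txt)
printf 'records read and discarded: %s\n' "$discarded"
rm -rf "$tmp"
```

   Every record after the first is still read and split before being
dropped, which is exactly the work a built-in `nextfile' avoids by
closing the file at once.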

