GNU Info

Info Node: (gawk.info)Filetrans Function


Noting Data File Boundaries
---------------------------

   The `BEGIN' and `END' rules are each executed exactly once, at the
beginning and end of your `awk' program, respectively (see "The
`BEGIN' and `END' Special Patterns").  We (the `gawk'
authors) once had a user who mistakenly thought that the `BEGIN' rule
is executed at the beginning of each data file and the `END' rule is
executed at the end of each data file.  When informed that this was not
the case, the user requested that we add new special patterns to
`gawk', named `BEGIN_FILE' and `END_FILE', that would have the desired
behavior.  He even supplied us with the code to do so.

   Adding these special patterns to `gawk' wasn't necessary; the job
can be done cleanly in `awk' itself, as illustrated by the following
library program.  It arranges to call two user-supplied functions,
`beginfile' and `endfile', at the beginning and end of each data file.
Besides solving the problem in only nine(!) lines of code, it does so
_portably_; this works with any implementation of `awk':

     # transfile.awk
     #
     # Give the user a hook for filename transitions
     #
     # The user must supply functions beginfile() and endfile()
     # that each take the name of the file being started or
     # finished, respectively.
     
     FILENAME != _oldfilename \
     {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }
     
     END   { endfile(FILENAME) }

   This file must be loaded before the user's "main" program, so that
the rule it supplies is executed first.

   This rule relies on `awk''s `FILENAME' variable that automatically
changes for each new data file.  The current file name is saved in a
private variable, `_oldfilename'.  If `FILENAME' does not equal
`_oldfilename', then a new data file is being processed and it is
necessary to call `endfile' for the old file.  Because `endfile' should
only be called if a file has been processed, the program first checks
to make sure that `_oldfilename' is not the null string.  The program
then assigns the current file name to `_oldfilename' and calls
`beginfile' for the file.  Because, like all `awk' variables,
`_oldfilename' is initialized to the null string, this rule executes
correctly even for the first data file.

   The program also supplies an `END' rule to do the final processing
for the last file.  Because this `END' rule comes before any `END' rules
supplied in the "main" program, `endfile' is called first.  Once again
the value of multiple `BEGIN' and `END' rules should be clear.

   This version has the same problem as the first version of `nextfile'
(see "Implementing `nextfile' as a Function").  If
the same data file occurs twice in a row on the command line, then
`endfile' and `beginfile' are not executed at the end of the first pass
and at the beginning of the second pass.  The following version solves
the problem:

     # ftrans.awk --- handle data file transitions
     #
     # user supplies beginfile() and endfile() functions
     FNR == 1 {
         if (_filename_ != "")
             endfile(_filename_)
         _filename_ = FILENAME
         beginfile(FILENAME)
     }
     
     END  { endfile(_filename_) }
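
   To see the difference between the two versions, a small test
program can trace each call.  The following is only a sketch; the file
names `trace.awk' and `data' are made up for illustration, and `data'
is assumed to contain at least one record:

     # trace.awk --- hypothetical test program, not part of the library
     #
     # Run as:  awk -f ftrans.awk -f trace.awk data data
     
     # Print a line each time the library calls one of the hooks
     function beginfile(name) { print "begin " name }
     function endfile(name)   { print "end " name }

   With `ftrans.awk', each hook should run twice, once per pass over
`data', because `FNR' is reset to one at the start of the second pass.
With `transfile.awk' loaded instead, each hook should run only once,
because `FILENAME' never changes between the two passes.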

   See "Counting Things" for an example of how this library function
can be used and how it simplifies writing the main program.
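
   As a brief illustration, a main program might use the two hooks to
report a record count for each data file.  This is a sketch only; the
file name `count.awk' and the variable `_nrecs' are invented here and
do not come from the library:

     # count.awk --- hypothetical main program for use with ftrans.awk
     #
     # Run as:  awk -f ftrans.awk -f count.awk file1 file2
     
     # Reset the per-file counter when a new data file starts
     function beginfile(name) { _nrecs = 0 }
     
     # Report the count when the file is finished
     function endfile(name)   { print name ": " _nrecs " records" }
     
     # Count every record; this rule runs after the library's rule
     { _nrecs++ }

   A separate counter is used rather than `FNR' because, by the time
`endfile' is called for a file transition, `FNR' has already been reset
for the new file.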

