Copyright (C) 2000-2012
GNU Info (gawk.info)Filetrans Function

Noting Data File Boundaries
---------------------------

The `BEGIN' and `END' rules are each executed exactly once, at the
beginning and end of your `awk' program, respectively (Note: The
`BEGIN' and `END' Special Patterns.).  We (the `gawk' authors) once had
a user who mistakenly thought that the `BEGIN' rule is executed at the
beginning of each data file and the `END' rule is executed at the end
of each data file.  When informed that this was not the case, the user
requested that we add new special patterns to `gawk', named
`BEGIN_FILE' and `END_FILE', that would have the desired behavior.  He
even supplied us the code to do so.

Adding these special patterns to `gawk' wasn't necessary; the job can
be done cleanly in `awk' itself, as illustrated by the following
library program.  It arranges to call two user-supplied functions,
`beginfile' and `endfile', at the beginning and end of each data file.
Besides solving the problem in only nine(!) lines of code, it does so
_portably_; this works with any implementation of `awk':

     # transfile.awk
     #
     # Give the user a hook for filename transitions
     #
     # The user must supply functions beginfile() and endfile()
     # that each take the name of the file being started or
     # finished, respectively.

     FILENAME != _oldfilename \
     {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }

     END   { endfile(FILENAME) }

This file must be loaded before the user's "main" program, so that the
rule it supplies is executed first.

This rule relies on `awk''s `FILENAME' variable that automatically
changes for each new data file.  The current file name is saved in a
private variable, `_oldfilename'.  If `FILENAME' does not equal
`_oldfilename', then a new data file is being processed and it is
necessary to call `endfile' for the old file.  Because `endfile' should
only be called if a file has been processed, the program first checks
to make sure that `_oldfilename' is not the null string.
The program then assigns the current file name to `_oldfilename' and
calls `beginfile' for the file.  Because, like all `awk' variables,
`_oldfilename' is initialized to the null string, this rule executes
correctly even for the first data file.

The program also supplies an `END' rule to do the final processing for
the last file.  Because this `END' rule comes before any `END' rules
supplied in the "main" program, `endfile' is called first.  Once again
the value of multiple `BEGIN' and `END' rules should be clear.

This version has the same problem as the first version of `nextfile'
(Note: Implementing `nextfile' as a Function.).  If the same data file
occurs twice in a row on the command line, then `endfile' and
`beginfile' are not executed at the end of the first pass and at the
beginning of the second pass.  The following version solves the
problem:

     # ftrans.awk --- handle data file transitions
     #
     # user supplies beginfile() and endfile() functions

     FNR == 1 {
         if (_filename_ != "")
             endfile(_filename_)
         _filename_ = FILENAME
         beginfile(FILENAME)
     }

     END  { endfile(_filename_) }

Note: Counting Things, shows how this library function can be used and
how it simplifies writing the main program.
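As a rough illustration of how the `FNR == 1' version might be driven,
the following shell sketch writes the library file and a small "main"
program and runs them over three file arguments.  The names `main.awk',
`a.txt', `b.txt', `demo_dir' and `demo_output', and the bodies of
`beginfile' and `endfile' (a per-file line counter), are assumptions
made for this example only; they do not come from the manual.

```shell
#!/bin/sh
# Hypothetical demo: every file name and both hook bodies are illustrative.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# The library file, with the same rules as the second version in the text.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END  { endfile(_filename_) }
EOF

# An assumed "main" program: the two required hooks plus one ordinary rule.
cat > main.awk <<'EOF'
function beginfile(name) { count = 0 }                        # reset per file
function endfile(name)   { print name ": " count " lines" }   # report per file
{ count++ }                                                   # count each record
EOF

# Sample data: a.txt has two lines, b.txt has one.
printf 'one\ntwo\n' > a.txt
printf 'three\n'    > b.txt

# Load the library first so that its rule runs before the main program's
# rules.  a.txt is deliberately named twice in a row.
demo_output=$(awk -f ftrans.awk -f main.awk a.txt a.txt b.txt)
printf '%s\n' "$demo_output"
```

Under these assumptions the run should report `a.txt: 2 lines' twice and
then `b.txt: 1 lines', because `FNR' returns to 1 at the start of every
file argument even when the name repeats.  The `FILENAME != _oldfilename'
version would see no change between the two `a.txt' arguments and would
report them as a single file.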