Info Node: (gawk.info)Nextfile Function


Implementing `nextfile' as a Function
-------------------------------------

The `nextfile' statement presented in Note: Using `gawk''s `nextfile' Statement, is a `gawk'-specific extension--it is not available in most other implementations of `awk'. This minor node shows two versions of a `nextfile' function that you can use to simulate `gawk''s `nextfile' statement if you cannot use `gawk'.

A first attempt at writing a `nextfile' function is as follows:

     # nextfile --- skip remaining records in current file
     # this should be read in before the "main" awk program

     function nextfile()    { _abandon_ = FILENAME; next }
     _abandon_ == FILENAME  { next }

Because it supplies a rule that must be executed first, this file should be included before the main program. This rule compares the current data file's name (which is always in the `FILENAME' variable) to a private variable named `_abandon_'. If the file name matches, then the action part of the rule executes a `next' statement to go on to the next record. (The use of `_' in the variable name is a convention. It is discussed more fully in Note: Naming Library Function Global Variables.)

The use of the `next' statement effectively creates a loop that reads all the records from the current data file. The end of the file is eventually reached and a new data file is opened, changing the value of `FILENAME'. Once this happens, the comparison of `_abandon_' to `FILENAME' fails and execution continues with the first rule of the "real" program.

The `nextfile' function itself simply sets the value of `_abandon_' and then executes a `next' statement to start the loop.

This initial version has a subtle problem. If the same data file is listed _twice_ on the command line, one right after the other or even with just a variable assignment between them, this code skips right through the file a second time, even though it should stop when it gets to the end of the first occurrence.

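As a rough illustration of the behavior just described, the following shell sketch runs the first-version rule on two small sample files. The file names, their contents, and the one-line "main" rule are invented for the example. The main rule sets `_abandon_' and executes `next' inline instead of calling a function named `nextfile', on the assumption that the `awk' at hand may treat `nextfile' as a reserved word or may not accept `next' inside a function body.

```shell
#!/bin/sh
# Hypothetical sample data; the file names and contents are made up.
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"
printf 'b1\nb2\n' > "$tmp/b.txt"

# First-version guard rule, placed before the "main" rule.  The main
# rule prints one record, then abandons the file by doing inline what
# the body of the nextfile function does.
prog='
_abandon_ == FILENAME { next }
{ print; _abandon_ = FILENAME; next }
'

# Two different files: the first record of each file is printed.
distinct=$(awk "$prog" "$tmp/a.txt" "$tmp/b.txt")

# The same file twice: the second occurrence is skipped entirely.
repeated=$(awk "$prog" "$tmp/a.txt" "$tmp/a.txt")

printf 'distinct files:\n%s\n' "$distinct"    # a1, then b1
printf 'same file twice:\n%s\n' "$repeated"   # a1 only
rm -rf "$tmp"
```

When the same file is named twice, only one `a1' appears, because `_abandon_' still equals `FILENAME' when the second occurrence is opened.
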
A second version of `nextfile' that remedies this problem is shown here:

     # nextfile --- skip remaining records in current file
     # correctly handle successive occurrences of the same file
     # this should be read in before the "main" awk program

     function nextfile()   { _abandon_ = FILENAME; next }

     _abandon_ == FILENAME {
           if (FNR == 1)
               _abandon_ = ""
           else
               next
     }

The `nextfile' function has not changed. It makes `_abandon_' equal to the current file name and then executes a `next' statement. The `next' statement reads the next record and increments `FNR' so that `FNR' is guaranteed to have a value of at least two. However, if `nextfile' is called for the last record in the file, then `awk' closes the current data file and moves on to the next one. Upon doing so, `FILENAME' is set to the name of the new file and `FNR' is reset to one. If this next file is the same as the previous one, `_abandon_' is still equal to `FILENAME'. However, `FNR' is equal to one, telling us that this is a new occurrence of the file and not the one we were reading when the `nextfile' function was executed. In that case, `_abandon_' is reset to the empty string, so that further executions of this rule fail (until the next time that `nextfile' is called).

If `FNR' is not one, then we are still in the original data file and the program executes a `next' statement to skip through it.

An important question to ask at this point is: given that the functionality of `nextfile' can be provided with a library file, why is it built into `gawk'? Adding features for little reason leads to larger, slower programs that are harder to maintain. The answer is that building `nextfile' into `gawk' provides significant gains in efficiency. If the `nextfile' function is executed at the beginning of a large data file, `awk' still has to scan the entire file, splitting it up into records, just to skip over it. The built-in `nextfile' can simply close the file immediately and proceed to the next one, which saves a lot of time. This is particularly important in `awk', because `awk' programs are generally I/O-bound (i.e., they spend most of their time doing input and output, instead of performing computations).
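
The effect of the `FNR' test in the second version can be checked with a short shell sketch. The sample file, its contents, the variable assignment `x=1', and the one-line "main" rule are all invented for the example. The main rule sets `_abandon_' and executes `next' inline instead of calling a function named `nextfile', on the assumption that the `awk' at hand may treat `nextfile' as a reserved word or may not accept `next' inside a function body.

```shell
#!/bin/sh
# Hypothetical sample data; the file name and contents are made up.
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"

# Second-version guard rule, placed before the "main" rule.  When FNR
# is 1 the file has just been (re)opened, so _abandon_ is cleared and
# control falls through to the main rule.
prog='
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}
{ print; _abandon_ = FILENAME; next }
'

# The same file twice in a row.
repeated=$(awk "$prog" "$tmp/a.txt" "$tmp/a.txt")

# The same file twice with only a variable assignment in between.
assigned=$(awk "$prog" "$tmp/a.txt" x=1 "$tmp/a.txt")

printf 'same file twice:\n%s\n' "$repeated"        # a1, then a1
printf 'with an assignment:\n%s\n' "$assigned"     # a1, then a1
rm -rf "$tmp"
```

Each occurrence of the file now contributes its first record, whereas the rule without the `FNR' test would print `a1' only once. Note that `awk' still reads `a2' and `a3' on every pass just to discard them, which is the cost that a built-in `nextfile' avoids.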
