GNU Info

Info Node: (gawk.info)Close Files And Pipes


Closing Input and Output Redirections
=====================================

   If the same file name or the same shell command is used with
`getline' more than once during the execution of an `awk' program
(see Explicit Input with `getline'), the file is opened (or
the command is executed) the first time only.  At that time, the first
record of input is read from that file or command.  The next time the
same file or command is used with `getline', another record is read
from it, and so on.

   Similarly, when a file or pipe is opened for output, the file name or
command associated with it is remembered by `awk', and subsequent
writes to the same file or command are appended to the previous writes.
The file or pipe stays open until `awk' exits.

   This implies that special steps are necessary in order to read the
same file again from the beginning, or to rerun a shell command (rather
than reading more output from the same command).  The `close' function
makes these things possible:

     close(FILENAME)

or:

     close(COMMAND)

   The argument FILENAME or COMMAND can be any expression.  Its value
must _exactly_ match the string that was used to open the file or start
the command (spaces and other "irrelevant" characters included).  For
example, if you open a pipe with this:

     "sort -r names" | getline foo

then you must close it with this:

     close("sort -r names")

   Once this function call is executed, the next `getline' from that
file or command, or the next `print' or `printf' to that file or
command, reopens the file or reruns the command.  Because the
expression that you use to close a file or pipeline must exactly match
the expression used to open the file or run the command, it is good
practice to use a variable to store the file name or command.  The
previous example becomes the following:

     sortcom = "sort -r names"
     sortcom | getline foo
     ...
     close(sortcom)

This helps avoid hard-to-find typographical errors in your `awk'
programs.

   Here are some of the reasons for closing an output file or pipe:

   * To write a file and read it back later on in the same `awk'
     program.  Close the file after writing it, then begin reading it
     with `getline'.

   * To write numerous files, successively, in the same `awk' program.
     If the files aren't closed, eventually `awk' may exceed a system
     limit on the number of open files in one process.  It is best to
     close each one when the program has finished writing it.

   * To make a command finish.  When output is redirected through a
     pipe, the command reading the pipe normally continues to try to
     read input as long as the pipe is open.  Often this means the
     command cannot really do its work until the pipe is closed.  For
     example, if output is redirected to the `mail' program, the
     message is not actually sent until the pipe is closed.

   * To run the same program a second time, with the same arguments.
     This is not the same thing as giving more input to the first run!

     For example, suppose a program pipes output to the `mail' program.
     If it outputs several lines redirected to this pipe without closing
     it, they make a single message of several lines.  By contrast, if
     the program closes the pipe after each line of output, then each
     line makes a separate message.
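
   The first reason in the list above can be sketched as follows.  The
helper name `write_then_read' and the scratch path under `/tmp' are
assumptions made for illustration only; they do not come from the
`gawk' distribution.

```awk
# Hypothetical helper: write two lines to FILE, close it, then read the
# lines back with getline.  Returns the number of lines read.
function write_then_read(file,    line, n) {
    print "first line" > file
    print "second line" > file
    close(file)          # flush the output and forget the open file

    n = 0
    while ((getline line < file) > 0) {   # reopens FILE, now for reading
        n++
    }
    close(file)          # a later read would start again at line 1
    return n
}

BEGIN {
    tmp = "/tmp/close-demo.txt"   # assumed scratch path
    print write_then_read(tmp), "lines read back"
}
```

   Without the first `close', the data might still be sitting in the
output buffer when `getline' starts reading, so the count could come
up short.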

   If you use more files than the system allows you to have open,
`gawk' attempts to multiplex the available open files among your data
files.  `gawk''s ability to do this depends upon the facilities of your
operating system, so it may not always work.  It is therefore both good
practice and good portability advice to always use `close' on your
files when you are done with them.  In fact, if you are using a lot of
pipes, it is essential that you close commands when done.  For example,
consider something like this:

     {
         ...
         command = ("grep " $1 " /some/file | my_prog -q " $3)
         while ((command | getline) > 0) {
             PROCESS OUTPUT OF command
         }
         # need close(command) here
     }

   This example creates a new pipeline based on data in _each_ record.
Without the call to `close' indicated in the comment, `awk' creates
child processes to run the commands, until it eventually runs out of
file descriptors for more pipelines.

   Even though each command has finished (as indicated by the
end-of-file return status from `getline'), the finished child process
is not cleaned up;(1) more importantly, the file descriptor for the
pipe is not closed and released until `close' is called or `awk' exits.
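
   A runnable variant of the pattern above, with the `close' call in
place, might look like the following sketch.  The helper
`count_output' and the `echo' commands are assumptions chosen so the
example is self-contained; they stand in for whatever command a real
program builds from its input.

```awk
# Hypothetical helper: run COMMAND, count its output lines, and close
# the pipe so its file descriptor is released before the next command.
function count_output(command,    line, n) {
    n = 0
    while ((command | getline line) > 0) {
        n++
    }
    close(command)       # without this, each distinct command stays open
    return n
}

BEGIN {
    # Every iteration builds a different command string, so each one
    # would otherwise keep its own pipe open until awk exits.
    for (i = 1; i <= 3; i++)
        total += count_output("echo record " i)
    print total, "lines read from 3 commands"
}
```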

   `close' will silently do nothing if given an argument that does not
represent a file, pipe or coprocess that was opened with a redirection.

   When using the `|&' operator to communicate with a coprocess, it is
occasionally useful to be able to close one end of the two-way pipe
without closing the other.  This is done by supplying a second argument
to `close'.  As in any other call to `close', the first argument is the
name of the command or special file used to start the coprocess.  The
second argument should be a string, with either of the values `"to"' or
`"from"'.  Case does not matter.  As this is an advanced feature, a
more complete discussion, with an example, is deferred to the section
Two-Way Communications with Another Process.
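
   As a brief sketch of the two-argument form, the following program
feeds data to a `sort' coprocess, closes only the `"to"' end so that
`sort' sees end-of-file, and then reads the sorted result.  It runs
only in `gawk', since `|&' and the second argument to `close' are
`gawk' extensions.  The helper name `sort_via_coprocess' is an
assumption for illustration.

```awk
# gawk only.  Hypothetical helper: sort ITEMS[1..N] through a `sort'
# coprocess and return the sorted values joined with spaces.
function sort_via_coprocess(items, n,    cmd, i, line, out) {
    cmd = "sort"
    for (i = 1; i <= n; i++)
        print items[i] |& cmd
    close(cmd, "to")     # end-of-file for sort; the read side stays open

    out = ""
    while ((cmd |& getline line) > 0) {
        out = (out == "" ? line : out " " line)
    }
    close(cmd)           # close the remaining half of the two-way pipe
    return out
}

BEGIN {
    n = split("pear apple fig", fruit, " ")
    print sort_via_coprocess(fruit, n)
}
```

   If the `"to"' end were left open, `sort' would keep waiting for
more input and the `getline' loop would never receive any output.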

Advanced Notes: Using `close''s Return Value
--------------------------------------------

   In many versions of Unix `awk', the `close' function is actually a
statement.  It is a syntax error to try to use the return value from
`close': (d.c.)

     command = "..."
     command | getline info
     retval = close(command)  # syntax error in most Unix awks

   `gawk' treats `close' as a function.  The return value is -1 if the
argument names something that was never opened with a redirection, or
if there is a system problem closing the file or process.  In these
cases, `gawk' sets the built-in variable `ERRNO' to a string describing
the problem.
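
   The failure case can be checked along the following lines.  This is
a sketch that assumes `gawk', because `ERRNO' is a `gawk' extension;
the wrapper name `close_checked' is hypothetical.

```awk
# gawk only (uses ERRNO).  Hypothetical wrapper: close NAME and report
# a failure on standard error.  Returns whatever close returned.
function close_checked(name,    rc) {
    rc = close(name)
    if (rc == -1)
        printf "close(\"%s\") failed: %s\n", name, ERRNO > "/dev/stderr"
    return rc
}

BEGIN {
    rc = close_checked("sort -r names")   # never opened in this program
    print "close returned", rc
}
```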

   In `gawk', when closing a pipe or coprocess, the return value is the
exit status of the command.  Otherwise, it is the return value from the
system's `close' or `fclose' C functions when closing input or output
files, respectively.  This value is zero if the close succeeds, or -1 if
it fails.

   The return value for closing a pipeline is particularly useful.  It
allows you to get the output from a command as well as its exit status.

   For POSIX-compliant systems, if the exit status is a number above
128, then the program was terminated by a signal.  Subtract 128 to get
the signal number:

     exit_val = close(command)
     if (exit_val > 128)
         print command, "died with signal", exit_val - 128
     else
         print command, "exited with code", exit_val

   Currently, in `gawk', this only works for commands piping into
`getline'.  For commands that receive piped output from `print' or
`printf', the return value from `close' is that of the library's
`pclose' function.
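
   Collecting a command's output together with its exit status might
be sketched like this.  The helper `run' and the sample command are
assumptions for illustration, and the status handling assumes `gawk'
in its default mode; other `awk' implementations may return a
different value from `close'.

```awk
# Hypothetical helper: run COMMAND, store its output lines in
# LINES[1..n] with the count in LINES[0], and return the value that
# close reports for the pipe.
function run(command, lines,    n, line, rc) {
    split("", lines)     # portable way to empty the array
    n = 0
    while ((command | getline line) > 0) {
        lines[++n] = line
    }
    rc = close(command)  # for an input pipe in gawk: the exit status
    lines[0] = n
    return rc
}

BEGIN {
    rc = run("echo hello; exit 3", out)
    print "output:", out[1]
    print "close returned:", rc
}
```

   Under the assumptions above, `close' should report 3 for this
command, which is the status passed to `exit' in the shell.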

   ---------- Footnotes ----------

   (1) The technical terminology is rather morbid.  The finished child
is called a "zombie," and cleaning up after it is referred to as
"reaping."

