Info Node: (gawk.info)Two-way I/O

Two-Way Communications with Another Process
===========================================

     From: brennan@whidbey.com (Mike Brennan)
     Newsgroups: comp.lang.awk
     Subject: Re: Learn the SECRET to Attract Women Easily
     Date: 4 Aug 1997 17:34:46 GMT
     Message-ID: <5s53rm$eca@news.whidbey.com>
     
     On 3 Aug 1997 13:17:43 GMT, Want More Dates???
     <tracy78@kilgrona.com> wrote:
     >Learn the SECRET to Attract Women Easily
     >
     >The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
     
     The scent of awk programmers is a lot more attractive to women than
     the scent of perl programmers.
     --
     Mike Brennan

   It is often useful to be able to send data to a separate program for
processing and then read the result.  This can always be done with
temporary files:

     # write the data for processing
     tempfile = ("/tmp/mydata." PROCINFO["pid"])
     while (NOT DONE WITH DATA)
         print DATA | ("subprogram > " tempfile)
     close("subprogram > " tempfile)
     
     # read the results, remove tempfile when done
     while ((getline newdata < tempfile) > 0)
         PROCESS newdata APPROPRIATELY
     close(tempfile)
     system("rm " tempfile)

This works, but not elegantly.

   Starting with version 3.1 of `gawk', it is possible to open a
_two-way_ pipe to another process.  The second process is termed a
"coprocess", since it runs in parallel with `gawk'.  The two-way
connection is created using the new `|&' operator (borrowed from the
Korn Shell, `ksh'):(1)

     do {
         print DATA |& "subprogram"
         "subprogram" |& getline results
     } while (DATA LEFT TO PROCESS)
     close("subprogram")

   The first time an I/O operation is executed using the `|&' operator,
`gawk' creates a two-way pipeline to a child process that runs the
other program.  Output created with `print' or `printf' is written to
the program's standard input, and whatever the program writes to its
standard output can be read by the `gawk' program using `getline'.  As
is the case with processes started by `|', the subprogram can be any
program, or pipeline of programs, that can be started by the shell.

   There are some cautionary items to be aware of:

   * As the code inside `gawk' currently stands, the coprocess's
     standard error goes to the same place as the standard error of
     the parent `gawk'.  It is not possible to read the child's
     standard error separately.

   * I/O buffering may be a problem.  `gawk' automatically flushes all
     output down the pipe to the child process.  However, if the
     coprocess does not flush its output, `gawk' may hang when doing a
     `getline' in order to read the coprocess's results.  This could
     lead to a situation known as "deadlock", where each process is
     waiting for the other one to do something.

   It is possible to close just one end of the two-way pipe to a
coprocess, by supplying a second argument to the `close' function of
either `"to"' or `"from"' (*note Closing Input and Output Redirections:
Close Files And Pipes.).  These strings tell `gawk' to close the end of
the pipe that sends data to the process or the end that reads from it,
respectively.

   This is particularly necessary in order to use the system `sort'
utility as part of a coprocess; `sort' must read _all_ of its input
data before it can produce any output.  The `sort' program does not
receive an end-of-file indication until `gawk' closes the write end of
the pipe.

   When you have finished writing data to the `sort' utility, you can
close the `"to"' end of the pipe, and then start reading sorted data
via `getline'.  For example:

     BEGIN {
         command = "LC_ALL=C sort"
         n = split("abcdefghijklmnopqrstuvwxyz", a, "")
     
         for (i = n; i > 0; i--)
             print a[i] |& command
         close(command, "to")
     
         while ((command |& getline line) > 0)
             print "got", line
         close(command)
     }

   This program writes the letters of the alphabet in reverse order, one
per line, down the two-way pipe to `sort'.  It then closes the write
end of the pipe, so that `sort' receives an end-of-file indication.
This causes `sort' to sort the data and write the sorted data back to
the `gawk' program.  Once all of the data has been read, `gawk' closes
the coprocess and exits.

   As a side note, the assignment `LC_ALL=C' in the `sort' command
makes `sort' use traditional Unix (ASCII) ordering, regardless of the
user's locale settings.

   ---------- Footnotes ----------

   (1) This is very different from the same operator in the C shell,
`csh'.

