Two-Way Communications with Another Process
===========================================
From: brennan@whidbey.com (Mike Brennan)
Newsgroups: comp.lang.awk
Subject: Re: Learn the SECRET to Attract Women Easily
Date: 4 Aug 1997 17:34:46 GMT
Message-ID: <5s53rm$eca@news.whidbey.com>
On 3 Aug 1997 13:17:43 GMT, Want More Dates???
<tracy78@kilgrona.com> wrote:
>Learn the SECRET to Attract Women Easily
>
>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
The scent of awk programmers is a lot more attractive to women than
the scent of perl programmers.
--
Mike Brennan
It is often useful to be able to send data to a separate program for
processing and then read the result. This can always be done with
temporary files:
     # write the data for processing
     tempfile = ("/tmp/mydata." PROCINFO["pid"])
     while (NOT DONE WITH DATA)
         print DATA | ("subprogram > " tempfile)
     close("subprogram > " tempfile)

     # read the results, remove tempfile when done
     while ((getline newdata < tempfile) > 0)
         PROCESS newdata APPROPRIATELY
     close(tempfile)
     system("rm " tempfile)
This works, but not elegantly.
Starting with version 3.1 of `gawk', it is possible to open a
_two-way_ pipe to another process. The second process is termed a
"coprocess", since it runs in parallel with `gawk'. The two-way
connection is created using the new `|&' operator (borrowed from the
Korn Shell, `ksh'):(1)
     do {
         print DATA |& "subprogram"
         "subprogram" |& getline results
     } while (DATA LEFT TO PROCESS)
     close("subprogram")
The first time an I/O operation is executed using the `|&' operator,
`gawk' creates a two-way pipeline to a child process that runs the
other program. Output created with `print' or `printf' is written to
the program's standard input, and output from the program's standard
output can be read by the `gawk' program using `getline'. As is the
case with processes started by `|', the subprogram can be any program,
or pipeline of programs, that can be started by the shell.
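
As a concrete illustration of the round trip described above, here is a minimal sketch that uses `cat' as the coprocess. The choice of `cat', the shell function name `echo_roundtrip', and the sample words are assumptions made for this example only. `cat' is convenient because typical implementations copy each line back as soon as they read it, so no buffering delay gets in the way.

```shell
# Hypothetical wrapper: runs a gawk program that talks to "cat" as a coprocess.
echo_roundtrip() {
    gawk 'BEGIN {
        cmd = "cat"                            # assumed coprocess for the sketch
        n = split("one two three", words, " ")
        for (i = 1; i <= n; i++) {
            print words[i] |& cmd              # written to the standard input of cat
            if ((cmd |& getline reply) > 0)    # read back from its standard output
                print "sent", words[i], "received", reply
        }
        close(cmd)                             # closes both ends of the two-way pipe
    }'
}
```

Running `echo_roundtrip' prints one line per word, each showing the text that was sent and the reply that came back. Note that the same string (`cmd') names the coprocess for both the `print' and the `getline'; that is how `gawk' knows the two operations refer to the same child.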
There are some cautionary items to be aware of:
   * As the code inside `gawk' currently stands, the coprocess's
     standard error goes to the same place that the parent `gawk''s
     standard error goes. It is not possible to read the child's
     standard error separately.

   * I/O buffering may be a problem. `gawk' automatically flushes all
     output down the pipe to the child process. However, if the
     coprocess does not flush its output, `gawk' may hang when doing a
     `getline' in order to read the coprocess's results. This could
     lead to a situation known as "deadlock", where each process is
     waiting for the other one to do something.
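
The first caution can be observed directly. The following sketch is illustrative only: the function name `stderr_demo' and the two command strings are hypothetical, chosen because each writes one line to standard output and one to standard error. It also shows that, since the command is run by the shell, an ordinary `2>&1' redirection inside the command merges the two streams; that still does not let the `gawk' program tell them apart.

```shell
# Hypothetical wrapper: shows where the standard error of a coprocess ends up.
stderr_demo() {
    gawk 'BEGIN {
        # Assumed sample commands; each emits "out" on stdout and "err" on stderr.
        plain  = "echo out; echo err >&2"
        merged = "(echo out; echo err >&2) 2>&1"

        # Only "out" arrives through the pipe; "err" goes to the stderr of gawk.
        while ((plain |& getline line) > 0)
            print "plain:", line
        close(plain)

        # With 2>&1 in the command, both lines arrive through the pipe.
        while ((merged |& getline line) > 0)
            print "merged:", line
        close(merged)
    }'
}
```

Invoking `stderr_demo 2>/dev/null' hides the line that bypasses the pipe, which makes it easy to see which lines the `gawk' program actually read.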
It is possible to close just one end of the two-way pipe to a
coprocess, by supplying a second argument to the `close' function of
either `"to"' or `"from"' (*note Closing Input and Output Redirections:
Close Files And Pipes.). These strings tell `gawk' to close the end of
the pipe that sends data to the process or the end that reads from it,
respectively.
This is necessary, in particular, when using the system `sort'
utility as a coprocess: `sort' must read _all_ of its input
data before it can produce any output, and it does not
receive an end-of-file indication until `gawk' closes the write end of
the pipe.
When you have finished writing data to the `sort' utility, you can
close the `"to"' end of the pipe, and then start reading sorted data
via `getline'. For example:
     BEGIN {
         command = "LC_ALL=C sort"
         n = split("abcdefghijklmnopqrstuvwxyz", a, "")

         for (i = n; i > 0; i--)
             print a[i] |& command
         close(command, "to")

         while ((command |& getline line) > 0)
             print "got", line
         close(command)
     }
This program writes the letters of the alphabet in reverse order, one
per line, down the two-way pipe to `sort'. It then closes the write
end of the pipe, so that `sort' receives an end-of-file indication.
This causes `sort' to sort the data and write the sorted data back to
the `gawk' program. Once all of the data has been read, `gawk'
terminates the coprocess and exits.
As a side note, the assignment `LC_ALL=C' in the `sort' command
ensures traditional Unix (ASCII) sorting from `sort'.
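
The same technique can be applied to input records rather than to data generated in a `BEGIN' rule. The following is a sketch under stated assumptions: the function name `sort_column', the counter `sent', and the decision to sort only the first field are all hypothetical. The guard on `sent' is there because, if nothing was ever written, the first `|&' operation would be the `getline', which would start `sort' with its input still open and leave both processes waiting on each other.

```shell
# Hypothetical wrapper: sorts the first field of each input line via a coprocess.
sort_column() {
    gawk '
    BEGIN { command = "LC_ALL=C sort" }

    # Send the first field of every record to sort and count what was sent.
    { print $1 |& command; sent++ }

    END {
        if (sent == 0)
            exit                       # nothing was written, so do not start sort
        close(command, "to")           # sort now sees end-of-file on its input
        while ((command |& getline line) > 0)
            print "got", line
        close(command)
    }'
}
```

For example, `printf 'pear 1\napple 2\nfig 3\n' | sort_column' feeds three records through the coprocess and prints the first fields in sorted order, each prefixed with `got'.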
---------- Footnotes ----------
(1) This is very different from the same operator in the C shell,
`csh'.