Profiling Your `awk' Programs
=============================
Beginning with version 3.1 of `gawk', you may produce execution
traces of your `awk' programs. This is done with a specially compiled
version of `gawk', called `pgawk' ("profiling `gawk'").
`pgawk' is identical in every way to `gawk', except that when it has
finished running, it creates a profile of your program in a file named
`awkprof.out'. Because it is profiling, it also executes up to 45
percent slower than `gawk' normally does.
As shown in the following example, the `--profile' option can be
used to change the name of the file where `pgawk' will write the
profile:
$ pgawk --profile=myprog.prof -f myprog.awk data1 data2
In the above example, `pgawk' places the profile in `myprog.prof'
instead of in `awkprof.out'.
Regular `gawk' also accepts this option. When called with just
`--profile', `gawk' "pretty prints" the program into `awkprof.out',
without any execution counts. You may supply an option to `--profile'
to change the file name. Here is a sample session showing a simple
`awk' program, its input data, and the results from running `pgawk'.
First, the `awk' program:
BEGIN { print "First BEGIN rule" }
END { print "First END rule" }
/foo/ {
print "matched /foo/, gosh"
for (i = 1; i <= 3; i++)
sing()
}
{
if (/foo/)
print "if is true"
else
print "else is true"
}
BEGIN { print "Second BEGIN rule" }
END { print "Second END rule" }
function sing( dummy)
{
print "I gotta be me!"
}
Following is the input data:
foo
bar
baz
foo
junk
Here is the `awkprof.out' that results from running `pgawk' on this
program and data. (This example also illustrates that `awk'
programmers sometimes have to work late.):
# gawk profile, created Sun Aug 13 00:00:15 2000
# BEGIN block(s)
BEGIN {
1 print "First BEGIN rule"
1 print "Second BEGIN rule"
}
# Rule(s)
5 /foo/ { # 2
2 print "matched /foo/, gosh"
6 for (i = 1; i <= 3; i++) {
6 sing()
}
}
5 {
5 if (/foo/) { # 2
2 print "if is true"
3 } else {
3 print "else is true"
}
}
# END block(s)
END {
1 print "First END rule"
1 print "Second END rule"
}
# Functions, listed alphabetically
6 function sing(dummy)
{
6 print "I gotta be me!"
}
The previous example illustrates many of the basic rules for
profiling output. The rules are as follows:
* The program is printed in the order `BEGIN' rule, pattern/action
rules, `END' rule and functions, listed alphabetically. Multiple
`BEGIN' and `END' rules are merged together.
* Pattern-action rules have two counts. The first count, to the
left of the rule, shows how many times the rule's pattern was
_tested_. The second count, to the right of the rule's opening
left brace in a comment, shows how many times the rule's action
was _executed_. The difference between the two indicates how many
times the rule's pattern evaluated to false.
* Similarly, the count for an `if'-`else' statement shows how many
times the condition was tested. To the right of the opening left
brace for the `if''s body is a count showing how many times the
condition was true. The count for the `else' indicates how many
times the test failed.
* The count for a loop header (such as `for' or `while') shows how
many times the loop test was executed. (Because of this, you
can't just look at the count on the first statement in a rule to
determine how many times the rule was executed. If the first
statement is a loop, the count is misleading.)
* For user-defined functions, the count next to the `function'
keyword indicates how many times the function was called. The
counts next to the statements in the body show how many times
those statements were executed.
* The layout uses "K&R" style using tabs. Braces are used
everywhere, even when the body of an `if', `else', or loop is only
a single statement.
* Parentheses are used only where needed, as indicated by the
structure of the program and the precedence rules. For example,
`(3 + 5) * 4' means add three plus five, then multiply the total
by four. However, `3 + 5 * 4' has no parentheses, and means `3 +
(5 * 4)'.
* All string concatenations are parenthesized too. (This could be
made a bit smarter.)
* Parentheses are used around the arguments to `print' and `printf'
only when the `print' or `printf' statement is followed by a
redirection. Similarly, if the target of a redirection isn't a
scalar, it gets parenthesized.
* `pgawk' supplies leading comments in front of the `BEGIN' and
`END' rules, the pattern/action rules, and the functions.
The profiled version of your program may not look exactly like what
you typed when you wrote it. This is because `pgawk' creates the
profiled version by "pretty printing" its internal representation of
the program. The advantage to this is that `pgawk' can produce a
standard representation. The disadvantage is that all source code
comments are lost, as are the distinctions among multiple `BEGIN' and
`END' rules. Also, things such as:
/foo/
come out as:
/foo/ {
print $0
}
which is correct, but possibly surprising.
Besides creating profiles when a program has completed, `pgawk' can
produce a profile while it is running. This is useful if your `awk'
program goes into an infinite loop and you want to see what has been
executed. To use this feature, run `pgawk' in the background:
$ pgawk -f myprog &
[1] 13992
The shell prints a job number and process ID number, in this case,
13992. Use the `kill' command to send the `USR1' signal to `pgawk':
$ kill -USR1 13992
As usual, the profiled version of the program is written to
`awkprof.out', or to a different file if you use the `--profile' option.
Along with the regular profile, as shown earlier, the profile
includes a trace of any active functions:
# Function Call Stack:
# 3. baz
# 2. bar
# 1. foo
# -- main --
You may send `pgawk' the `USR1' signal as many times as you like.
Each time, the profile and function call trace are appended to the
output profile file.
If you use the `HUP' signal instead of the `USR1' signal, `pgawk'
produces the profile and the function call trace, and then exits.