GNU Info

Info Node: (gawk.info)Profiling

(gawk.info)Profiling


Prev: Portal Files Up: Advanced Features
Enter node , (file) or (file)node

Profiling Your `awk' Programs
=============================

   Beginning with version 3.1 of `gawk', you may produce execution
traces of your `awk' programs.  This is done with a specially compiled
version of `gawk', called `pgawk' ("profiling `gawk'").

   `pgawk' is identical in every way to `gawk', except that when it has
finished running, it creates a profile of your program in a file named
`awkprof.out'.  Because it is profiling, it also executes up to 45
percent slower than `gawk' normally does.

   As shown in the following example, the `--profile' option can be
used to change the name of the file where `pgawk' will write the
profile:

     $ pgawk --profile=myprog.prof -f myprog.awk data1 data2

In the above example, `pgawk' places the profile in `myprog.prof'
instead of in `awkprof.out'.

   Regular `gawk' also accepts this option.  When called with just
`--profile', `gawk' "pretty prints" the program into `awkprof.out',
without any execution counts.  You may supply an option to `--profile'
to change the file name.  Here is a sample session showing a simple
`awk' program, its input data, and the results from running `pgawk'.
First, the `awk' program:

     BEGIN { print "First BEGIN rule" }
     
     END { print "First END rule" }
     
     /foo/ {
         print "matched /foo/, gosh"
         for (i = 1; i <= 3; i++)
             sing()
     }
     
     {
         if (/foo/)
             print "if is true"
         else
             print "else is true"
     }
     
     BEGIN { print "Second BEGIN rule" }
     
     END { print "Second END rule" }
     
     function sing(    dummy)
     {
         print "I gotta be me!"
     }

   Following is the input data:

     foo
     bar
     baz
     foo
     junk

   Here is the `awkprof.out' that results from running `pgawk' on this
program and data.  (This example also illustrates that `awk'
programmers sometimes have to work late.):

             # gawk profile, created Sun Aug 13 00:00:15 2000
     
             # BEGIN block(s)
     
             BEGIN {
          1          print "First BEGIN rule"
          1          print "Second BEGIN rule"
             }
     
             # Rule(s)
     
          5  /foo/   { # 2
          2          print "matched /foo/, gosh"
          6          for (i = 1; i <= 3; i++) {
          6                  sing()
                     }
             }
     
          5  {
          5          if (/foo/) { # 2
          2                  print "if is true"
          3          } else {
          3                  print "else is true"
                     }
             }
     
             # END block(s)
     
             END {
          1          print "First END rule"
          1          print "Second END rule"
             }
     
             # Functions, listed alphabetically
     
          6  function sing(dummy)
             {
          6          print "I gotta be me!"
             }

   The previous example illustrates many of the basic rules for
profiling output.  The rules are as follows:

   * The program is printed in the order `BEGIN' rule, pattern/action
     rules, `END' rule and functions, listed alphabetically.  Multiple
     `BEGIN' and `END' rules are merged together.

   * Pattern-action rules have two counts.  The first count, to the
     left of the rule, shows how many times the rule's pattern was
     _tested_.  The second count, to the right of the rule's opening
     left brace in a comment, shows how many times the rule's action
     was _executed_.  The difference between the two indicates how many
     times the rule's pattern evaluated to false.

   * Similarly, the count for an `if'-`else' statement shows how many
     times the condition was tested.  To the right of the opening left
     brace for the `if''s body is a count showing how many times the
     condition was true.  The count for the `else' indicates how many
     times the test failed.

   * The count for a loop header (such as `for' or `while') shows how
     many times the loop test was executed.  (Because of this, you
     can't just look at the count on the first statement in a rule to
     determine how many times the rule was executed.  If the first
     statement is a loop, the count is misleading.)

   * For user-defined functions, the count next to the `function'
     keyword indicates how many times the function was called.  The
     counts next to the statements in the body show how many times
     those statements were executed.

   * The layout uses "K&R" style using tabs.  Braces are used
     everywhere, even when the body of an `if', `else', or loop is only
     a single statement.

   * Parentheses are used only where needed, as indicated by the
     structure of the program and the precedence rules.  For example,
     `(3 + 5) * 4' means add three plus five, then multiply the total
     by four.  However, `3 + 5 * 4' has no parentheses, and means `3 +
     (5 * 4)'.

   * All string concatenations are parenthesized too.  (This could be
     made a bit smarter.)

   * Parentheses are used around the arguments to `print' and `printf'
     only when the `print' or `printf' statement is followed by a
     redirection.  Similarly, if the target of a redirection isn't a
     scalar, it gets parenthesized.

   * `pgawk' supplies leading comments in front of the `BEGIN' and
     `END' rules, the pattern/action rules, and the functions.


   The profiled version of your program may not look exactly like what
you typed when you wrote it.  This is because `pgawk' creates the
profiled version by "pretty printing" its internal representation of
the program.  The advantage to this is that `pgawk' can produce a
standard representation.  The disadvantage is that all source code
comments are lost, as are the distinctions among multiple `BEGIN' and
`END' rules.  Also, things such as:

     /foo/

come out as:

     /foo/   {
         print $0
     }

which is correct, but possibly surprising.

   Besides creating profiles when a program has completed, `pgawk' can
produce a profile while it is running.  This is useful if your `awk'
program goes into an infinite loop and you want to see what has been
executed.  To use this feature, run `pgawk' in the background:

     $ pgawk -f myprog &
     [1] 13992

The shell prints a job number and process ID number, in this case,
13992.  Use the `kill' command to send the `USR1' signal to `pgawk':

     $ kill -USR1 13992

As usual, the profiled version of the program is written to
`awkprof.out', or to a different file if you use the `--profile' option.

   Along with the regular profile, as shown earlier, the profile
includes a trace of any active functions:

     # Function Call Stack:
     
     #   3. baz
     #   2. bar
     #   1. foo
     # -- main --

   You may send `pgawk' the `USR1' signal as many times as you like.
Each time, the profile and function call trace are appended to the
output profile file.

   If you use the `HUP' signal instead of the `USR1' signal, `pgawk'
produces the profile and the function call trace, and then exits.


automatically generated by info2www version 1.2.2.9