File: gawk.info,  Node: Extract Program,  Next: Simple Sed,  Prev: History Sorting,  Up: Miscellaneous Programs

Extracting Programs from Texinfo Source Files
---------------------------------------------

   The nodes "A Library of `awk' Functions" and "Practical `awk'
Programs" are the top-level nodes for a large number of `awk'
programs.  If you want to experiment with these programs, it is
tedious to have to type them in by hand.
Here we present a program that can extract parts of a Texinfo input
file into separate files.

   This Info file is written in Texinfo, the GNU project's document
formatting language.  A single Texinfo source file can be used to
produce both printed and online documentation.  The Texinfo language
is described fully in the Texinfo documentation itself, starting with
its `Top' node.

   For our purposes, it is enough to know three things about Texinfo
input files:

   * The "at" symbol (`@') is special in Texinfo, much as the backslash
     (`\') is in C or `awk'.  Literal `@' symbols are represented in
     Texinfo source files as `@@'.

   * Comments start with either `@c' or `@comment'.  The file
     extraction program works by using special comments that start at
     the beginning of a line.

   * Lines containing `@group' and `@end group' commands bracket
     example text that should not be split across a page boundary.
     (Unfortunately, TeX isn't always smart enough to do things exactly
     right and we have to give it some help.)

   The following program, `extract.awk', reads through a Texinfo source
file and does two things, based on the special comments.  Upon seeing
`@c system ...', it runs a command by extracting the command text from
the control line and passing it on to the `system' function (Note:
Input/Output Functions.).  Upon seeing `@c file FILENAME', it sends
each subsequent line to the file FILENAME, until `@c endfile' is
encountered.  The rules in `extract.awk' match either `@c' or
`@comment' by letting the `omment' part be optional.  Lines containing
`@group' and `@end group' are simply removed.  `extract.awk' uses the
`join' library function (*note Merging an Array into a String: Join
Function.).

   The example programs in the online Texinfo source for `GAWK:
Effective AWK Programming' (`gawk.texi') have all been bracketed inside
`file' and `endfile' lines.  The `gawk' distribution uses a copy of
`extract.awk' to extract the sample programs and install many of them
in a standard directory where `gawk' can find them.  The Texinfo file
looks something like this:

     ...
     This program has a @code{BEGIN} rule,
     that prints a nice message:
     
     @example
     @c file examples/messages.awk
     BEGIN @{ print "Don't panic!" @}
     @c end file
     @end example
     
     It also prints some final advice:
     
     @example
     @c file examples/messages.awk
     END @{ print "Always avoid bored archeologists!" @}
     @c end file
     @end example
     ...

   `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
upper- and lowercase letters in the directives won't matter.
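
   For example, with `IGNORECASE' set to one, a pattern like the one
below matches `@c file', `@C FILE', and `@Comment file' equally well.
This is a minimal sketch for illustration only; it is not part of
`extract.awk':

     # ignorecase-demo.awk --- not part of extract.awk
     # With IGNORECASE set, gawk ignores case when matching regexps,
     # so the directives may be written in any mixture of cases.
     BEGIN { IGNORECASE = 1 }
     /^@c(omment)?[ \t]+file/    { print "file directive:", $0 }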

   The first rule handles calling `system', checking that a command is
given (`NF' is at least three) and also checking whether the command
exits with a zero exit status, signifying OK:

     # extract.awk --- extract files and run programs
     #                 from texinfo files
     BEGIN    { IGNORECASE = 1 }
     
     /^@c(omment)?[ \t]+system/    \
     {
         if (NF < 3) {
             e = (FILENAME ":" FNR)
             e = (e  ": badly formed `system' line")
             print e > "/dev/stderr"
             next
         }
         $1 = ""
         $2 = ""
         stat = system($0)
         if (stat != 0) {
             e = (FILENAME ":" FNR)
             e = (e ": warning: system returned " stat)
             print e > "/dev/stderr"
         }
     }

The variable `e' is used so that the rule fits nicely on the screen.
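
   As an illustration of this rule, a control line such as the
following would cause `extract.awk' to pass `mkdir -p examples' to
`system', creating the `examples' directory if it does not already
exist (the particular command shown here is only an illustration, not
a line taken from `gawk.texi'):

     @c system mkdir -p examples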

   The second rule handles moving data into files.  It verifies that a
file name is given in the directive.  If the file named is not the
current file, then the current file is closed.  Keeping the current
file open until a new file name is encountered allows the use of the
`>' redirection for printing the contents, and keeps open-file
management simple.

   The `for' loop does the work.  It reads lines using `getline' (Note:
Explicit Input with `getline'.).  For an unexpected end of
file, it calls the `unexpected_eof' function.  If the line is an
"endfile" line, then it breaks out of the loop.  If the line is an
`@group' or `@end group' line, then it ignores it and goes on to the
next line.  Similarly, comments within examples are also ignored.
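
   The test `(getline line) <= 0' works because `getline' returns one
when it reads a line, zero at end of file, and negative one if an
error occurs, so the test catches both failure cases.  Here is a
minimal, standalone sketch of the same kind of loop; the `start' and
`stop' markers are purely hypothetical and are not part of
`extract.awk':

     # getline-demo.awk --- not part of extract.awk
     # On seeing a `start' line, read the following lines until a
     # `stop' line, end of file, or a read error.
     /^start$/    {
         while ((getline line) > 0) {
             if (line == "stop")
                 break
             print "between:", line
         }
     }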

   Most of the work is in the following few lines.  If the line has no
`@' symbols, the program can print it directly.  Otherwise, the `@'
symbols that Texinfo uses to quote special characters (such as the
braces in `@{' and `@}') must be stripped off, and each `@@' must
become a single `@'.  To do this, the line is split into separate
elements of the array `a', using the `split' function (Note: String
Manipulation Functions.).  The `@' symbol is used as the separator
character.  Each element of `a' that is empty indicates two successive
`@' symbols in the original line.  For each such empty element (an
`@@' in the original file), a single `@' symbol is added back in; if
the next element is also empty, it is skipped, since the restored `@'
already accounts for it.
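
   Here is a minimal, standalone sketch of this technique.  It is not
part of `extract.awk', and to stay self-contained it rebuilds the line
by hand rather than calling the `join' library function:

     # at-sign-demo.awk --- not part of extract.awk
     # Strip the `@' that precedes each special character and turn
     # each `@@' back into a single `@', using the same split-based
     # technique as extract.awk.
     {
         n = split($0, a, "@")
         for (i = 2; i <= n; i++) {
             if (a[i] == "") {       # empty element: an `@@' in the input
                 a[i] = "@"
                 if (a[i+1] == "")   # handle runs of `@' symbols
                     i++
             }
         }
         out = a[1]                  # a[1] == "" means a leading `@'
         for (i = 2; i <= n; i++)
             out = out a[i]          # rejoin with the null string
         print out
     }

Fed the line `BEGIN @{ print "Don't panic!" @}', this sketch prints
`BEGIN { print "Don't panic!" }'; fed `a literal @@ sign', it prints
`a literal @ sign'.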

   When the processing of the array is finished, `join' is called with
the value of `SUBSEP' as the separator, to rejoin the pieces back into
a single line.  (The `join' function treats `SUBSEP' as a special
value, meaning that the pieces should be joined with the null string;
*note Merging an Array into a String: Join Function.)  That line is
then printed to the output file:

     /^@c(omment)?[ \t]+file/    \
     {
         if (NF != 3) {
             e = (FILENAME ":" FNR ": badly formed `file' line")
             print e > "/dev/stderr"
             next
         }
         if ($3 != curfile) {
             if (curfile != "")
                 close(curfile)
             curfile = $3
         }
     
         for (;;) {
             if ((getline line) <= 0)
                 unexpected_eof()
             if (line ~ /^@c(omment)?[ \t]+endfile/)
                 break
             else if (line ~ /^@(end[ \t]+)?group/)
                 continue
             else if (line ~ /^@c(omment+)?[ \t]+/)
                 continue
             if (index(line, "@") == 0) {
                 print line > curfile
                 continue
             }
             n = split(line, a, "@")
             # if a[1] == "", means leading @,
             # don't add one back in.
             for (i = 2; i <= n; i++) {
                 if (a[i] == "") { # was an @@
                     a[i] = "@"
                     if (a[i+1] == "")
                         i++
                 }
             }
             print join(a, 1, n, SUBSEP) > curfile
         }
     }

   An important thing to note is the use of the `>' redirection.
Output done with `>' only opens the file once; it stays open and
subsequent output is appended to the file (Note: Redirecting Output of
`print' and `printf'.).  This makes it easy to mix program
text and explanatory prose for the same sample source file (as has been
done here!) without any hassle.  The file is only closed when a new
data file name is encountered or at the end of the input file.
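
   Here is a minimal sketch of this behavior; it is not part of
`extract.awk', and the file name is only illustrative:

     # redirect-demo.awk --- not part of extract.awk
     BEGIN {
         # The first `print' opens (and truncates) demo.out; the
         # second finds it already open and simply appends to it.
         print "first line"  > "demo.out"
         print "second line" > "demo.out"
         close("demo.out")
     }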

   Finally, the function `unexpected_eof' prints an appropriate error
message and then exits.  The `END' rule handles the final cleanup,
closing the open file:

     function unexpected_eof() {
         printf("%s:%d: unexpected EOF or error\n",
             FILENAME, FNR) > "/dev/stderr"
         exit 1
     }
     
     END {
         if (curfile)
             close(curfile)
     }
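
   To run `extract.awk', the `join' library function must be loaded as
well.  Assuming that function lives in a file named `join.awk' (the
file names here are only illustrative), an invocation might look like
this:

     gawk -f join.awk -f extract.awk gawk.texi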

