GNU Info

Info Node: (gawk.info)Simple Sed

A Simple Stream Editor
----------------------

   The `sed' utility is a "stream editor," a program that reads a
stream of data, makes changes to it, and passes it on.  It is often
used to make global changes to a large file or to a stream of data
generated by a pipeline of commands.  While `sed' is a complicated
program in its own right, its most common use is to perform global
substitutions in the middle of a pipeline:

     command1 < orig.data | sed 's/old/new/g' | command2 > result

   Here, `s/old/new/g' tells `sed' to look for the regexp `old' on each
input line and replace every occurrence on that line with the text
`new'.  This is similar to `awk''s `gsub' function (Note: String
Manipulation Functions.).
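
   As a rough illustration of that similarity, the following sketch
runs the same substitution through `sed' and through `gsub'.  The
sample text and the variable names are made up for this example:

```shell
# Sketch: one global substitution, done with sed and with awk's gsub.
# The input text is hypothetical sample data.
input='old gold, old news'

with_sed=$(printf '%s\n' "$input" | sed 's/old/new/g')
with_awk=$(printf '%s\n' "$input" | awk '{ gsub(/old/, "new"); print }')

# Both lines read: new gnew, new news
printf '%s\n%s\n' "$with_sed" "$with_awk"
```

   Note that both tools replace `old' inside `gold' as well, since the
regexp is not anchored to word boundaries.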

   The following program, `awksed.awk', accepts at least two
command-line arguments: the pattern to look for and the text to replace
it with. Any additional arguments are treated as data file names to
process. If none are provided, the standard input is used:

     # awksed.awk --- do s/foo/bar/g using just print
     #    Thanks to Michael Brennan for the idea
     function usage()
     {
         print "usage: awksed pat repl [files...]" > "/dev/stderr"
         exit 1
     }
     
     BEGIN {
         # validate arguments
         if (ARGC < 3)
             usage()
     
         RS = ARGV[1]
         ORS = ARGV[2]
     
         # don't use arguments as files
         ARGV[1] = ARGV[2] = ""
     }
     
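     # look ma, no hands!
     {
         if (RT == "")
             # final record with no match after it: print the text only
             printf "%s", $0
         else
             # print the text, followed by ORS (the replacement)
             print
     }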

   The program relies on `gawk''s ability to have `RS' be a regexp, as
well as on the setting of `RT' to the actual text that terminates the
record (Note: How Input Is Split into Records.).

   The idea is to have `RS' be the pattern to look for. `gawk'
automatically sets `$0' to the text between matches of the pattern.
This is text that we want to keep, unmodified.  Then, by setting `ORS'
to the replacement text, a simple `print' statement outputs the text we
want to keep, followed by the replacement text.

   There is one wrinkle to this scheme, which is what to do if the last
record doesn't end with text that matches `RS'.  Using a `print'
statement unconditionally prints the replacement text, which is not
correct.  However, if the file did not end in text that matches `RS',
`RT' is set to the null string.  In this case, we can print `$0' alone,
without the replacement text, using `printf' (Note: Using `printf'
Statements for Fancier Printing.).

   The `BEGIN' rule handles the setup, checking for the right number of
arguments and calling `usage' if there is a problem. Then it sets `RS'
and `ORS' from the command-line arguments and sets `ARGV[1]' and
`ARGV[2]' to the null string, so that they are not treated as file names
(Note: Using `ARGC' and `ARGV'.).
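
   The effect of clearing the `ARGV' elements can be checked with a
short sketch like the one below.  The arguments `pat' and `repl' are
placeholders rather than real files, and the variable `n' exists only
to record the argument count:

```shell
# Sketch: null ARGV entries are skipped, so awk reads standard input.
# "pat" and "repl" are placeholder arguments, not file names.
out=$(printf 'from stdin\n' |
    awk 'BEGIN { n = ARGC; ARGV[1] = ARGV[2] = "" }
         { print n ": " $0 }' pat repl)

# Prints: 3: from stdin
printf '%s\n' "$out"
```

   `ARGC' is 3 here because `ARGV[0]' holds the program name.  Without
the assignment in the `BEGIN' rule, `awk' would try to open `pat' and
`repl' as data files.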

   The `usage' function prints an error message and exits.  Finally,
the single rule handles the printing scheme outlined above, using
`print' or `printf' as appropriate, depending upon the value of `RT'.

