Info Node: (gawk.info)Regexp Field Splitting

Using Regular Expressions to Separate Fields
--------------------------------------------

The previous node discussed the use of single characters or simple strings as the value of `FS'. More generally, the value of `FS' may be a string containing any regular expression. In this case, each match in the record for the regular expression separates fields. For example, the assignment:

     FS = ", \t"

makes every area of an input line that consists of a comma followed by a space and a tab into a field separator. (`\t' is an "escape sequence" that stands for a tab; see Escape Sequences for the complete list of similar escape sequences.)

For a less trivial example of a regular expression, try using single spaces to separate fields the way single commas are used. `FS' can be set to `"[ ]"' (left bracket, space, right bracket). This regular expression matches a single space and nothing else (see Regular Expressions).

There is an important difference between the two cases of `FS = " "' (a single space) and `FS = "[ \t\n]+"' (a regular expression matching one or more spaces, tabs, or newlines). For both values of `FS', fields are separated by "runs" (multiple adjacent occurrences) of spaces, tabs, and/or newlines. However, when the value of `FS' is `" "', `awk' first strips leading and trailing whitespace from the record and then decides where the fields are. For example, the following pipeline prints `b':

     $ echo ' a b c d ' | awk '{ print $2 }'
     -| b

However, this pipeline prints `a' (note the extra spaces around each letter):

     $ echo ' a  b  c  d ' | awk 'BEGIN { FS = "[ \t\n]+" }
     >                                  { print $2 }'
     -| a

In this case, the first field is "null" or empty, because the leading space is itself a match for the separator.

The stripping of leading and trailing whitespace also comes into play whenever `$0' is recomputed. For instance, study this pipeline:

     $ echo '   a b c d' | awk '{ print; $2 = $2; print }'
     -|    a b c d
     -| a b c d

The first `print' statement prints the record as it was read, with leading whitespace intact. The assignment to `$2' rebuilds `$0' by concatenating `$1' through `$NF' together, separated by the value of `OFS'. Because the leading whitespace was ignored when finding `$1', it is not part of the new `$0'. Finally, the last `print' statement prints the new `$0'.
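
The cases described in this node can be compared side by side with a small sketch. The helper names `show_fields' and `rebuild_with_dash' are hypothetical and are not part of `awk' or of this manual; the sketch also assumes that the `-v' option processes escape sequences such as `\t' in the assigned value, as POSIX `awk' implementations do. `show_fields' prints the field count followed by each field in angle brackets, so that empty fields are visible.

```shell
# Hypothetical helper: split each record read from standard input using
# the value of FS given as the first argument, then print NF and every
# field wrapped in <> so that empty fields show up.
show_fields() {
    awk -v FS="$1" '{
        printf "%d:", NF
        for (i = 1; i <= NF; i++)
            printf "<%s>", $i
        print ""
    }'
}

# Hypothetical helper: force $0 to be rebuilt with OFS set to "-".
rebuild_with_dash() {
    awk 'BEGIN { OFS = "-" } { $2 = $2; print }'
}

# "[ ]" matches exactly one space, so two adjacent spaces
# enclose an empty field.
printf 'a  b c\n' | show_fields '[ ]'
# -| 4:<a><><b><c>

# A single space is the default: runs of whitespace act as one separator.
printf 'a  b c\n' | show_fields ' '
# -| 3:<a><b><c>

# Comma, space, tab as one separator string.
printf 'x, \ty\n' | show_fields ', \t'
# -| 2:<x><y>

# Leading whitespace is dropped when $0 is rebuilt, and OFS joins fields.
printf '   a b c d\n' | rebuild_with_dash
# -| a-b-c-d
```

Printing the fields inside delimiters is only a convenience for seeing empty fields; any visible marker would serve the same purpose.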
