Copyright (C) 2000-2012 |
Using Regular Expressions to Separate Fields
--------------------------------------------

The previous node discussed the use of single characters or simple
strings as the value of `FS'.  More generally, the value of `FS' may be
a string containing any regular expression.  In this case, each match in
the record for the regular expression separates fields.  For example,
the assignment:

     FS = ", \t"

makes every area of an input line that consists of a comma followed by a
space and a tab into a field separator.  (`\t' is an "escape sequence"
that stands for a tab; see Escape Sequences for the complete list of
similar escape sequences.)

   For a less trivial example of a regular expression, try using single
spaces to separate fields the way single commas are used.  `FS' can be
set to `"[ ]"' (left bracket, space, right bracket).  This regular
expression matches a single space and nothing else (see Regular
Expressions).

   There is an important difference between the two cases of `FS = " "'
(a single space) and `FS = "[ \t\n]+"' (a regular expression matching
one or more spaces, tabs, or newlines).  For both values of `FS', fields
are separated by "runs" (multiple adjacent occurrences) of spaces, tabs,
and/or newlines.  However, when the value of `FS' is `" "', `awk' first
strips leading and trailing whitespace from the record and then decides
where the fields are.  For example, the following pipeline prints `b':

     $ echo ' a b c d ' | awk '{ print $2 }'
     -| b

However, this pipeline prints `a' (note the extra spaces around each
letter):

     $ echo ' a  b  c  d ' | awk 'BEGIN { FS = "[ \t\n]+" }
     >                                  { print $2 }'
     -| a

In this case, the first field is "null" or empty.

   The stripping of leading and trailing whitespace also comes into play
whenever `$0' is recomputed.
For instance, study this pipeline:

     $ echo '   a b c d' | awk '{ print; $2 = $2; print }'
     -|    a b c d
     -| a b c d

The first `print' statement prints the record as it was read, with
leading whitespace intact.  The assignment to `$2' rebuilds `$0' by
concatenating `$1' through `$NF' together, separated by the value of
`OFS'.  Because the leading whitespace was ignored when finding `$1', it
is not part of the new `$0'.  Finally, the last `print' statement prints
the new `$0'.
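
   The behaviors described in this node can be compared side by side
with a short script.  The following is only an illustrative sketch: the
sample record and the shell variable names are invented for this
example, and it assumes a POSIX shell with an `awk' that follows the
field-splitting rules described above.

```shell
#!/bin/sh
# Illustrative sketch: one record, several FS settings.
# The record and the variable names are made up for this example.
record=' a  b  c  d '

# Default FS (a single space): leading and trailing whitespace is
# stripped and runs of whitespace separate fields, so $2 is "b".
default_f2=$(printf '%s\n' "$record" | awk '{ print $2 }')

# FS = "[ ]": each single space is a separator, so adjacent spaces
# produce empty fields: $1 is empty, $2 is "a", $3 is empty, $4 is "b".
single_f4=$(printf '%s\n' "$record" |
    awk 'BEGIN { FS = "[ ]" } { print $4 }')

# FS = "[ \t\n]+": runs separate fields, but nothing is stripped, so
# the leading space yields an empty $1 and $2 is "a".
regex_f2=$(printf '%s\n' "$record" |
    awk 'BEGIN { FS = "[ \t\n]+" } { print $2 }')

# Assigning to a field rebuilds $0 from $1 through $NF joined by OFS;
# the leading whitespace is not part of the rebuilt record.
rebuilt=$(printf '%s\n' '   a b c d' |
    awk 'BEGIN { OFS = "-" } { $2 = $2; print }')

printf 'default FS, $2:        %s\n' "$default_f2"
printf 'FS = "[ ]", $4:        %s\n' "$single_f4"
printf 'FS = "[ \\t\\n]+", $2:   %s\n' "$regex_f2"
printf 'rebuilt with OFS "-":  %s\n' "$rebuilt"
```

Run as written, the four values reported should be `b', `b', `a', and
`a-b-c-d'.  Setting `OFS' to a visible character in the last case is a
choice made here only so that the rebuilt record is easy to tell apart
from the original one.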