GNU Info

Info Node: (gawk.info)Typing and Comparison

Variable Typing and Comparison Expressions
==========================================

     The Guide is definitive. Reality is frequently inaccurate.
     The Hitchhiker's Guide to the Galaxy

   Unlike in many other programming languages, `awk' variables do not
have a fixed type.  Instead, they can be either a number or a string,
depending upon the value that is assigned to them.

   The 1992 POSIX standard introduced the concept of a "numeric
string", which is simply a string that looks like a number--for
example, `" +2"'.  This concept is used for determining the type of a
variable.  The type of the variable is important because the types of
two variables determine how they are compared.  In `gawk', variable
typing follows these rules:

   * A numeric constant or the result of a numeric operation has the
     NUMERIC attribute.

   * A string constant or the result of a string operation has the
     STRING attribute.

   * Fields, `getline' input, `FILENAME', `ARGV' elements, `ENVIRON'
     elements, and the elements of an array created by `split' that are
     numeric strings have the STRNUM attribute.  Otherwise, they have
     the STRING attribute.  Uninitialized variables also have the
     STRNUM attribute.

   * Attributes propagate across assignments but are not changed by any
     use.
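   These rules can be observed directly.  The following is a minimal
sketch, assuming a POSIX `sh' and an `awk' on the `PATH' that follows
the rules above (as `gawk' does); the helper name `typing_demo' is
hypothetical.  Each value is compared against a number, so a printed 1
or 0 reveals whether the comparison was numeric or string-based.

```shell
# typing_demo is a hypothetical helper; it feeds one record to awk.
typing_demo() {
    echo '10 abc' | awk '{
        n = "10" + 0   # result of a numeric operation: NUMERIC
        s = 10 ""      # result of a string operation (concatenation): STRING
        # numeric 10 < 9 is 0; string "10" < "9" is 1
        print "ops:", (n < 9), (s < 9)
        # $1 looks numeric (STRNUM): numeric 10 < 9 is 0.
        # $2 is STRING: "abc" < "9" is a string comparison, so 0
        # (read as a number it would have been 0 < 9, i.e. 1).
        print "fields:", ($1 < 9), ($2 < 9)
        # u is never assigned: STRNUM, equal to both 0 and ""
        print "uninit:", (u == 0), (u == "")
    }'
}
typing_demo
```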

   The last rule is particularly important. In the following program,
`a' has numeric type, even though it is later used in a string
operation:

     BEGIN {
              a = 12.345
              b = a " is a cute number"
              print b
     }
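   A quick way to confirm that `a' keeps its numeric type is to compare
it with a number after the concatenation.  This is a sketch assuming an
`awk' that follows the rules above; `sticky_demo' is a hypothetical
helper name.

```shell
# sticky_demo is a hypothetical helper wrapping the program above.
sticky_demo() {
    awk 'BEGIN {
        a = 12.345
        b = a " is a cute number"   # a is used in a string operation
        print b
        # a is still NUMERIC, so 12.345 < 100 is numeric and true (1).
        # Had a become a string, "12.345" < "100" would be false (0).
        print (a < 100)
    }'
}
sticky_demo
```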

   When two operands are compared, either string comparison or numeric
comparison may be used. This depends upon the attributes of the
operands, according to the following symmetric matrix:

             +----------------------------------------------
             |       STRING          NUMERIC         STRNUM
     --------+----------------------------------------------
             |
     STRING  |       string          string          string
             |
     NUMERIC |       string          numeric         numeric
             |
     STRNUM  |       string          numeric         numeric
     --------+----------------------------------------------
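   Each distinct cell of the matrix can be exercised with the same pair
of values, 10 and 9, which order differently as numbers and as strings.
This is a sketch assuming an `awk' that follows the matrix (such as
`gawk'); `matrix_demo' is a hypothetical helper name.

```shell
# matrix_demo is a hypothetical helper; $1 = "10" and $2 = "9" are STRNUM.
matrix_demo() {
    echo '10 9' | awk '{
        print ($1 < $2)     # STRNUM  vs STRNUM : numeric, 10 < 9     -> 0
        print (10 < $2)     # NUMERIC vs STRNUM : numeric, 10 < 9     -> 0
        print (10 < 9)      # NUMERIC vs NUMERIC: numeric             -> 0
        print ($1 < "9")    # STRNUM  vs STRING : string, "10" < "9"  -> 1
        print (10 < "9")    # NUMERIC vs STRING : string, "10" < "9"  -> 1
        print ("10" < "9")  # STRING  vs STRING : string              -> 1
    }'
}
matrix_demo
```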

   The basic idea is that user input that looks numeric--and _only_
user input--should be treated as numeric, even though it is actually
made of characters and is therefore also a string.  Thus, for example,
the string constant `" +3.14"' is a string, even though it looks
numeric, and is _never_ treated as a number for comparison purposes.

   In short, when one operand is a "pure" string, such as a string
constant, then a string comparison is performed.  Otherwise, a numeric
comparison is performed.(1)
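   The "user input only" rule can be seen by comparing the same text
once as fields and once as string constants.  This is a sketch assuming
an `awk' that follows these rules; `input_demo' is a hypothetical
helper name.

```shell
# input_demo is a hypothetical helper; the record supplies "1e2" and "3".
input_demo() {
    echo '1e2 3' | awk '{
        print ($1 < $2)      # both fields are STRNUM: 100 < 3 is false -> 0
        print ("1e2" < "3")  # string constants: "1" sorts before "3"   -> 1
        x = "1e2"            # x copies a STRING constant, so it is STRING
        print (x < $2)       # STRING vs STRNUM: string comparison      -> 1
    }'
}
input_demo
```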

   "Comparison expressions" compare strings or numbers for
relationships such as equality.  They are written using "relational
operators", which are a superset of those in C.  Here is a table of
them:

`X < Y'
     True if X is less than Y.

`X <= Y'
     True if X is less than or equal to Y.

`X > Y'
     True if X is greater than Y.

`X >= Y'
     True if X is greater than or equal to Y.

`X == Y'
     True if X is equal to Y.

`X != Y'
     True if X is not equal to Y.

`X ~ Y'
     True if the string X matches the regexp denoted by Y.

`X !~ Y'
     True if the string X does not match the regexp denoted by Y.

`SUBSCRIPT in ARRAY'
     True if the array ARRAY has an element with the subscript
     SUBSCRIPT.
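   Every operator in this table yields 1 or 0, so all of them can be
printed side by side.  This is a sketch assuming any POSIX `awk';
`ops_demo' is a hypothetical helper name.  The comparisons are
parenthesized so that `>' inside `print' is not taken as output
redirection.

```shell
# ops_demo is a hypothetical helper; x = 5 and y = 7 are arbitrary values.
ops_demo() {
    awk 'BEGIN {
        x = 5; y = 7
        arr["k"] = 1
        # <, <=, >, >=, ==, != on two numbers
        print "relational:", (x < y), (x <= y), (x > y), (x >= y), (x == y), (x != y)
        # ~, !~, and "in" (one subscript present, one absent)
        print "matching:", ("foobar" ~ /oo/), ("foobar" !~ /oo/), ("k" in arr), ("z" in arr)
    }'
}
ops_demo
```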

   Comparison expressions have the value one if true and zero if false.
When comparing operands of mixed types, numeric operands are converted
to strings using the value of `CONVFMT' (Note: Conversion of Strings
and Numbers.).
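   Because a mixed comparison converts the number to a string first,
the result can depend on `CONVFMT'.  This sketch assumes an `awk' that
converts integral values as integers and other values through
`CONVFMT' (default `"%.6g"'), as `gawk' does; `convfmt_demo' is a
hypothetical helper name.

```shell
# convfmt_demo is a hypothetical helper showing number-to-string conversion.
convfmt_demo() {
    awk 'BEGIN {
        # 10 is integral, so it becomes "10" (never "10.0")
        print "int:", (10 == "10"), (10 == "10.0")
        # 3.14159 goes through the default CONVFMT "%.6g": "3.14159"
        print "default:", (3.14159 == "3.14159")
        CONVFMT = "%.2g"
        # now 3.14159 converts to "3.1"
        print "changed:", (3.14159 == "3.1")
    }'
}
convfmt_demo
```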

   Strings are compared by comparing the first character of each, then
the second character of each, and so on.  Thus, `"10"' is less than
`"9"'.  If there are two strings where one is a prefix of the other,
the shorter string is less than the longer one.  Thus, `"abc"' is less
than `"abcd"'.
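   Both examples can be checked next to the corresponding numeric
comparison.  This is a sketch assuming any POSIX `awk'; `strcmp_demo'
is a hypothetical helper name.

```shell
# strcmp_demo is a hypothetical helper contrasting string and numeric order.
strcmp_demo() {
    awk 'BEGIN {
        print ("10" < "9")       # first characters "1" < "9" decide it: 1
        print (10 < 9)           # the same values as numbers: 0
        print ("abc" < "abcd")   # a prefix is less than the longer string: 1
    }'
}
strcmp_demo
```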

   It is very easy to accidentally mistype the `==' operator and leave
off one of the `=' characters.  The result is still valid `awk' code,
but the program does not do what is intended:

     if (a = b)   # oops! should be a == b
        ...
     else
        ...

Unless `b' happens to be zero or the null string, the `if' part of the
test always succeeds.  Because the operators are so similar, this kind
of error is very difficult to spot when scanning the source code.
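   The effect of the typo is easy to see when `b' is zero: the test
fails and `a' is silently overwritten.  This is a sketch assuming any
POSIX `awk'; `assign_demo' is a hypothetical helper name.

```shell
# assign_demo is a hypothetical helper showing "=" used where "==" was meant.
assign_demo() {
    awk 'BEGIN {
        a = 1; b = 0
        if (a = b)              # assigns b to a; the value 0 is false
            print "then"
        else
            print "else"
        print "a is now", a     # the assignment changed a
    }'
}
assign_demo
```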

   The following table of expressions illustrates the kind of comparison
`gawk' performs, as well as what the result of the comparison is:

`1.5 <= 2.0'
     numeric comparison (true)

`"abc" >= "xyz"'
     string comparison (false)

`1.5 != " +2"'
     string comparison (true)

`"1e2" < "3"'
     string comparison (true)

`a = 2; b = "2"'
`a == b'
     string comparison (true)

`a = 2; b = " +2"'
`a == b'
     string comparison (false)
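   The rows of this table can be re-checked mechanically.  This is a
sketch assuming an `awk' that follows the rules above (such as `gawk');
`table_demo' and the variables `r1' through `r6' are hypothetical
names, and the results are printed in table order.

```shell
# table_demo is a hypothetical helper; r1..r6 follow the table's row order.
table_demo() {
    awk 'BEGIN {
        r1 = (1.5 <= 2.0)                  # numeric comparison
        r2 = ("abc" >= "xyz")              # string comparison
        r3 = (1.5 != " +2")                # 1.5 becomes "1.5": string comparison
        r4 = ("1e2" < "3")                 # string comparison
        a = 2; b = "2";   r5 = (a == b)    # "2" == "2"
        a = 2; b = " +2"; r6 = (a == b)    # "2" == " +2"
        print r1, r2, r3, r4, r5, r6
    }'
}
table_demo
```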

   In the next example:

     $ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
     -| false

the result is `false' because both `$1' and `$2' are user input.  They
are numeric strings--therefore both have the STRNUM attribute,
dictating a numeric comparison.  The purpose of the comparison rules
and the use of numeric strings is to attempt to produce the behavior
that is "least surprising," while still "doing the right thing."

   String comparisons and regular expression comparisons are very
different.  For example:

     x == "foo"

has the value one, or is true, if the variable `x' is precisely `foo'.
By contrast:

     x ~ /foo/

has the value one if `x' contains `foo', such as `"Oh, what a fool am
I!"'.
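   The difference shows up as soon as `x' contains `foo' without being
equal to it.  This is a sketch assuming any POSIX `awk'; `match_demo'
is a hypothetical helper name.

```shell
# match_demo is a hypothetical helper contrasting == with ~.
match_demo() {
    awk 'BEGIN {
        x = "Oh, what a fool am I!"
        print "fool:", (x == "foo"), (x ~ /foo/)   # not equal, but matches
        x = "foo"
        print "foo:", (x == "foo"), (x ~ /foo/)    # equal, and matches
    }'
}
match_demo
```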

   The righthand operand of the `~' and `!~' operators may be either a
regexp constant (`/.../') or an ordinary expression. In the latter
case, the value of the expression as a string is used as a dynamic
regexp (Note: How to Use Regular Expressions; also Note: Using
Dynamic Regexps.).
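   Any string-valued expression can serve as the right-hand operand,
including a variable or a concatenation built at run time.  This is a
sketch assuming any POSIX `awk'; `dynre_demo' and the variable `re'
are hypothetical names.

```shell
# dynre_demo is a hypothetical helper using strings as dynamic regexps.
dynre_demo() {
    awk 'BEGIN {
        re = "^fo+$"                                # an ordinary string
        print "foo:", ("foo" ~ re)                  # matches: 1
        print "fox:", ("fox" ~ re)                  # no match: 0
        print "dyn:", ("fooo" ~ ("^f" "o+" "$"))    # built by concatenation: 1
    }'
}
dynre_demo
```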

   In modern implementations of `awk', a constant regular expression in
slashes by itself is also an expression.  The regexp `/REGEXP/' is an
abbreviation for the following comparison expression:

     $0 ~ /REGEXP/
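   The equivalence can be checked by evaluating a bare regexp constant
and the explicit comparison on the same records.  This is a sketch
assuming any POSIX `awk'; `bare_re_demo' and the variable `m' are
hypothetical names.  (`gawk --lint' may warn about a regexp constant on
the right of an assignment; the result is the same.)

```shell
# bare_re_demo is a hypothetical helper; m = /foo/ means m = ($0 ~ /foo/).
bare_re_demo() {
    printf 'food\nbar\n' | awk '{
        m = /foo/                     # bare regexp constant, tested against $0
        print $0, m, ($0 ~ /foo/)     # both columns always agree
    }'
}
bare_re_demo
```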

   One special place where `/foo/' is _not_ an abbreviation for `$0 ~
/foo/' is when it is the righthand operand of `~' or `!~'.  This is
discussed in more detail in Note: Using Regular Expression Constants.

   ---------- Footnotes ----------

   (1) The POSIX standard is under revision.  The revised standard's
rules for typing and comparison are the same as just described for
`gawk'.

