Copyright (C) 2000-2012
Floating-Point Number Caveats
=============================

As mentioned earlier, floating-point numbers represent what are called
"real" numbers; i.e., those that can have a fractional part.  `awk' uses
double-precision floating-point numbers to represent all numeric values.
This minor node describes some of the issues involved in using
floating-point numbers.

   There is a very nice paper on floating-point arithmetic by David
Goldberg, `What Every Computer Scientist Should Know About
Floating-point Arithmetic', `ACM Computing Surveys' *23*, 1 (1991-03),
5-48.(1)  This is worth reading if you are interested in the details,
but it does require a background in Computer Science.

   Internally, `awk' keeps both the numeric value (double-precision
floating-point) and the string value for a variable.  Separately, `awk'
keeps track of what type the variable has (see Variable Typing and
Comparison Expressions), which plays a role in how variables are used
in comparisons.

   It is important to note that the string value for a number may not
reflect the full value (all the digits) that the numeric value actually
contains.  The following program (`values.awk') illustrates this:

     {
         $1 = $2 + $3
         # see it for what it is
         printf("$1 = %.12g\n", $1)
         # use CONVFMT
         a = "<" $1 ">"
         print "a =", a
         # use OFMT
         print "$1 =", $1
     }

This program shows the full value of the sum of `$2' and `$3' using
`printf', and then prints the string values obtained from both
automatic conversion (via `CONVFMT') and from printing (via `OFMT').

   Here is what happens when the program is run:

     $ echo 2 3.654321 1.2345678 | awk -f values.awk
     -| $1 = 4.8888888
     -| a = <4.88889>
     -| $1 = 4.88889

   This makes it clear that the full numeric value is different from
what the default string representations show.

   `CONVFMT''s default value is `"%.6g"', which yields a value with at
most six significant digits.  For some applications, you might want to
change it to specify more precision.
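
   As a minimal sketch of such a change, the following shell function
runs the same sum through automatic string conversion with a `CONVFMT'
value supplied by the caller.  The function name `show_convfmt' and the
variable names `fmt', `sum', and `a' are made up for this illustration;
only `CONVFMT' and the `-v' option come from `awk' itself.

```shell
# Hypothetical helper (not part of awk): print the sum of fields 2 and 3
# after automatic number-to-string conversion with the given CONVFMT.
show_convfmt() {
    echo 2 3.654321 1.2345678 | awk -v fmt="$1" '
    {
        CONVFMT = fmt       # format used for automatic conversion
        sum = $2 + $3       # numeric (double-precision) value
        a = "<" sum ">"     # concatenation converts sum via CONVFMT
        print a
    }'
}

show_convfmt "%.6g"     # default format: prints <4.88889>
show_convfmt "%.12g"    # more precision: prints <4.8888888>
```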
On most modern machines, most of the time, 17 digits is enough to
capture a floating-point number's value exactly.(2)

   Unlike numbers in the abstract sense (such as what you studied in
high school or college math), numbers stored in computers are limited
in certain ways.  They cannot represent an infinite number of digits,
nor can they always represent things exactly.  In particular,
floating-point numbers cannot always represent decimal fractions
exactly.  Here is an example:

     $ awk '{ printf("%010d\n", $1 * 100) }'
     515.79
     -| 0000051579
     515.80
     -| 0000051579
     515.81
     -| 0000051580
     515.82
     -| 0000051582
     Ctrl-d

This shows that some values can be represented exactly, whereas others
are only approximated.  This is not a "bug" in `awk', but simply an
artifact of how computers represent numbers.

   Another peculiarity of floating-point numbers on modern systems is
that they often have more than one representation for the number zero!
In particular, it is possible to represent "minus zero" as well as
regular, or "positive" zero.

   This example shows that negative and positive zero are distinct
values when stored internally, but that they are in fact equal to each
other, as well as to "regular" zero:

     $ gawk 'BEGIN { mz = -0 ; pz = 0
     > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz
     > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0
     > }'
     -| -0 = -0, +0 = 0, (-0 == +0) -> 1
     -| mz == 0 -> 1, pz == 0 -> 1

   It helps to keep this in mind should you process numeric data that
contains negative zero values; the fact that the zero is negative is
noted and can affect comparisons.

   ---------- Footnotes ----------

   (1) `http://www.validgh.com/goldberg/paper.ps'

   (2) Pathological cases can require up to 752 digits (!), but we
doubt that you need to worry about this.