
Floating-point Errors
---------------------

   Some programs appear to produce different floating-point results
when compiled by `g77' than when compiled by other compilers.

   Often the reason for this behavior is the fact that floating-point
values are represented on almost all Fortran systems by
*approximations*, and these approximations are inexact even for
apparently simple values like 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9,
1.1, and so on.  Most Fortran systems, including all current ports of
`g77', use binary arithmetic to represent these approximations.

   Therefore, the exact value of any floating-point approximation as
manipulated by `g77'-compiled code is representable by adding some
combination of the values 1.0, 0.5, 0.25, 0.125, and so on (just keep
dividing by two) through the precision of the fraction (typically
around 23 bits for `REAL(KIND=1)', 52 for `REAL(KIND=2)'), then
multiplying the sum by an integral power of two (in Fortran, by `2**N')
that typically is between -127 and +128 for `REAL(KIND=1)' and -1023
and +1024 for `REAL(KIND=2)', then multiplying by -1 if the number is
negative.
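
   For example, the value 0.625 is exactly representable under this
scheme, because it is a finite sum of such terms:

     0.625 = 0.5 + 0.125 = 5 * 2**(-3)

Any value that can be written this way is stored exactly; values that
cannot be are only approximated.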

   So, a value like 0.2 is exactly represented in decimal--since it is
a fraction, `2/10', with a denominator that is compatible with the base
of the number system (base 10).  However, `2/10' cannot be represented
as any finite sum of the values 1.0, 0.5, 0.25, and so on, so
0.2 cannot be exactly represented in binary notation.
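
   Written out in binary, `2/10' becomes a repeating fraction that
never terminates:

     0.2 = 0.001100110011001100... (binary)
         = 1/8 + 1/16 + 1/128 + 1/256 + 1/2048 + 1/4096 + ...

A `REAL(KIND=1)' value keeps only the first 24 or so significant bits
of this expansion, so (assuming IEEE 754 single precision) the value
actually stored for 0.2 is the nearby fraction `13421773/2**26', which
is exactly 0.20000000298023223876953125.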

   (On the other hand, decimal notation can represent any binary number
in a finite number of digits.  Decimal notation cannot do so with
ternary, or base-3, notation, which would represent floating-point
numbers as sums of any of `1/1', `1/3', `1/9', and so on.  After all,
no finite number of decimal digits can exactly represent `1/3'.
Fortunately, few systems use ternary notation.)

   Moreover, differences in the way run-time I/O libraries convert
between these approximations and the decimal representation often used
by programmers and the programs they write can result in apparent
differences between results that do not actually exist, or exist to
such a small degree that they usually are not worth worrying about.

   For example, consider the following program:

     PRINT *, 0.2
     END

   When compiled by `g77', the above program might output `0.20000003',
while another compiler might produce an executable that outputs `0.2'.

   This particular difference is due to the fact that, currently, the
floating-point conversion routines in the `libg2c' library, used by
`g77', handle only double-precision values.

   Since `0.2' in the program is a single-precision value, it is
converted to double precision (still in binary notation) before being
converted back to decimal.  This widening conversion appends *binary*
zero digits to the original value--which, again, is an inexact
approximation of 0.2--resulting in an approximation that is much less
exact than is connoted by the use of double precision.

   (The appending of binary zero digits has essentially the same effect
as taking a particular decimal approximation of `1/3', such as
`0.3333333', and appending decimal zeros to it, producing
`0.33333330000000000'.  Treating the resulting decimal approximation as
if it really had 18 or so digits of valid precision would make it seem
a very poor approximation of `1/3'.)

   As a result of converting the single-precision approximation to
double precision by appending binary zeros, the conversion of the
resulting double-precision value to decimal produces what looks like an
incorrect result, when in fact the result is *inexact*, and is probably
no less inaccurate or imprecise an approximation of 0.2 than is
produced by other compilers that happen to output the converted value
as "exactly" `0.2'.  (Some compilers behave in a way that can make them
appear to retain more accuracy across a conversion of a single-precision
constant to double precision.  See `Context-Sensitive Constants' for
why this practice is illusory and even dangerous.)
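
   The widening conversion can be imitated explicitly in source code
with the `DBLE' intrinsic.  The following sketch assumes IEEE 754
arithmetic and that the constant is treated as a single-precision
value, as described above; the exact digits printed depend on the
system:

     C  Print the single-precision approximation of 0.2, widened
     C  to double precision so that all of its digits are converted.
     PRINT *, DBLE(0.2)
     END

Typically the output is close to `0.2000000029802322', the
double-precision rendering of the single-precision approximation of
0.2.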

   Note that a more exact approximation of the constant is computed
when the program is changed to specify a double-precision constant:

     PRINT *, 0.2D0
     END
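
   The gap between the two approximations can itself be displayed.
Another sketch, again assuming IEEE 754 binary arithmetic (the digits
printed will vary from system to system):

     C  The single-precision 0.2 is widened to double precision
     C  before the subtraction, so the result is the exact gap
     C  between the two approximations of 0.2.
     PRINT *, 0.2 - 0.2D0
     END

The value printed (on the order of `3E-9') is the difference between
the two approximations, not a rounding error introduced by the
subtraction itself.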

   Future versions of `g77' and/or `libg2c' might convert
single-precision values directly to decimal, instead of converting them
to double precision first.  This would tend to result in output that is
more consistent with that produced by some other Fortran
implementations.

   A useful source of information on floating-point computation is David
Goldberg, `What Every Computer Scientist Should Know About
Floating-Point Arithmetic', Computing Surveys, 23, March 1991, pp.
5-48.  An online version is available at `http://docs.sun.com/', and
there is a supplemented version, in PostScript form, at
`http://www.validgh.com/goldberg/paper.ps'.

   Information related to the IEEE 754 floating-point standard, from a
leading light of that standard, can be found at
`http://http.cs.berkeley.edu/%7Ewkahan/ieee754status/'; see also slides
from the short course referenced from
`http://http.cs.berkeley.edu/%7Efateman/'.
`http://www.linuxsupportline.com/%7Ebillm/' has a brief guide to IEEE
754, a somewhat x86-GNU/Linux-specific FAQ, and library code for
GNU/Linux x86 systems.

   The supplement to the PostScript-formatted Goldberg document,
referenced above, is available in HTML format.  See `Differences Among
IEEE 754 Implementations' by Doug Priest, available online at
`http://www.validgh.com/goldberg/addendum.html'.  This document
explores some of the issues surrounding computation of extended (80-bit)
results on processors such as the x86, especially when those results
are arbitrarily truncated to 32-bit or 64-bit values by the compiler as
"spills".

   (*Note:* `g77' specifically, and `gcc' generally, do arbitrarily
truncate 80-bit results during spills as of this writing.  It is not
yet clear whether a future version of the GNU compiler suite will offer
80-bit spills as an option, or perhaps even as the default behavior.)

   The GNU C library provides routines for controlling the FPU, along
with documentation about them.

   See `Floating-point precision', regarding IEEE 754 conformance.

