GNU Info

Info Node: (libc.info)Floating Point Numbers


Floating Point Numbers
======================

   Most computer hardware has support for two different kinds of
numbers: integers (...-3, -2, -1, 0, 1, 2, 3...) and floating-point
numbers.  Floating-point numbers have three parts: the "mantissa", the
"exponent", and the "sign bit".  The real number represented by a
floating-point value is given by (s ? -1 : 1) * 2^e * M where s is the
sign bit, e the exponent, and M the mantissa.  See Floating Point
Concepts for details.  (It is possible to have a different "base"
for the exponent, but nearly all modern hardware uses 2.)
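
   As an illustration of the formula above, the following C sketch
splits a finite double into the three parts and puts them back
together.  It relies on the standard frexp, ldexp, fabs and signbit
facilities from math.h.  The struct fp_parts type and the fp_split and
fp_join names are hypothetical helpers invented for this example; they
are not part of the library.  Note that frexp normalizes the mantissa
to the range [0.5, 1), which may differ from the convention a given
hardware format uses internally.

```c
#include <math.h>

/* Hypothetical helper type for this example only (not part of libc). */
struct fp_parts
{
  int sign;        /* 1 if the sign bit is set, 0 otherwise */
  double mantissa; /* magnitude in [0.5, 1), or 0 when x is zero */
  int exponent;    /* power of two applied to the mantissa */
};

/* Split a finite value x so that
   x == (sign ? -1 : 1) * 2^exponent * mantissa.
   Infinities and NaN are not handled here. */
struct fp_parts
fp_split (double x)
{
  struct fp_parts p;
  p.sign = signbit (x) ? 1 : 0;
  p.mantissa = frexp (fabs (x), &p.exponent);
  return p;
}

/* Rebuild the value from its three parts. */
double
fp_join (struct fp_parts p)
{
  double magnitude = ldexp (p.mantissa, p.exponent);
  return p.sign ? -magnitude : magnitude;
}
```

   For example, fp_split (-6.0) yields sign 1, mantissa 0.75 and
exponent 3, since 6 = 0.75 * 2^3, and fp_join returns the original
value unchanged.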

   Floating-point numbers can represent only a finite subset of the real
numbers.  While this subset is large enough for most purposes, it is
important to remember that the only reals that can be represented
exactly are rational numbers whose binary expansion terminates and fits
within the width of the mantissa.  Even simple fractions such as 1/5
have no terminating binary expansion and can only be approximated.
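
   The effect of this approximation can be observed directly.  The
sketch below assumes the usual IEEE 754 double format; the function
names sums_exactly and print_stored are hypothetical and exist only for
this example.  The first compares a floating-point sum against the
value one would expect from exact arithmetic, and the second prints
more digits than the default so that the stored approximation becomes
visible.

```c
#include <stdio.h>

/* Return 1 if a + b, computed in double arithmetic, compares equal
   to expected, and 0 otherwise. */
int
sums_exactly (double a, double b, double expected)
{
  double sum = a + b;
  return sum == expected;
}

/* Print x with 20 digits after the decimal point, enough to show
   the difference between a decimal literal and the value stored. */
void
print_stored (double x)
{
  printf ("%.20f\n", x);
}
```

   On such a system, sums_exactly (0.25, 0.5, 0.75) returns 1 because
all three values have short binary expansions, whereas
sums_exactly (0.2, 0.4, 0.6) returns 0 because none of 0.2, 0.4 and 0.6
is stored exactly.  For this reason, comparing computed results for
exact equality is rarely what a program wants.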

   Mathematical operations and functions frequently need to produce
values that are not representable.  Often these values can be
approximated closely enough for practical purposes, but sometimes they
can't.  Historically there was no way to tell when the results of a
calculation were inaccurate.  Modern computers implement the IEEE 754
standard for numerical computations, which defines a framework for
indicating to the program when the results of a calculation are not
trustworthy.  This framework consists of a set of "exceptions" that
indicate why a result could not be represented, and the special values
"infinity" and "not a number" (NaN).

