GNU Info

Info Node: (libc.info)Floating Point Concepts


Floating Point Representation Concepts
......................................

   This section introduces the terminology for describing floating point
representations.

   You are probably already familiar with most of these concepts in
terms of scientific or exponential notation for floating point numbers.
For example, the number `123456.0' could be expressed in exponential
notation as `1.23456e+05', a shorthand notation indicating that the
mantissa `1.23456' is multiplied by the base `10' raised to power `5'.

   More formally, the internal representation of a floating point number
can be characterized in terms of the following parameters:

   * The "sign" is either `-1' or `1'.

   * The "base" or "radix" for exponentiation, an integer greater than
     `1'.  This is a constant for a particular representation.

   * The "exponent" to which the base is raised.  The upper and lower
     bounds of the exponent value are constants for a particular
     representation.

     Sometimes, in the actual bits representing the floating point
     number, the exponent is "biased" by adding a constant to it, so
     that it is always represented as an unsigned quantity.  This
     matters only if you need to pick apart the bit fields of a
     floating point number by hand, which the GNU library does not
     support, so the bias is ignored in the discussion that follows.

   * The "mantissa" or "significand" is an unsigned integer which is a
     part of each floating point number.

   * The "precision" of the mantissa.  If the base of the representation
     is B, then the precision is the number of base-B digits in the
     mantissa.  This is a constant for a particular representation.

     Many floating point representations have an implicit "hidden bit"
     in the mantissa.  This bit is part of the mantissa's value but is
     not stored in memory, because in a normalized number it is always
     1.  The precision figure (see above) includes any hidden bits.

     Again, the GNU library provides no facilities for dealing with such
     low-level aspects of the representation.

   The mantissa of a floating point number represents an implicit
fraction whose denominator is the base raised to the power of the
precision.  Since the largest representable mantissa is one less than
this denominator, the value of the fraction is always strictly less
than `1'.  The mathematical value of a floating point number is then
the product of this fraction, the sign, and the base raised to the
exponent.
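   Collecting the parameters into one formula, with s the sign, B the
base, p the precision, M the mantissa and e the exponent (letters
chosen here for convenience), gives the following.  The worked example
assumes an IEEE 754 `double', where B is 2 and p is 53.

```latex
x = s \cdot \frac{M}{B^{p}} \cdot B^{e},
\qquad s \in \{-1, 1\}, \quad 0 \le M \le B^{p} - 1,
\quad\text{so}\quad 0 \le \frac{M}{B^{p}} < 1.

% Worked example: B = 2, p = 53, x = 6
6 = 1 \cdot \frac{3 \cdot 2^{51}}{2^{53}} \cdot 2^{3}
  = \frac{3}{4} \cdot 8
```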

   We say that the floating point number is "normalized" if the
fraction is at least `1/B', where B is the base.  In other words, the
mantissa would be too large to fit if it were multiplied by the base.
Non-normalized numbers are sometimes called "denormal"; they contain
less precision than the representation can normally hold.

   If the number is not normalized, then you can subtract `1' from the
exponent while multiplying the mantissa by the base, and get another
floating point number with the same value.  "Normalization" consists of
doing this repeatedly until the number is normalized.  Two distinct
normalized floating point numbers cannot be equal in value.

   (There is an exception to this rule: if the mantissa is zero, it is
considered normalized.  Another exception happens on certain machines
where the exponent is as small as the representation can hold.  Then it
is impossible to subtract `1' from the exponent, so a number may be
normalized even if its fraction is less than `1/B'.)

