(gmp.info)Assembler Carry Propagation


Carry Propagation
-----------------

   The problem that presents most challenges in GMP is propagating
carries from one limb to the next.  In functions like `mpn_addmul_1' and
`mpn_add_n', carries are the only dependencies between limb operations.

   On processors with carry flags, a straightforward CISC style `adc' is
generally best.  AMD K6 `mpn_addmul_1' however is an example of an
unusual set of circumstances where a branch works out better.

   On RISC processors generally an add and compare for overflow is
used.  This sort of thing can be seen in `mpn/generic/aors_n.c'.  Some
carry propagation schemes require 4 instructions, meaning at least 4
cycles per limb, but other schemes may use just 1 or 2.  On wide
superscalar processors performance may be completely determined by the
number of dependent instructions between carry-in and carry-out for
each limb.

   On vector processors good use can be made of the fact that a carry
bit only very rarely propagates more than one limb.  When adding a
single bit to a limb, there's only a carry out if that limb was
`0xFF...FF' which on random data will be only 1 in 2^mp_bits_per_limb.
`mpn/cray/add_n.c' is an example of this, it adds all limbs in
parallel, adds one set of carry bits in parallel and then only rarely
needs to fall through to a loop propagating further carries.

   On the x86s, GCC (as of version 2.95.2) doesn't generate
particularly good code for the RISC style idioms that are necessary to
handle carry bits in C.  Often conditional jumps are generated where
`adc' or `sbb' forms would be better.  And so unfortunately almost any
loop involving carry bits needs to be coded in assembler for best
results.

automatically generated by info2www version 1.2.2.9