GNU Info

Info Node: (gmp.info)Karatsuba Multiplication

(gmp.info)Karatsuba Multiplication


Next: Toom-Cook 3-Way Multiplication Prev: Basecase Multiplication Up: Multiplication Algorithms
Enter node , (file) or (file)node

Karatsuba Multiplication
------------------------

   The Karatsuba multiplication algorithm is described in Knuth section
4.3.3 part A, and various other textbooks.  A brief description is
given here.

   The inputs x and y are treated as each split into two parts of equal
length (or the most significant part one limb shorter if N is odd).

      high              low
     +----------+----------+
     |    x1    |    x0    |
     +----------+----------+
     
     +----------+----------+
     |    y1    |    y0    |
     +----------+----------+

   Let b be the power of 2 where the split occurs, ie. if x0 is k limbs
(y0 the same) then b=2^(k*mp_bits_per_limb).  With that x=x1*b+x0 and
y=y1*b+y0, and the following holds,

     x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0

   This formula means doing only three multiplies of (N/2)x(N/2) limbs,
whereas a basecase multiply of NxN limbs is equivalent to four
multiplies of (N/2)x(N/2).  The factors (b^2+b) etc represent the
positions where the three products must be added.

      high                              low
     +--------+--------+ +--------+--------+
     |      x1*y1      | |      x0*y0      |
     +--------+--------+ +--------+--------+
               +--------+--------+
           add |      x1*y1      |
               +--------+--------+
               +--------+--------+
           add |      x0*y0      |
               +--------+--------+
               +--------+--------+
           sub | (x1-x0)*(y1-y0) |
               +--------+--------+

   The term (x1-x0)*(y1-y0) is best calculated as an absolute value,
and the sign used to choose to add or subtract.  Notice the sum
high(x0*y0)+low(x1*y1) occurs twice, so it's possible to do 5*k limb
additions, rather than 6*k, but in GMP extra function call overheads
outweigh the saving.

   Squaring is similar to multiplying, but with x=y the formula reduces
to an equivalent with three squares,

     x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2

   The final result is accumulated from those three squares the same
way as for the three multiplies above.  The middle term (x1-x0)^2 is now
always positive.

   A similar formula for both multiplying and squaring can be
constructed with a middle term (x1+x0)*(y1+y0).  But those sums can
exceed k limbs, leading to more carry handling and additions than the
form above.

   Karatsuba multiplication is asymptotically an O(N^1.585) algorithm,
the exponent being log(3)/log(2), representing 3 multiplies each 1/2
the size of the inputs.  This is a big improvement over the basecase
multiply at O(N^2) and the advantage soon overcomes the extra additions
Karatsuba performs.

   `KARATSUBA_MUL_THRESHOLD' can be as little as 10 limbs.  The `SQR'
threshold is usually about twice the `MUL'.  The basecase algorithm will
take a time of the form M(N) = a*N^2 + b*N + c and the Karatsuba
algorithm K(N) = 3*M(N/2) + d*N + e.  Clearly per-crossproduct speedups
in the basecase code reduce a and decrease the threshold, but linear
style speedups reducing b will actually increase the threshold.  The
latter can be seen for instance when adding an optimized
`mpn_sqr_diagonal' to `mpn_sqr_basecase'.  Of course all speedups
reduce total time, and in that sense the algorithm thresholds are
merely of academic interest.


automatically generated by info2www version 1.2.2.9