Other Multiplication
--------------------
The 3-way Toom-Cook algorithm described above (Note:Toom-Cook 3-Way
Multiplication) generalizes to split into an arbitrary number of
pieces, as per Knuth section 4.3.3 algorithm C. This is not currently
used, though it's possible a Toom-4 might fit in between Toom-3 and the
FFTs. The notes here are merely for interest.
In general a split into r+1 pieces is made, and evaluations and
pointwise multiplications done at 2*r+1 points. A 4-way split does 7
pointwise multiplies, 5-way does 9, etc. Asymptotically an (r+1)-way
algorithm is O(N^(log(2*r+1)/log(r+1))). Only the pointwise
multiplications count towards big-O complexity, but the time spent in
the evaluate and interpolate stages grows with r and has a significant
practical impact, with the asymptotic advantage of each r realized only
at bigger and bigger sizes. The overheads grow as O(N*r), whereas in
an r=2^k FFT they grow only as O(N*log(r)).
Knuth algorithm C evaluates at points 0,1,2,...,2*r, but exercise 4
uses -r,...,0,...,r and the latter saves some small multiplies in the
evaluate stage (or rather trades them for additions), and has a further
saving of nearly half the interpolate steps. The idea is to separate
odd and even final coefficients and then perform algorithm C steps C7
and C8 on them separately. The divisors at step C7 become j^2 and the
multipliers at C8 become 2*t*j-j^2.
Splitting odd and even parts through positive and negative points
can be thought of as using -1 as a square root of unity. If a 4th root
of unity was available then a further split and speedup would be
possible, but no such root exists for plain integers. Going to complex
integers with i=sqrt(-1) doesn't help, essentially because in cartesian
form it takes three real multiplies to do a complex multiply. The
existence of 2^k'th roots of unity in a suitable ring or field lets the
fast fourier transform keep splitting and get to O(N*log(r)).