Assembler Coding
================

The assembler subroutines in GMP are the most significant source of
speed at small to moderate operand sizes. At larger sizes algorithm
selection becomes more important, but speedups in the low level routines
still speed up everything built on them proportionally.

Carry handling and widening multiplies, both of which are important for
GMP, can't be easily expressed in C. GCC `asm' blocks help a lot and are
provided in `longlong.h', but hand coding low level routines invariably
offers a speedup over generic C by a factor of anything from 2 to 10.
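
To illustrate why these operations are awkward in portable C, the
following is a minimal sketch, not GMP's actual code. The names
`limb_t', `add_n_generic' and `umul_generic' are hypothetical and a
64-bit limb is assumed. The carry has to be recovered with comparisons,
and the double-width product has to be assembled from four half-width
products, where many CPUs provide an add-with-carry instruction and a
widening multiply instruction for the same work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a limb type; a 64-bit limb is assumed. */
typedef uint64_t limb_t;

/* Add two n-limb numbers {ap,n} and {bp,n}, least significant limb
   first, store the sum at rp and return the carry out (0 or 1).
   C has no carry flag, so each carry is detected by comparison. */
static limb_t
add_n_generic (limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
  limb_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      limb_t a = ap[i];
      limb_t s = a + bp[i];     /* may wrap around */
      limb_t c1 = s < a;        /* 1 if a + b wrapped */
      limb_t r = s + carry;     /* may wrap around */
      limb_t c2 = r < s;        /* 1 if adding the carry wrapped */
      rp[i] = r;
      carry = c1 | c2;          /* at most one of the two can be set */
    }
  return carry;
}

/* Widening multiply: (*hi,*lo) = u * v, built from four 32x32->64
   partial products because standard C has no 64x64->128 multiply. */
static void
umul_generic (limb_t *hi, limb_t *lo, limb_t u, limb_t v)
{
  limb_t ul = u & 0xffffffff, uh = u >> 32;
  limb_t vl = v & 0xffffffff, vh = v >> 32;

  limb_t x0 = ul * vl;
  limb_t x1 = ul * vh;
  limb_t x2 = uh * vl;
  limb_t x3 = uh * vh;

  x1 += x0 >> 32;               /* cannot overflow */
  x1 += x2;                     /* may overflow */
  if (x1 < x2)
    x3 += (limb_t) 1 << 32;     /* propagate that overflow into the high part */

  *hi = x3 + (x1 >> 32);
  *lo = (x1 << 32) | (x0 & 0xffffffff);
}
```

Each limb of the addition costs two additions and two comparisons here,
and each product costs four multiplies plus shifts and fix-ups, which
gives some idea of where hand written assembler can gain.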