Basecase Multiplication
-----------------------
Basecase NxM multiplication is a straightforward rectangular set of
cross-products, the same as long multiplication done by hand and for
that reason sometimes known as the schoolbook or grammar school method.
This is an O(N*M) algorithm. See Knuth section 4.3.1 algorithm M
(Note:References), and the `mpn/generic/mul_basecase.c' code.
Assembler implementations of `mpn_mul_basecase' are essentially the
same as the generic C code, but have all the usual assembler tricks and
obscurities introduced for speed.
A square can be done in roughly half the time of a multiply, by
using the fact that the cross products above and below the diagonal are
the same. A triangle of products below the diagonal is formed, doubled
(left shift by one bit), and then the products on the diagonal added.
This can be seen in `mpn/generic/sqr_basecase.c'. Again the assembler
implementations take essentially the same approach.
u0 u1 u2 u3 u4
+---+---+---+---+---+
u0 | d | | | | |
+---+---+---+---+---+
u1 | | d | | | |
+---+---+---+---+---+
u2 | | | d | | |
+---+---+---+---+---+
u3 | | | | d | |
+---+---+---+---+---+
u4 | | | | | d |
+---+---+---+---+---+
In practice squaring isn't a full 2x faster than multiplying, it's
usually around 1.5x. Less than 1.5x probably indicates
`mpn_sqr_basecase' wants improving on that CPU.
On some CPUs `mpn_mul_basecase' can be faster than the generic C
`mpn_sqr_basecase'. `BASECASE_SQR_THRESHOLD' is the size at which to
use `mpn_sqr_basecase', this will be zero if that routine should be
used always.