Assembler Basics
----------------
`mpn_addmul_1' and `mpn_submul_1' are the most important routines
for overall GMP performance. All multiplications and divisions come
down to repeated calls to these. `mpn_add_n', `mpn_sub_n',
`mpn_lshift' and `mpn_rshift' are the next most important.
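
As a rough guide to what the two key routines compute, here is a
portable C sketch of the `mpn_addmul_1' and `mpn_submul_1' semantics.
It is not GMP's code. The names `ref_addmul_1' and `ref_submul_1' and
the 32-bit `limb_t' are hypothetical, chosen so that a plain 64-bit
product holds a full limb-by-limb result without compiler extensions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb for this sketch */

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this cannot overflow. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* {rp,n} -= {up,n} * v; returns the borrow-out limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* Product plus incoming borrow; high half becomes the next borrow. */
        uint64_t p = (uint64_t)up[i] * v + borrow;
        limb_t lo = (limb_t)p;
        limb_t r = rp[i];
        borrow = (limb_t)(p >> 32);
        rp[i] = r - lo;
        borrow += (r < lo);  /* one more borrow if the subtraction wrapped */
    }
    return borrow;
}
```

Each loop does one multiply, one add or subtract, and a carry
propagation per limb; scheduling that loop tightly is the job of an
assembler version.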

On some CPUs assembler versions of the internal functions
`mpn_mul_basecase' and `mpn_sqr_basecase' give significant speedups,
mainly through avoiding function call overheads. They can also
potentially make better use of a wide superscalar processor.
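
To illustrate how a multiplication comes down to repeated `addmul_1'
calls, the sketch below forms a schoolbook product with one row per
limb of the multiplier. The names `ref_addmul_1' and
`ref_mul_basecase' and the 32-bit `limb_t' are hypothetical, and
`ref_addmul_1' is repeated so the block compiles on its own. An
assembler `mpn_mul_basecase' can fuse these rows into one loop nest,
which is where avoiding the per-row call overhead would pay off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb for this sketch */

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* {rp,un+vn} = {up,un} * {vp,vn}; rp must not overlap up or vp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    /* One addmul_1 row per multiplier limb.  Row i adds into
       rp[i..i+un) and its carry-out goes to rp[un+i], which no
       earlier row has written. */
    for (size_t i = 0; i < vn; i++)
        rp[un + i] = ref_addmul_1(rp + i, up, un, vp[i]);
}
```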

The restrictions on overlaps between sources and destinations (see
Low-level Functions) are designed to facilitate a variety of
implementations. For example, knowing that `mpn_add_n' won't have
partly overlapping sources and destination means that reads can run far
ahead of writes on superscalar processors, and that loops can be
vectorized on a vector processor, depending on the carry handling.
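
The following sketch shows the kind of reordering the overlap rules
permit: an `add_n'-style loop, unrolled by two, that loads both pairs
of source limbs before storing either result limb. The names
`ref_add_n' and `ref_same_or_separate' and the 32-bit `limb_t' are
hypothetical. The overlap check compares addresses as `uintptr_t'
values, since relational comparison of pointers into unrelated objects
is undefined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb for this sketch */

/* True if {a,n} and {b,n} are either identical or fully disjoint. */
static int ref_same_or_separate(const limb_t *a, const limb_t *b, size_t n)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    size_t bytes = n * sizeof(limb_t);
    return x == y || x + bytes <= y || y + bytes <= x;
}

/* {rp,n} = {up,n} + {vp,n}; returns the carry-out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(ref_same_or_separate(rp, up, n) && ref_same_or_separate(rp, vp, n));
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* All loads for this pair happen before any store. */
        limb_t u0 = up[i], u1 = up[i + 1];
        limb_t v0 = vp[i], v1 = vp[i + 1];
        uint64_t s0 = (uint64_t)u0 + v0 + cy;
        uint64_t s1 = (uint64_t)u1 + v1 + (limb_t)(s0 >> 32);
        rp[i] = (limb_t)s0;
        rp[i + 1] = (limb_t)s1;
        cy = (limb_t)(s1 >> 32);
    }
    if (i < n) {  /* odd leftover limb */
        uint64_t s = (uint64_t)up[i] + vp[i] + cy;
        rp[i] = (limb_t)s;
        cy = (limb_t)(s >> 32);
    }
    return cy;
}
```

With exact overlap (`rp == up') each store only replaces a limb that
has already been loaded, so the result is unchanged. With a partial
overlap such as `rp == up + 1', the early loads would see values that a
strictly sequential loop would already have overwritten, which is why
such calls are excluded.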