Cache Handling
--------------

   GMP aims to perform well both on operands that fit entirely in L1
cache and those that don't.  In the assembler subroutines this means
prefetching, either always or when large enough operands are presented.

   Prefetching sources combines well with loop unrolling, since a
prefetch can be initiated once per pass through the unrolled loop (or
more than once if a pass processes more than one cache line).
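
   As a rough illustration in C rather than assembler, the sketch below
issues one prefetch per source operand on each pass of a loop that
handles one cache line of limbs.  It assumes GCC or Clang (for
`__builtin_prefetch'), 64-byte cache lines and 8-byte limbs.  The names
`limb_t', `add_n_prefetch' and `prefetch_ahead', and the distance of 4
lines, are hypothetical choices for this example and are not taken from
GMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only. */
typedef uint64_t limb_t;
#define LIMBS_PER_LINE 8                 /* assumes 64-byte lines, 8-byte limbs */
#define PREFETCH_BYTES (4 * 64)          /* assumed distance: 4 lines ahead */

/* Hint that the line PREFETCH_BYTES beyond p will be read soon.
   The address is formed with integer arithmetic, so no out-of-range
   pointer is computed; the prefetch itself does not fault. */
static inline void prefetch_ahead(const limb_t *p)
{
    __builtin_prefetch((const void *)((uintptr_t)p + PREFETCH_BYTES), 0, 3);
}

/* dst[i] = a[i] + b[i] with carry propagation; returns the carry out. */
limb_t add_n_prefetch(limb_t *dst, const limb_t *a, const limb_t *b, size_t n)
{
    limb_t carry = 0;
    size_t i = 0;

    /* Main loop: one cache line of each source per pass, so one
       prefetch per source per pass. */
    for (; i + LIMBS_PER_LINE <= n; i += LIMBS_PER_LINE) {
        prefetch_ahead(a + i);
        prefetch_ahead(b + i);
        for (size_t j = 0; j < LIMBS_PER_LINE; j++) {
            limb_t s = a[i + j] + carry;
            limb_t c1 = s < carry;        /* overflow from adding the carry */
            limb_t r = s + b[i + j];
            carry = c1 | (r < s);         /* overflow from adding b */
            dst[i + j] = r;
        }
    }

    /* Remaining limbs, fewer than one cache line. */
    for (; i < n; i++) {
        limb_t s = a[i] + carry;
        limb_t c1 = s < carry;
        limb_t r = s + b[i];
        carry = c1 | (r < s);
        dst[i] = r;
    }
    return carry;
}
```

   A compiler is free to unroll the inner loop; in hand-written
assembler the eight limb operations would simply be written out after
the two prefetches.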

   Prefetching destinations won't be necessary if the CPU has a big
enough store queue.  Older processors without a write-allocate L1,
however, will want destination prefetching to avoid repeated
write-throughs, unless they can keep up with the rate at which
destination limbs are produced.
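
   A destination prefetch can be sketched in C in much the same way as
a source prefetch, by asking for the line in preparation for a write.
The example below assumes GCC or Clang, 64-byte lines and 8-byte limbs.
The function name `com_n_prefetch_dst' and the 4-line distance are
hypothetical, and whether such a hint helps at all depends on the CPU,
as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only. */
typedef uint64_t limb_t;
#define LIMBS_PER_LINE 8                 /* assumes 64-byte lines, 8-byte limbs */
#define AHEAD_BYTES (4 * 64)             /* assumed distance: 4 lines ahead */

/* dst[i] = ~src[i] for n limbs, prefetching the destination for writing. */
void com_n_prefetch_dst(limb_t *dst, const limb_t *src, size_t n)
{
    size_t i = 0;

    for (; i + LIMBS_PER_LINE <= n; i += LIMBS_PER_LINE) {
        /* Second argument 1 means "prepare for a write". */
        __builtin_prefetch(
            (const void *)((uintptr_t)(dst + i) + AHEAD_BYTES), 1, 3);
        for (size_t j = 0; j < LIMBS_PER_LINE; j++)
            dst[i + j] = ~src[i + j];
    }

    /* Remaining limbs, fewer than one cache line. */
    for (; i < n; i++)
        dst[i] = ~src[i];
}
```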

   The distance ahead to prefetch is determined by the rate at which
data is processed versus the time it takes to bring a line up to L1.
Naturally
the net data rate from L2 or RAM will always limit the rate of data
processing.  Prefetch distance may also be limited by the number of
prefetches the processor can have in progress at any one time.
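
   The trade-off can be put as simple arithmetic: prefetch far enough
ahead that a line arrives by the time the loop reaches it, but no
further than the processor can track.  The helper below is a minimal
sketch of that calculation.  Its name, its parameters and the figures
in the usage note are assumptions made for this example, not
measurements of any real CPU.

```c
/* Hypothetical helper, for illustration only.
   latency_cycles:  time to bring one line up to L1
   cycles_per_line: time the loop takes to process one cache line
   max_outstanding: prefetches the CPU can have in progress at once
   Returns the number of cache lines ahead to prefetch. */
unsigned prefetch_lines_ahead(unsigned latency_cycles,
                              unsigned cycles_per_line,
                              unsigned max_outstanding)
{
    unsigned lines;

    if (cycles_per_line == 0)
        cycles_per_line = 1;             /* avoid dividing by zero */

    /* Round up: the line must arrive before the loop gets to it. */
    lines = (latency_cycles + cycles_per_line - 1) / cycles_per_line;

    if (lines < 1)
        lines = 1;
    if (lines > max_outstanding)
        lines = max_outstanding;         /* limited by the hardware */
    return lines;
}
```

   For instance, with an assumed 200 cycle latency and a loop taking 16
cycles per line, the helper gives 13 lines ahead, or 8 if only 8
prefetches can be in progress at once.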

   If a special prefetch instruction doesn't exist then a plain load
can be used, so long as the CPU supports out-of-order loads.  But this
may mean having a second copy of a loop so that the last few limbs can
be processed without prefetching, since reading past the end of an
operand must be avoided.
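
   The two-loop arrangement described above might look like the
following in C.  The first loop touches a limb a fixed distance ahead
with an ordinary load and runs only while that limb lies inside the
operand; the second loop finishes the remaining limbs without touching
ahead.  The names `limb_t' and `copy_touch_ahead' and the 4-line
distance are hypothetical, and the `volatile' access is only there so
that a C compiler keeps the otherwise unused load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and values, for illustration only. */
typedef uint64_t limb_t;
#define LIMBS_PER_LINE 8                 /* assumes 64-byte lines, 8-byte limbs */
#define AHEAD (4 * LIMBS_PER_LINE)       /* assumed distance: 4 lines ahead */

/* Copy n limbs from src to dst, using a plain load as a prefetch. */
void copy_touch_ahead(limb_t *dst, const limb_t *src, size_t n)
{
    size_t i = 0;

    /* First loop: src[i + AHEAD] is inside the operand, so the load is
       safe.  The loaded value is discarded. */
    for (; i + AHEAD < n; i += LIMBS_PER_LINE) {
        (void)*(const volatile limb_t *)(src + i + AHEAD);
        for (size_t j = 0; j < LIMBS_PER_LINE; j++)
            dst[i + j] = src[i + j];
    }

    /* Second loop: the last limbs, with no load ahead, so nothing past
       the end of the operand is ever read. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```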
