Copyright (C) 2000-2012
Cache Handling
--------------

GMP aims to perform well both on operands that fit entirely in L1 cache and those that don't. In the assembler subroutines this means prefetching, either always or when large enough operands are presented.

Prefetching sources combines well with loop unrolling, since a prefetch can be initiated once per unrolled loop (or more than once if the loop processes more than one cache line).

Prefetching destinations won't be necessary if the CPU has a big enough store queue. Older processors without a write-allocate L1 however will want destination prefetching, to avoid repeated write-throughs, unless they can keep up with the rate at which destination limbs are produced.

The distance ahead to prefetch will be determined by the rate at which data is processed versus the time it takes to bring a line up to L1. Naturally the net data rate from L2 or RAM will always limit the rate of data processing. Prefetch distance may also be limited by the number of prefetches the processor can have in progress at any one time.

If a special prefetch instruction doesn't exist then a plain load can be used, so long as the CPU supports out-of-order loads. But this may mean having a second copy of a loop so that the last few limbs can be processed without prefetching, since reading past the end of an operand must be avoided.
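
The following C sketch illustrates source prefetching combined with loop unrolling, plus a second loop for the final limbs that issues no prefetches. It is not GMP's code, which is written in assembler. The names `limb_t`, `add_n_prefetch`, `LIMBS_PER_LINE` and `PREFETCH_LINES` are hypothetical, the 64-byte cache line and the prefetch distance of four lines are assumptions to be tuned per CPU, and `__builtin_prefetch` is the GCC/Clang builtin rather than anything GMP provides.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;      /* hypothetical stand-in for a GMP limb */

#define LIMBS_PER_LINE 8      /* assumption: 64-byte line, 8-byte limbs */
#define PREFETCH_LINES 4      /* assumption: prefetch distance in lines */

/* Add one limb pair plus an incoming carry; returns the outgoing carry. */
static limb_t add_limb(limb_t *dst, limb_t x, limb_t y, limb_t carry)
{
    limb_t s = x + y;
    limb_t c1 = s < x;        /* carry out of x + y */
    limb_t r = s + carry;
    limb_t c2 = r < s;        /* carry out of adding the incoming carry */
    *dst = r;
    return c1 | c2;
}

/* dst[0..n-1] = a[0..n-1] + b[0..n-1]; returns the final carry (0 or 1). */
limb_t add_n_prefetch(limb_t *dst, const limb_t *a, const limb_t *b, size_t n)
{
    const size_t ahead = (size_t)PREFETCH_LINES * LIMBS_PER_LINE;
    limb_t carry = 0;
    size_t i = 0;

    /* Main loop: each pass handles one cache line of each source and
       issues one prefetch per source, `ahead` limbs in front.  The bound
       keeps every prefetched address inside the operands. */
    if (n >= ahead + LIMBS_PER_LINE) {
        for (; i <= n - ahead - LIMBS_PER_LINE; i += LIMBS_PER_LINE) {
            __builtin_prefetch(&a[i + ahead], 0, 3);
            __builtin_prefetch(&b[i + ahead], 0, 3);
            for (size_t k = 0; k < LIMBS_PER_LINE; k++)
                carry = add_limb(&dst[i + k], a[i + k], b[i + k], carry);
        }
    }

    /* Second copy of the loop: remaining limbs, no prefetching, so
       nothing beyond the end of an operand is touched. */
    for (; i < n; i++)
        carry = add_limb(&dst[i], a[i], b[i], carry);

    return carry;
}
```

The inner fixed-count loop is left for the compiler to unroll. A hand-written assembler routine would unroll it explicitly and would probably process the tail in unrolled form as well, rather than one limb at a time.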