`gcc' and Pentium hacks
=======================

   The `configure' option `--enable-i386-hacks' enables specific
optimizations for the Pentium and later x86 CPUs under `gcc', which can
significantly improve the performance of double-precision transforms.
Specifically, we have tested these hacks on Linux with `gcc' 2.[789]
and versions of `egcs' since 1.0.3.  These optimizations affect only
the performance, not the correctness, of FFTW (i.e., it is always safe
to try them out).

   These hacks work around the incorrect alignment of local `double'
variables in `gcc'.  The compiler aligns these variables to multiples
of 4 bytes, but execution is much faster (on Pentium and PentiumPro)
if `double's are aligned to a multiple of 8 bytes.  By carefully
counting the number of variables allocated by the compiler in
performance-critical regions of the code, we have been able to
introduce dummy allocations (using `alloca') that align the stack
properly.  The hack depends crucially on the compiler flags that are
used; for example, it does not work without `-fomit-frame-pointer'.

   In principle, these hacks are no longer required under `gcc'
versions 2.95 and later, which automatically align the stack correctly
(see `-mpreferred-stack-boundary' in the `gcc' manual).  However, we
have encountered a bug
(http://egcs.cygnus.com/ml/gcc-bugs/1999-11/msg00259.html) in the
stack alignment of versions 2.95.[012] that causes FFTW's stack to be
misaligned under some circumstances.  The `configure' script
automatically detects this bug and disables `gcc''s stack alignment in
favor of our own hacks when `--enable-i386-hacks' is used.

   The `fftw_test' program outputs speed measurements that you can use
to see if these hacks are beneficial.

   The `configure' option `--enable-pentium-timer' enables the use of
the Pentium and PentiumPro cycle counter for timing purposes.  To get
correct results, you must define `FFTW_CYCLES_PER_SEC' in
`fftw/config.h' to be the clock speed of your processor (in cycles per
second); the resulting FFTW library will be nonportable.  The use of
this option is deprecated.  On serious operating systems (such as
Linux), FFTW uses `gettimeofday()', which has enough resolution and is
portable.  (Note that Win32 has its own high-resolution timing routines
as well.  FFTW contains unsupported code to use these routines.)

