Multi-threaded FFTW
===================
In this section we document the parallel FFTW routines for
shared-memory threads on SMP hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and complex
data, are the easiest way to take advantage of multiple processors with
FFTW. They work just like the corresponding uniprocessor transform
routines, except that they take the number of parallel threads to use
as an extra parameter. Any program that uses the uniprocessor FFTW can
be adapted to the multi-threaded FFTW with minimal changes: call the
threaded routine in place of the serial one and pass the thread count.