The innovation (if it can be so called) in FFTW consists in having an
interpreter execute the transform. The program for the interpreter
(the plan) is computed at runtime according to the
characteristics of your machine/compiler. This peculiar software
architecture allows FFTW to adapt itself to almost any machine.
For more details, see the paper "The Fastest Fourier Transform in
the West", by M. Frigo and S. G. Johnson, available at
the FFTW web page. See also "FFTW: An Adaptive Software Architecture for the
FFT", in ICASSP '98.
This is a complex question, and there is no simple answer. In fact,
the authors do not fully know the answer, either. In addition to many
small performance hacks throughout FFTW, there are three general
reasons for FFTW's speed.
FFTW uses a code generator to produce highly-optimized
routines for computing small transforms.
FFTW uses explicit divide-and-conquer to take advantage
of the memory hierarchy.
For more details on these three topics, see the paper "The
Fastest Fourier Transform in the West", by M. Frigo and S. G. Johnson,
available at the FFTW web page.
wisdom is the name of the mechanism that FFTW uses to save
and restore plans. Rather than just saving plans, FFTW remembers what
it learns about your machine, and becomes wiser and wiser as time
passes by. You can save wisdom for later use.
wisdom could be implemented with less effort than a general
plan-saving mechanism would have required. In addition,
wisdom provides additional benefits. For example, if you
are planning transforms of size 1024, and later you want a transform
of size 2048, most of the calculations of the 1024 case can be reused.
In short, wisdom does more things with less effort, and seemed like The Right Thing to do.