Usage of MPI FFTW for Complex Multi-dimensional Transforms
----------------------------------------------------------

   Usage of the MPI FFTW routines is similar to that of the uniprocessor
FFTW.  We assume that the reader already understands the usage of the
uniprocessor FFTW routines, described elsewhere in this manual.  Some
familiarity with MPI is also helpful.

   A typical program performing a complex two-dimensional MPI transform
might look something like:

     #include <fftw_mpi.h>
     
     int main(int argc, char **argv)
     {
           const int NX = ..., NY = ...;
           fftwnd_mpi_plan plan;
           fftw_complex *data;
     
           MPI_Init(&argc,&argv);
     
           plan = fftw2d_mpi_create_plan(MPI_COMM_WORLD,
                                         NX, NY,
                                         FFTW_FORWARD, FFTW_ESTIMATE);
     
           ...allocate and initialize data...
     
            fftwnd_mpi(plan, 1, data, NULL, FFTW_NORMAL_ORDER);
     
           ...
     
           fftwnd_mpi_destroy_plan(plan);
            MPI_Finalize();
            return 0;
      }

   The calls to `MPI_Init' and `MPI_Finalize' are required in all MPI
programs; see the MPI home page (http://www.mcs.anl.gov/mpi/) for more
information.  Note that all of your processes run the program in
parallel, as a group; there is no explicit launching of
threads/processes in an MPI program.

   As in the ordinary FFTW, the first thing we do is to create a plan
(of type `fftwnd_mpi_plan'), using:

     fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm,
                                            int nx, int ny,
                                            fftw_direction dir, int flags);

   Except for the first argument, the parameters are identical to those
of `fftw2d_create_plan'.  (There are also analogous
`fftwnd_mpi_create_plan' and `fftw3d_mpi_create_plan' functions.
Transforms of any rank greater than one are supported.)  The first
argument is an MPI "communicator", which specifies the group of
processes that are to be involved in the transform; the standard
constant `MPI_COMM_WORLD' indicates all available processes.
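
   For example, a three-dimensional or higher-rank plan is created in
the same way.  Since the prototypes of the other plan-creation routines
are not given in this section, the following sketch assumes that
`fftw3d_mpi_create_plan' and `fftwnd_mpi_create_plan' take the same
size arguments as the uniprocessor `fftw3d_create_plan' and
`fftwnd_create_plan', with the communicator prepended:

     /* Sketch (assumed prototypes, analogous to the uniprocessor
        fftw3d_create_plan and fftwnd_create_plan): */
     fftwnd_mpi_plan plan3d, plan4d;
     int n[4] = {8, 16, 32, 64};   /* dimensions of a rank-4 transform */

     plan3d = fftw3d_mpi_create_plan(MPI_COMM_WORLD, 32, 32, 64,
                                     FFTW_FORWARD, FFTW_ESTIMATE);

     plan4d = fftwnd_mpi_create_plan(MPI_COMM_WORLD, 4, n,
                                     FFTW_BACKWARD, FFTW_ESTIMATE);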

   Next, one has to allocate and initialize the data.  This is somewhat
tricky, because the transform data is distributed across the processes
involved in the transform.  It is discussed in detail in the next
section, MPI Data Layout.

   The actual computation of the transform is performed by the function
`fftwnd_mpi', which differs somewhat from its uniprocessor equivalent
and is described by:

     void fftwnd_mpi(fftwnd_mpi_plan p,
                     int n_fields,
                     fftw_complex *local_data, fftw_complex *work,
                     fftwnd_mpi_output_order output_order);

   There are several things to notice here:

   * First of all, all `fftw_mpi' transforms are in-place: the output is
     in the `local_data' parameter, and there is no need to specify
     `FFTW_IN_PLACE' in the plan flags.

   * The MPI transforms also only support a limited subset of the
     `howmany'/`stride'/`dist' functionality of the uniprocessor
     routines: the `n_fields' parameter is equivalent to
     `howmany=n_fields', `stride=n_fields', and `dist=1'.
     (Conceptually, the `n_fields' parameter allows you to transform an
     array of contiguous vectors, each with length `n_fields'.)
     `n_fields' is `1' if you are only transforming a single, ordinary
     array.

   * The `work' parameter is an optional workspace.  If it is not
     `NULL', it should be exactly the same size as the `local_data'
     array.  If it is provided, FFTW is able to use the built-in
     `MPI_Alltoall' primitive for (often) greater efficiency at the
     expense of extra storage space.

   * Finally, the last parameter specifies whether the output data has
     the same ordering as the input data (`FFTW_NORMAL_ORDER'), or
     whether it is transposed (`FFTW_TRANSPOSED_ORDER').  Leaving the
     data transposed yields a significant performance improvement,
     because it saves the communication step needed to un-transpose
     the data.  Specifically, the first two dimensions of the array
     are transposed, as is described in more detail in the next
     section.  (See also the sketch following this list.)
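
   As a concrete illustration of these parameters, the sketch below
transforms three interleaved fields in place, passes a workspace of
the same size as the local data, and leaves the output transposed.
Here, `total_local_size' is a placeholder for the per-process local
array size (in `fftw_complex' elements per field), which is obtained
as described in the next section:

     /* Sketch: three interleaved fields, an explicit workspace, and
        transposed output. */
     const int n_fields = 3;
     int i, total_local_size = ...;  /* see the next section */
     fftw_complex *local_data, *work;

     local_data = (fftw_complex *) malloc(n_fields * total_local_size
                                          * sizeof(fftw_complex));
     work       = (fftw_complex *) malloc(n_fields * total_local_size
                                          * sizeof(fftw_complex));

     for (i = 0; i < n_fields * total_local_size; ++i)
          local_data[i].re = local_data[i].im = 0.0;  /* ...initialize... */

     fftwnd_mpi(plan, n_fields, local_data, work, FFTW_TRANSPOSED_ORDER);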

   The output of `fftwnd_mpi' is identical to that of the corresponding
uniprocessor transform.  In particular, you should recall our
conventions for normalization and the sign of the transform exponent.

   The same plan can be used to compute many transforms of the same
size.  After you are done with it, you should deallocate it by calling
`fftwnd_mpi_destroy_plan'.
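
   For example, one might reuse a single plan inside a loop, destroying
it only when all of the transforms are done (the update step here is
purely illustrative):

     /* Sketch: the same plan computes one transform per iteration. */
     for (step = 0; step < n_steps; ++step) {
          ...update local_data...
          fftwnd_mpi(plan, 1, local_data, NULL, FFTW_NORMAL_ORDER);
     }
     fftwnd_mpi_destroy_plan(plan);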

   Important: The FFTW MPI routines must be called in the same order by
all processes involved in the transform.  You should assume that they
are all blocking, as if each contained a call to `MPI_Barrier'.

   Programs using the FFTW MPI routines should be linked with
`-lfftw_mpi -lfftw -lm' on Unix, in addition to whatever libraries are
required for MPI.
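
   For example, with an MPI compiler wrapper such as `mpicc' (the
wrapper name and any additional MPI flags depend on your MPI
implementation, so this is only indicative):

     mpicc -o my_transform my_transform.c -lfftw_mpi -lfftw -lm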

