File: fftw.info,  Node: Usage of MPI FFTW for Complex One-dimensional Transforms,  Next: MPI Tips,  Prev: Usage of MPI FFTW for Real Multi-dimensional Transforms,  Up: MPI FFTW

Usage of MPI FFTW for Complex One-dimensional Transforms
--------------------------------------------------------

   The MPI FFTW also includes routines for parallel one-dimensional
transforms of complex data (only).  Although the speedup is generally
worse than it is for the multi-dimensional routines,(1) these
distributed-memory one-dimensional transforms are especially useful for
performing one-dimensional transforms that don't fit into the memory of
a single machine.

   The usage of these routines is straightforward, and is similar to
that of the multi-dimensional MPI transform functions.  You first
include the header `<fftw_mpi.h>' and then create a plan by calling:

     fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n,
                                        fftw_direction dir, int flags);

   The last three arguments are the same as for `fftw_create_plan'
(except that all MPI transforms are automatically `FFTW_IN_PLACE').
The first argument specifies the group of processes you are using, and
is usually `MPI_COMM_WORLD' (all processes).  A plan can be used for
many transforms of the same size, and is destroyed when you are done
with it by calling `fftw_mpi_destroy_plan(plan)'.
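   For example, a plan for a forward transform might be created and later
destroyed as follows.  This is only a sketch: the length `65536' and the
`FFTW_ESTIMATE' planning flag are arbitrary illustrative choices, and MPI
is assumed to have already been initialized with `MPI_Init':

     #include <fftw_mpi.h>

     fftw_mpi_plan plan;

     /* plan a forward transform of length 65536 over all processes */
     plan = fftw_mpi_create_plan(MPI_COMM_WORLD, 65536,
                                 FFTW_FORWARD, FFTW_ESTIMATE);

     /* ... compute one or more transforms with the plan ... */

     fftw_mpi_destroy_plan(plan);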

   If you don't care about the ordering of the input or output data of
the transform, you can include `FFTW_SCRAMBLED_INPUT' and/or
`FFTW_SCRAMBLED_OUTPUT' in the `flags'.  These save some communications
at the expense of having the input and/or output reordered in an
undocumented way.  For example, if you are performing an FFT-based
convolution, you might use `FFTW_SCRAMBLED_OUTPUT' for the forward
transform and `FFTW_SCRAMBLED_INPUT' for the inverse transform.
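   For instance, the two plans for such a convolution might be created as
follows (again only a sketch; the length `65536' is arbitrary):

     fftw_mpi_plan fwd, inv;

     fwd = fftw_mpi_create_plan(MPI_COMM_WORLD, 65536, FFTW_FORWARD,
                                FFTW_ESTIMATE | FFTW_SCRAMBLED_OUTPUT);
     inv = fftw_mpi_create_plan(MPI_COMM_WORLD, 65536, FFTW_BACKWARD,
                                FFTW_ESTIMATE | FFTW_SCRAMBLED_INPUT);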

   The transform itself is computed by:

     void fftw_mpi(fftw_mpi_plan p, int n_fields,
                   fftw_complex *local_data, fftw_complex *work);

   `n_fields', as in `fftwnd_mpi', is equivalent to `howmany=n_fields',
`stride=n_fields', and `dist=1', and should be `1' when you are
computing the transform of a single array.  `local_data' contains the
portion of the array local to the current process, described below.
`work' is either `NULL' or an array exactly the same size as
`local_data'; in the latter case, FFTW can use the `MPI_Alltoall'
communications primitive which is (usually) faster at the expense of
extra storage.  Upon return, `local_data' contains the portion of the
output local to the current process (see below).
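   For example, assuming that `local_data' and `work' have each been
allocated with `total_local_size' elements (see below), a single in-place
transform might be computed by either of the following calls:

     /* use a work array, allowing the (usually faster) MPI_Alltoall */
     fftw_mpi(plan, 1, local_data, work);

     /* or, transform without a separate work array */
     fftw_mpi(plan, 1, local_data, NULL);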

   To find out what portion of the array is stored local to the current
process, you call the following routine:

     void fftw_mpi_local_sizes(fftw_mpi_plan p,
                               int *local_n, int *local_start,
                               int *local_n_after_transform,
                               int *local_start_after_transform,
                               int *total_local_size);

   `total_local_size' is the number of `fftw_complex' elements you
should actually allocate for `local_data' (and `work').  `local_n' and
`local_start' indicate that the current process stores `local_n'
elements corresponding to the indices `local_start' to
`local_start+local_n-1' in the "real" array.  *After the transform, the
process may store a different portion of the array.*  The portion of
the data stored on the process after the transform is given by
`local_n_after_transform' and `local_start_after_transform'.  This data
is exactly the same as a contiguous segment of the corresponding
uniprocessor transform output (i.e. an in-order sequence of sequential
frequency bins).
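   Putting these pieces together, a sketch of a complete transform might
look as follows.  It assumes that the headers `<stdlib.h>' and
`<fftw_mpi.h>' have been included, that `plan' was created as described
above, and that `my_input' is a hypothetical function supplying element
`i' of the global input array:

     int local_n, local_start, local_n_after, local_start_after,
         total_local_size, i;
     fftw_complex *local_data, *work;

     fftw_mpi_local_sizes(plan, &local_n, &local_start,
                          &local_n_after, &local_start_after,
                          &total_local_size);

     local_data = (fftw_complex *)
          malloc(total_local_size * sizeof(fftw_complex));
     work = (fftw_complex *)
          malloc(total_local_size * sizeof(fftw_complex));

     /* element i of local_data is element local_start + i
        of the "real" (global) input array */
     for (i = 0; i < local_n; ++i) {
          local_data[i].re = my_input(local_start + i);
          local_data[i].im = 0.0;
     }

     fftw_mpi(plan, 1, local_data, work);

     /* now local_data[i] holds frequency bin
        local_start_after + i of the full output */

     free(work);
     free(local_data);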

   Note that, if you compute both a forward and a backward transform of
the same size, the local sizes are guaranteed to be consistent.  That
is, the local size after the forward transform will be the same as the
local size before the backward transform, and vice versa.

   Programs using the FFTW MPI routines should be linked with
`-lfftw_mpi -lfftw -lm' on Unix, in addition to whatever libraries are
required for MPI.
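   For example, with an MPI compiler wrapper (the wrapper name `mpicc' is
an assumption; it depends upon your MPI installation), a program might be
compiled and linked with:

     mpicc -o my_program my_program.c -lfftw_mpi -lfftw -lm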

   ---------- Footnotes ----------

   (1) The 1D transforms require much more communication.  All the
communication in our FFT routines takes the form of an all-to-all
communication: the multi-dimensional transforms require two all-to-all
communications (or one, if you use `FFTW_TRANSPOSED_ORDER'), while the
one-dimensional transforms require *three* (or two, if you use
scrambled input or output).

