MPI Data Layout
---------------

   The transform data used by the MPI FFTW routines is "distributed": a
distinct portion of it resides with each process involved in the
transform.  This allows the transform to be parallelized, for example,
over a cluster of workstations, each with its own separate memory, so
that you can take advantage of the total memory of all the processors
you are parallelizing over.

   In particular, the array is divided according to the rows (first
dimension) of the data: each process gets a subset of the rows of the
data.  (This is sometimes called a "slab decomposition.")  One
consequence of this is that you can't take advantage of more processors
than you have rows (e.g. a `64x64x64' array can use at most 64
processors).  This isn't usually much of a limitation, however, as each
processor needs a fair amount of data in order for the
parallel-computation benefits to outweigh the communication costs.

   Below, the first dimension of the data will be referred to as `x'
and the second dimension as `y'.

   FFTW supplies a routine to tell you exactly how much data resides on
the current process:

     void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p,
                                 int *local_nx,
                                 int *local_x_start,
                                 int *local_ny_after_transpose,
                                 int *local_y_start_after_transpose,
                                 int *total_local_size);

   Given a plan `p', this routine sets the remaining parameters to
values describing the required data layout, as explained below.
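
   For illustration, such a plan might be created with
`fftw3d_mpi_create_plan', as described in Usage of MPI FFTW for
Complex Multi-dimensional Transforms.  The following is a sketch; the
array sizes `nx', `ny', and `nz' are assumed to be defined elsewhere:

     #include <fftw_mpi.h>

     fftwnd_mpi_plan plan;
     int local_nx, local_x_start;
     int local_ny_after_transpose, local_y_start_after_transpose;
     int total_local_size;

     plan = fftw3d_mpi_create_plan(MPI_COMM_WORLD, nx, ny, nz,
                                   FFTW_FORWARD, FFTW_ESTIMATE);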

   `total_local_size' is the number of `fftw_complex' elements that you
must allocate for your local data (and workspace, if you choose).
(This value should, of course, be multiplied by `n_fields' if that
parameter to `fftwnd_mpi' is not `1'.)
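
   For example, an allocation for `n_fields' interleaved arrays might
look like the following sketch (`n_fields' and `total_local_size' are
assumed to have been set already):

     fftw_complex *local_data;

     /* total_local_size counts fftw_complex elements per field,
        so multiply by n_fields for interleaved arrays */
     local_data = (fftw_complex *) malloc(sizeof(fftw_complex) *
                                          n_fields * total_local_size);
     if (!local_data) {
          /* allocation failed; handle the error */
     }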

   The data on the current process has `local_nx' rows, starting at row
`local_x_start'.  If `fftwnd_mpi' is called with
`FFTW_TRANSPOSED_ORDER' output, then `y' will be the first dimension of
the output, and the local `y' extent will be given by
`local_ny_after_transpose' and `local_y_start_after_transpose'.
Otherwise, the output has the same dimensions and layout as the input.

   For instance, suppose you want to transform three-dimensional data of
size `nx x ny x nz'.  Then, the current process will store a subset of
this data, of size `local_nx x ny x nz', where the `x' indices
correspond to the range `local_x_start' to `local_x_start+local_nx-1'
in the "real" (i.e. logical) array.  If `fftwnd_mpi' is called with
`FFTW_TRANSPOSED_ORDER' output, then the result will be a `ny x nx x
nz' array, of which a `local_ny_after_transpose x nx x nz' subset is
stored on the current process (corresponding to `y' values starting at
`local_y_start_after_transpose').

   The following is an example of allocating such a three-dimensional
array (`local_data') before the transform and initializing it to some
function `f(x,y,z)':

             int x, y, z;
             fftw_complex *local_data;

             /* find out how much data resides on this process, and
                where its rows begin in the logical nx x ny x nz array */
             fftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start,
                                    &local_ny_after_transpose,
                                    &local_y_start_after_transpose,
                                    &total_local_size);

             /* allocate the local portion of the array */
             local_data = (fftw_complex*) malloc(sizeof(fftw_complex) *
                                                 total_local_size);

             /* initialize the local_nx rows stored here, which are
                rows local_x_start..local_x_start+local_nx-1 globally */
             for (x = 0; x < local_nx; ++x)
                     for (y = 0; y < ny; ++y)
                             for (z = 0; z < nz; ++z)
                                     local_data[(x*ny + y)*nz + z]
                                             = f(x + local_x_start, y, z);
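
   After initialization, the transform itself might be computed and
its transposed-order output traversed as in the following sketch.
(The sum of the real parts is just a placeholder for whatever you
actually do with the output; the optional workspace is omitted by
passing `NULL', and FFTW 2's `fftw_complex' struct exposes `re' and
`im' fields.)

     double sum = 0.0;

     /* compute the transform in place; with FFTW_TRANSPOSED_ORDER the
        output is the transposed ny x nx x nz array, of which this
        process holds local_ny_after_transpose rows starting at global
        y index local_y_start_after_transpose */
     fftwnd_mpi(plan, 1, local_data, NULL, FFTW_TRANSPOSED_ORDER);

     for (y = 0; y < local_ny_after_transpose; ++y)
          for (x = 0; x < nx; ++x)
               for (z = 0; z < nz; ++z)
                    sum += local_data[(y*nx + x)*nz + z].re;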

   Some important things to remember:

   * Although the local data is of dimensions `local_nx x ny x nz' in
     the above example, do *not* allocate the array to be of size
     `local_nx*ny*nz'.  Use `total_local_size' instead.

   * The amount of data on each process will not necessarily be the
     same; in fact, `local_nx' may even be zero for some processes.
     (For example, suppose you are doing a `6x6' transform on four
     processors.  There is no way to effectively use the fourth
     processor in a slab decomposition, so we leave it empty.  Proof
     left as an exercise for the reader.)

   * All arrays are, of course, in row-major order (see the section on
     Multi-dimensional Array Format).

   * If you want to compute the inverse transform of the output of
     `fftwnd_mpi', the dimensions of the inverse transform are given by
     the dimensions of the output of the forward transform.  For
     example, if you are using `FFTW_TRANSPOSED_ORDER' output in the
     above example, then the inverse plan should be created with
     dimensions `ny x nx x nz' (see the sketch after this list).

   * The data layout only depends upon the dimensions of the array, not
     on the plan, so you are guaranteed that different plans for the
     same size (or inverse plans) will use the same (consistent) data
     layouts.
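
   Here is a sketch of creating such an inverse plan for the
`FFTW_TRANSPOSED_ORDER' example above, again using
`fftw3d_mpi_create_plan' (error checking omitted):

     fftwnd_mpi_plan iplan;

     /* the inverse plan takes the dimensions of the forward
        transform's output: ny x nx x nz, since the forward transform
        was computed with FFTW_TRANSPOSED_ORDER */
     iplan = fftw3d_mpi_create_plan(MPI_COMM_WORLD, ny, nx, nz,
                                    FFTW_BACKWARD, FFTW_ESTIMATE);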

