Copyright (C) 2000-2012
Usage of MPI FFTW for Complex Multi-dimensional Transforms
----------------------------------------------------------

   Usage of the MPI FFTW routines is similar to that of the
uniprocessor FFTW.  We assume that the reader already understands the
usage of the uniprocessor FFTW routines, described elsewhere in this
manual.  Some familiarity with MPI is also helpful.

   A typical program performing a complex two-dimensional MPI transform
might look something like:

     #include <fftw_mpi.h>

     int main(int argc, char **argv)
     {
           const int NX = ..., NY = ...;
           fftwnd_mpi_plan plan;
           fftw_complex *data;

           MPI_Init(&argc, &argv);

           plan = fftw2d_mpi_create_plan(MPI_COMM_WORLD, NX, NY,
                                         FFTW_FORWARD, FFTW_ESTIMATE);

           ...allocate and initialize data...

           fftwnd_mpi(plan, 1, data, NULL, FFTW_NORMAL_ORDER);

           ...

           fftwnd_mpi_destroy_plan(plan);
           MPI_Finalize();
     }

   The calls to `MPI_Init' and `MPI_Finalize' are required in all MPI
programs; see the MPI home page (http://www.mcs.anl.gov/mpi/) for more
information.  Note that all of your processes run the program in
parallel, as a group; there is no explicit launching of
threads/processes in an MPI program.

   As in the ordinary FFTW, the first thing we do is to create a plan
(of type `fftwnd_mpi_plan'), using:

     fftwnd_mpi_plan fftw2d_mpi_create_plan(MPI_Comm comm,
                                            int nx, int ny,
                                            fftw_direction dir,
                                            int flags);

   Except for the first argument, the parameters are identical to those
of `fftw2d_create_plan'.  (There are also analogous
`fftwnd_mpi_create_plan' and `fftw3d_mpi_create_plan' functions;
transforms of any rank greater than one are supported.)  The first
argument is an MPI "communicator", which specifies the group of
processes that are to be involved in the transform; the standard
constant `MPI_COMM_WORLD' indicates all available processes.

   Next, one has to allocate and initialize the data.
This is somewhat tricky, because the transform data is distributed
across the processes involved in the transform.  It is discussed in
detail by the next section (*Note MPI Data Layout::).

   The actual computation of the transform is performed by the function
`fftwnd_mpi', which differs somewhat from its uniprocessor equivalent
and is described by:

     void fftwnd_mpi(fftwnd_mpi_plan p,
                     int n_fields,
                     fftw_complex *local_data, fftw_complex *work,
                     fftwnd_mpi_output_order output_order);

   There are several things to notice here:

   * First of all, all `fftw_mpi' transforms are in-place: the output
     is in the `local_data' parameter, and there is no need to specify
     `FFTW_IN_PLACE' in the plan flags.

   * The MPI transforms also only support a limited subset of the
     `howmany'/`stride'/`dist' functionality of the uniprocessor
     routines: the `n_fields' parameter is equivalent to
     `howmany=n_fields', `stride=n_fields', and `dist=1'.
     (Conceptually, the `n_fields' parameter allows you to transform an
     array of contiguous vectors, each with length `n_fields'.)
     `n_fields' is `1' if you are only transforming a single, ordinary
     array.

   * The `work' parameter is an optional workspace.  If it is not
     `NULL', it should be exactly the same size as the `local_data'
     array.  If it is provided, FFTW is able to use the built-in
     `MPI_Alltoall' primitive for (often) greater efficiency at the
     expense of extra storage space.

   * Finally, the last parameter specifies whether the output data has
     the same ordering as the input data (`FFTW_NORMAL_ORDER'), or if
     it is transposed (`FFTW_TRANSPOSED_ORDER').  Leaving the data
     transposed results in significant performance improvements due to
     a saved communication step (needed to un-transpose the data).
     Specifically, the first two dimensions of the array are
     transposed, as is described in more detail by the next section.

   The output of `fftwnd_mpi' is identical to that of the
corresponding uniprocessor transform.
In particular, you should recall our conventions for normalization and
the sign of the transform exponent.

   The same plan can be used to compute many transforms of the same
size.  After you are done with it, you should deallocate it by calling
`fftwnd_mpi_destroy_plan'.

   *Important:* The FFTW MPI routines must be called in the same order
by all processes involved in the transform.  You should assume that
they all are blocking, as if each contained a call to `MPI_Barrier'.

   Programs using the FFTW MPI routines should be linked with
`-lfftw_mpi -lfftw -lm' on Unix, in addition to whatever libraries are
required for MPI.
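   As a concrete sketch of the linking step, a program in a file
`transform.c' (a hypothetical name) might be built with an MPI compiler
wrapper such as `mpicc', which supplies the MPI libraries itself; the
wrapper name and any extra `-I'/`-L' paths depend on your MPI and FFTW
installations:

```shell
# Hypothetical file and program names; mpicc is the usual MPI compiler
# wrapper, but your installation may require explicit -I/-L paths.
mpicc -o transform transform.c -lfftw_mpi -lfftw -lm
```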