Copyright (C) 2000-2012
Usage of MPI FFTW for Complex One-dimensional Transforms
--------------------------------------------------------

The MPI FFTW also includes routines for parallel one-dimensional
transforms of complex data (only). Although the speedup is generally
worse than it is for the multi-dimensional routines,(1) these
distributed-memory one-dimensional transforms are especially useful for
performing one-dimensional transforms that don't fit into the memory of
a single machine.

The usage of these routines is straightforward, and is similar to that
of the multi-dimensional MPI transform functions. You first include the
header `<fftw_mpi.h>' and then create a plan by calling:

     fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n,
                                        fftw_direction dir, int flags);

The last three arguments are the same as for `fftw_create_plan' (except
that all MPI transforms are automatically `FFTW_IN_PLACE'). The first
argument specifies the group of processes you are using, and is usually
`MPI_COMM_WORLD' (all processes). A plan can be used for many
transforms of the same size, and is destroyed when you are done with it
by calling `fftw_mpi_destroy_plan(plan)'.

If you don't care about the ordering of the input or output data of the
transform, you can include `FFTW_SCRAMBLED_INPUT' and/or
`FFTW_SCRAMBLED_OUTPUT' in the `flags'. These save some communications
at the expense of having the input and/or output reordered in an
undocumented way. For example, if you are performing an FFT-based
convolution, you might use `FFTW_SCRAMBLED_OUTPUT' for the forward
transform and `FFTW_SCRAMBLED_INPUT' for the inverse transform.

The transform itself is computed by:

     void fftw_mpi(fftw_mpi_plan p, int n_fields,
                   fftw_complex *local_data, fftw_complex *work);

`n_fields', as in `fftwnd_mpi', is equivalent to `howmany=n_fields',
`stride=n_fields', and `dist=1', and should be `1' when you are
computing the transform of a single array.
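
The `n_fields' layout can be made concrete with a small indexing sketch. It assumes the usual FFTW convention that element j of transform k lives at offset j*stride + k*dist, so that `stride=n_fields' and `dist=1' interleave the fields element by element. The names `cplx', `field_index' and `pack_fields' are hypothetical and not part of FFTW; `cplx' merely stands in for `fftw_complex' so that the sketch compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for fftw_complex (an assumption, so this compiles without FFTW). */
typedef struct { double re, im; } cplx;

/* Offset of element j of field f when n_fields arrays are interleaved:
   stride = n_fields, dist = 1  =>  offset = j*n_fields + f. */
size_t field_index(size_t j, size_t f, size_t n_fields)
{
    assert(f < n_fields);
    return j * n_fields + f;
}

/* Copy n_fields separate arrays, each of length n, into one interleaved
   buffer of n*n_fields elements laid out as described above. */
void pack_fields(const cplx *const *fields, size_t n_fields, size_t n,
                 cplx *out)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t f = 0; f < n_fields; ++f)
            out[field_index(j, f, n_fields)] = fields[f][j];
}
```

With `n_fields' equal to `1' the offset reduces to j, which is the single-array case. In a distributed run, `n' here would be the number of elements held by the calling process rather than the full transform size.
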
`local_data' contains the portion of the array local to the current
process, described below. `work' is either `NULL' or an array exactly
the same size as `local_data'; in the latter case, FFTW can use the
`MPI_Alltoall' communications primitive which is (usually) faster at
the expense of extra storage. Upon return, `local_data' contains the
portion of the output local to the current process (see below).

To find out what portion of the array is stored local to the current
process, you call the following routine:

     void fftw_mpi_local_sizes(fftw_mpi_plan p,
                               int *local_n, int *local_start,
                               int *local_n_after_transform,
                               int *local_start_after_transform,
                               int *total_local_size);

`total_local_size' is the number of `fftw_complex' elements you should
actually allocate for `local_data' (and `work'). `local_n' and
`local_start' indicate that the current process stores `local_n'
elements corresponding to the indices `local_start' to
`local_start+local_n-1' in the "real" array. *After the transform, the
process may store a different portion of the array.* The portion of
the data stored on the process after the transform is given by
`local_n_after_transform' and `local_start_after_transform'. This data
is exactly the same as a contiguous segment of the corresponding
uniprocessor transform output (i.e. an in-order sequence of sequential
frequency bins).

Note that, if you compute both a forward and a backward transform of
the same size, the local sizes are guaranteed to be consistent. That
is, the local size after the forward transform will be the same as the
local size before the backward transform, and vice versa.

Programs using the FFTW MPI routines should be linked with `-lfftw_mpi
-lfftw -lm' on Unix, in addition to whatever libraries are required for
MPI.

---------- Footnotes ----------

(1) The 1D transforms require much more communication.
All the communication in our FFT routines takes the form of an
all-to-all communication: the multi-dimensional transforms require two
all-to-all communications (or one, if you use `FFTW_TRANSPOSED_ORDER'),
while the one-dimensional transforms require *three* (or two, if you
use scrambled input or output).
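
The index bookkeeping reported by `fftw_mpi_local_sizes' can be sketched as follows. This is only an illustration with hypothetical names (`local_range', `owns_index', `local_offset'), none of which belong to FFTW. It assumes, as stated above, that a process holds the contiguous indices `local_start' to `local_start+local_n-1', and that for `n_fields' greater than `1' the fields are interleaved with `stride=n_fields'. It does not call FFTW itself.

```c
#include <assert.h>

/* Hypothetical container for one (local_n, local_start) pair as reported
   by fftw_mpi_local_sizes, either before or after the transform. */
typedef struct {
    int local_n;      /* number of elements stored on this process */
    int local_start;  /* global index of the first stored element  */
} local_range;

/* Nonzero if global index i is stored on this process. */
int owns_index(local_range r, int i)
{
    return i >= r.local_start && i < r.local_start + r.local_n;
}

/* Offset into local_data of field f of global element i, assuming the
   interleaved layout (stride = n_fields, dist = 1). */
int local_offset(local_range r, int i, int f, int n_fields)
{
    assert(owns_index(r, i));
    assert(f >= 0 && f < n_fields);
    return (i - r.local_start) * n_fields + f;
}
```

Since a process may hold a different portion after the transform, a caller would presumably build one `local_range' from `local_n' and `local_start' for filling the input, and a second one from `local_n_after_transform' and `local_start_after_transform' for reading the output. This mapping applies only to ordered data; with the scrambled flags the ordering is undocumented.
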