MPI Tips
--------

There are several things you should consider in order to get the best performance out of the MPI FFTW routines.

First, if possible, the first and second dimensions of your data should be divisible by the number of processes you are using. (If only one can be divisible, then you should choose the first dimension.) This allows the computational load to be spread evenly among the processes, and also reduces the communications complexity and overhead. In the one-dimensional transform case, the size of the transform should ideally be divisible by the *square* of the number of processors.

Second, you should consider using the `FFTW_TRANSPOSED_ORDER' output format if it is not too burdensome. The speed gains from communications savings are usually substantial.

Third, you should consider allocating a workspace for `(r)fftw(nd)_mpi', as this can often (but not always) improve performance (at the cost of extra storage).

Fourth, you should experiment to find the best number of processors to use for your problem. (There comes a point of diminishing returns, when the communications costs outweigh the computational benefits.(1))

The `fftw_mpi_test' program can output helpful performance benchmarks. It accepts the same parameters as the uniprocessor test programs (see `tests/README') and is run like an ordinary MPI program. For example, `mpirun -np 4 fftw_mpi_test -s 128x128x128' will benchmark a `128x128x128' transform on four processors, reporting timings and parallel speedups for all variants of `fftwnd_mpi' (transposed, with workspace, etc.). (Note also that there is the `rfftw_mpi_test' program for the real transforms.)

---------- Footnotes ----------

(1) An FFT is particularly hard on communications systems, as it requires an "all-to-all" communication, which is more or less the worst possible case.
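
As a rough illustration of the first tip, the divisibility rules of thumb can be checked before a plan is created. The sketch below is plain C and does not call FFTW or MPI. The helper names `divisibility_rating' and `size_1d_is_ideal' are hypothetical and are not part of the FFTW API; they only encode the guidelines stated above.

```c
#include <stdio.h>

/* Hypothetical helper (not part of FFTW).
   Rates how well the first two dimensions of a multidimensional
   transform divide among nproc processes:
     2 = both nx and ny are divisible by nproc (the preferred case),
     1 = only the first dimension nx is divisible,
     0 = the first dimension is not divisible (or nproc is invalid). */
int divisibility_rating(int nx, int ny, int nproc)
{
    if (nproc <= 0)
        return 0;
    if (nx % nproc != 0)
        return 0;
    return (ny % nproc == 0) ? 2 : 1;
}

/* Hypothetical helper (not part of FFTW).
   Returns 1 if a one-dimensional transform of size n is divisible by
   the square of the number of processes, and 0 otherwise. */
int size_1d_is_ideal(long n, long nproc)
{
    if (nproc <= 0)
        return 0;
    return (n % (nproc * nproc) == 0) ? 1 : 0;
}

/* Prints a one-line summary for a given problem size and process count. */
void report_rating(int nx, int ny, int nproc)
{
    printf("%dx%d on %d processes: rating %d\n",
           nx, ny, nproc, divisibility_rating(nx, ny, nproc));
}
```

For instance, calling `report_rating(128, 128, 4)' reports a rating of 2, since both dimensions are multiples of four. A rating below 2 does not mean the transform will fail, only that the load may not be spread evenly among the processes.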