Copyright (C) 2000-2012 |
GNU Info (fftw.info)MPI Data LayoutMPI Data Layout --------------- The transform data used by the MPI FFTW routines is "distributed": a distinct portion of it resides with each process involved in the transform. This allows the transform to be parallelized, for example, over a cluster of workstations, each with its own separate memory, so that you can take advantage of the total memory of all the processors you are parallelizing over. In particular, the array is divided according to the rows (first dimension) of the data: each process gets a subset of the rows of the data. (This is sometimes called a "slab decomposition.") One consequence of this is that you can't take advantage of more processors than you have rows (e.g. `64x64x64' matrix can at most use 64 processors). This isn't usually much of a limitation, however, as each processor needs a fair amount of data in order for the parallel-computation benefits to outweight the communications costs. Below, the first dimension of the data will be referred to as ``x'' and the second dimension as ``y''. FFTW supplies a routine to tell you exactly how much data resides on the current process: void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p, int *local_nx, int *local_x_start, int *local_ny_after_transpose, int *local_y_start_after_transpose, int *total_local_size); Given a plan `p', the other parameters of this routine are set to values describing the required data layout, described below. `total_local_size' is the number of `fftw_complex' elements that you must allocate for your local data (and workspace, if you choose). (This value should, of course, be multiplied by `n_fields' if that parameter to `fftwnd_mpi' is not `1'.) The data on the current process has `local_nx' rows, starting at row `local_x_start'. If `fftwnd_mpi' is called with `FFTW_TRANSPOSED_ORDER' output, then `y' will be the first dimension of the output, and the local `y' extent will be given by `local_ny_after_transpose' and `local_y_start_after_transpose'. Otherwise, the output has the same dimensions and layout as the input. For instance, suppose you want to transform three-dimensional data of size `nx x ny x nz'. Then, the current process will store a subset of this data, of size `local_nx x ny x nz', where the `x' indices correspond to the range `local_x_start' to `local_x_start+local_nx-1' in the "real" (i.e. logical) array. If `fftwnd_mpi' is called with `FFTW_TRANSPOSED_ORDER' output, then the result will be a `ny x nx x nz' array, of which a `local_ny_after_transpose x nx x nz' subset is stored on the current process (corresponding to `y' values starting at `local_y_start_after_transpose'). The following is an example of allocating such a three-dimensional array array (`local_data') before the transform and initializing it to some function `f(x,y,z)': fftwnd_mpi_local_sizes(plan, &local_nx, &local_x_start, &local_ny_after_transpose, &local_y_start_after_transpose, &total_local_size); local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size); for (x = 0; x < local_nx; ++x) for (y = 0; y < ny; ++y) for (z = 0; z < nz; ++z) local_data[(x*ny + y)*nz + z] = f(x + local_x_start, y, z); Some important things to remember: * Although the local data is of dimensions `local_nx x ny x nz' in the above example, do *not* allocate the array to be of size `local_nx*ny*nz'. Use `total_local_size' instead. * The amount of data on each process will not necessarily be the same; in fact, `local_nx' may even be zero for some processes. (For example, suppose you are doing a `6x6' transform on four processors. There is no way to effectively use the fourth processor in a slab decomposition, so we leave it empty. Proof left as an exercise for the reader.) * All arrays are, of course, in row-major order (Note: Multi-dimensional Array Format.). * If you want to compute the inverse transform of the output of `fftwnd_mpi', the dimensions of the inverse transform are given by the dimensions of the output of the forward transform. For example, if you are using `FFTW_TRANSPOSED_ORDER' output in the above example, then the inverse plan should be created with dimensions `ny x nx x nz'. * The data layout only depends upon the dimensions of the array, not on the plan, so you are guaranteed that different plans for the same size (or inverse plans) will use the same (consistent) data layouts. automatically generated by info2www version 1.2.2.9 |