[mpich-discuss] MPI_Alltoall problem

Sat Oct 8 04:52:50 CDT 2011

> Hi, I am working on some application that heavily uses MPI_Alltoall---matrix transpose.
> ... So the performance of MPI_Alltoall becomes very critical. Does anyone know an
> alternative to directly calling the MPI_Alltoall routine and reduce the run time?

One possibility would be trying FFTW 3.3's MPI transpose capabilities [1].
You pay a one-time planning cost while FFTW figures out the fastest way to
perform your transpose (Alltoall, pairwise sendrecv, etc.), and then you can
repeatedly execute the optimum choice.
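
A minimal sketch of that plan-once, execute-many pattern, assuming a real
(double precision) n0 by n1 matrix block-distributed by rows across
MPI_COMM_WORLD.  The matrix size, iteration count, and fill values are
illustrative assumptions, not taken from the original application:

```c
/* Sketch only: plan an FFTW MPI transpose once, then reuse the plan.
 * Assumed build line: mpicc transpose.c -lfftw3_mpi -lfftw3 -lm */
#include <stddef.h>
#include <mpi.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t n0 = 1024, n1 = 768;  /* illustrative global size */
    ptrdiff_t local_n0, local_0_start, local_n1, local_1_start;
    ptrdiff_t i, j;
    int iter;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* Local block sizes before/after the transpose and required storage. */
    ptrdiff_t alloc_local = fftw_mpi_local_size_2d_transposed(
        n0, n1, MPI_COMM_WORLD,
        &local_n0, &local_0_start, &local_n1, &local_1_start);
    double *data = fftw_alloc_real(alloc_local);

    /* One-time planning cost.  FFTW_MEASURE times candidate strategies
     * and may overwrite the array, so the data is filled afterwards. */
    fftw_plan plan = fftw_mpi_plan_transpose(n0, n1, data, data,
                                             MPI_COMM_WORLD, FFTW_MEASURE);
    if (plan == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    for (iter = 0; iter < 100; ++iter) {
        /* Input layout: local_n0 rows of length n1, row-major. */
        for (i = 0; i < local_n0; ++i)
            for (j = 0; j < n1; ++j)
                data[i * n1 + j] = (double)((local_0_start + i) * n1 + j);

        fftw_execute(plan);  /* reuse the strategy chosen at plan time */

        /* Output layout: local_n1 rows of length n0, row-major. */
    }

    fftw_destroy_plan(plan);
    fftw_free(data);
    fftw_mpi_cleanup();
    MPI_Finalize();
    return 0;
}
```

Planning with FFTW_ESTIMATE instead would skip the timing runs at the
cost of a possibly slower plan.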

If this looks like a good option, be sure to read the entire FFTW MPI
chapter, as many of the useful tidbits are buried within it (e.g. [2]).
Lastly, if you can structure your computation so that, after the MPI
transpose of your matrix A, the "transposed" logic computes with data
strided like A^T, you may find that FFTW_MPI_TRANSPOSED_OUT [3] improves
your runtime.
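
As I read the FFTW manual, a transpose plan normally leaves each process
with a local_n1 by n0 row-major block, while FFTW_MPI_TRANSPOSED_OUT
leaves an n0 by local_n1 block instead, which lets FFTW skip a local
reordering step.  The helper functions below are hypothetical (they are
not part of FFTW) and only sketch how the indexing of A(i, j) in the
output buffer would differ between the two layouts:

```c
#include <stddef.h>

/* Hypothetical helpers, not FFTW API.  i is the global row index of A
 * (0 <= i < n0); j_local is the local column index of A owned by this
 * process after the transpose (0 <= j_local < local_n1). */

/* Default output layout: local_n1 rows of length n0, row-major. */
ptrdiff_t offset_default(ptrdiff_t i, ptrdiff_t j_local, ptrdiff_t n0)
{
    return j_local * n0 + i;
}

/* FFTW_MPI_TRANSPOSED_OUT layout: n0 rows of length local_n1, row-major. */
ptrdiff_t offset_transposed_out(ptrdiff_t i, ptrdiff_t j_local,
                                ptrdiff_t local_n1)
{
    return i * local_n1 + j_local;
}
```

If the post-transpose loops are written against the second layout, the
flag can simply be OR'ed into the planner flags, e.g.
FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT.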

Hope that helps,
Rhys

[1] http://www.fftw.org/fftw3_doc/FFTW-MPI-Transposes.html#FFTW-MPI-Transposes
[2] http://www.fftw.org/fftw3_doc/FFTW-MPI-Performance-Tips.html#FFTW-MPI-Performance-Tips
[3] http://www.fftw.org/fftw3_doc/Transposed-distributions.html#Transposed-distributions