[mpich-discuss] MPI_Alltoall problem
Rhys Ulerich
rhys.ulerich at gmail.com
Sat Oct 8 05:55:39 CDT 2011
Hi Jie,
>>> Hi, I am working on an application that heavily uses MPI_Alltoall for
>>> matrix transposes. ... So the performance of MPI_Alltoall becomes very
>>> critical. Does anyone know an alternative to directly calling the
>>> MPI_Alltoall routine that would reduce the run time?
>> One possibility would be trying FFTW 3.3's MPI transpose capabilities
> Actually my question stems from using fftw. I do not feel fftw is fast
> enough, considering that I need to do the transforms millions, even
> billions, of times. I timed its routine and found that it was actually
> the transpose of the data that killed me.
If it is the on-node data transpose that consumes most of the time, then,
as I hinted, try restructuring your calculation to work with transposed
on-node data and use FFTW_MPI_TRANSPOSED_OUT. Note that an in-place MPI
transpose will incur higher on-node transpose costs than an out-of-place
plan, because non-square, in-place matrix transposes are expensive.
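For concreteness, here is a minimal sketch of the transposed-output
approach. It is untested, and the N0 x N1 grid sizes are placeholders I
made up, not anything from your code:

    #include <fftw3-mpi.h>

    int main(int argc, char **argv)
    {
        const ptrdiff_t N0 = 1024, N1 = 1024;  /* hypothetical grid */
        ptrdiff_t alloc_local, local_n0, local_0_start;
        ptrdiff_t local_n1, local_1_start;

        MPI_Init(&argc, &argv);
        fftw_mpi_init();

        /* Query local sizes for both the usual N0 x N1 input layout
         * and the transposed N1 x N0 output layout. */
        alloc_local = fftw_mpi_local_size_2d_transposed(
                N0, N1, MPI_COMM_WORLD,
                &local_n0, &local_0_start,
                &local_n1, &local_1_start);

        fftw_complex *data = fftw_alloc_complex(alloc_local);

        /* FFTW_MPI_TRANSPOSED_OUT leaves the output distributed as an
         * N1 x N0 array, skipping the final global re-transposition. */
        fftw_plan plan = fftw_mpi_plan_dft_2d(
                N0, N1, data, data, MPI_COMM_WORLD,
                FFTW_FORWARD, FFTW_ESTIMATE | FFTW_MPI_TRANSPOSED_OUT);

        /* ... fill data[] with local_n0 rows of input ... */
        fftw_execute(plan);
        /* ... consume local_n1 rows of transposed output ... */

        fftw_destroy_plan(plan);
        fftw_free(data);
        fftw_mpi_cleanup();
        MPI_Finalize();
        return 0;
    }

Compile with something like "mpicc example.c -lfftw3_mpi -lfftw3 -lm".
The catch, of course, is that everything downstream of the transform has
to consume the transposed layout, which is the restructuring I mentioned.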
> I just hope that someone here might have a smarter solution than what fftw provides...
If you're fully taking advantage of what FFTW MPI can do, I would be
surprised if you find something better. If you do, please write back
as I'd be curious.
Best of luck,
Rhys