[mpich-discuss] MPI_Alltoall problem

Jie Chen jiechen at mcs.anl.gov
Fri Oct 7 17:33:10 CDT 2011


Hi, I am working on an application that makes heavy use of MPI_Alltoall to perform matrix transposes. Say the matrix is 10^4 x 10^4 and is partitioned into row slabs across 32 processes; transposing it means each process ends up holding a column slab of the data. Unfortunately, I have to do this transpose many times, say 10 million times, so the performance of MPI_Alltoall is critical. Does anyone know of an alternative to calling MPI_Alltoall directly that would reduce the run time?
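
For concreteness, here is a minimal sketch of the communication pattern described above, simulated in a single process so that it runs without MPI. The function name simulated_slab_transpose is hypothetical, and the sketch assumes the matrix size n is divisible by the process count p (which is not true of 10^4 and 32; that case would need padding or MPI_Alltoallv). In a real MPI code, Step 2 would presumably be one MPI_Alltoall call exchanging nb*nb doubles per pair of ranks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Single-process simulation of a slab transpose (illustrative only).
 * a   : n x n row-major matrix; "rank" r owns rows [r*nb, (r+1)*nb), nb = n/p.
 * out : n x n row-major result; rows [r*nb, (r+1)*nb) of out are the slab of
 *       the transposed matrix that rank r would hold afterwards.
 * Assumes n is divisible by p. */
void simulated_slab_transpose(const double *a, double *out, int n, int p)
{
    assert(n > 0 && p > 0 && n % p == 0);
    int nb = n / p;                      /* rows per slab */
    size_t blk = (size_t)nb * nb;        /* elements per nb x nb block */

    double *sendbuf = malloc((size_t)n * n * sizeof *sendbuf);
    double *recvbuf = malloc((size_t)n * n * sizeof *recvbuf);
    assert(sendbuf != NULL && recvbuf != NULL);

    /* Step 1: each rank r packs its slab into p contiguous blocks,
     * one per destination rank d. */
    for (int r = 0; r < p; r++)
        for (int d = 0; d < p; d++)
            for (int i = 0; i < nb; i++)
                for (int j = 0; j < nb; j++)
                    sendbuf[((size_t)r * p + d) * blk + (size_t)i * nb + j] =
                        a[((size_t)r * nb + i) * n + (size_t)d * nb + j];

    /* Step 2: the all-to-all exchange. Block d of sender s lands in
     * slot s of receiver d (this is what MPI_Alltoall would do). */
    for (int s = 0; s < p; s++)
        for (int d = 0; d < p; d++)
            memcpy(recvbuf + ((size_t)d * p + s) * blk,
                   sendbuf + ((size_t)s * p + d) * blk,
                   blk * sizeof(double));

    /* Step 3: each rank r transposes every received block locally
     * while unpacking it into its slab of the result. */
    for (int r = 0; r < p; r++)
        for (int s = 0; s < p; s++)
            for (int i = 0; i < nb; i++)
                for (int j = 0; j < nb; j++)
                    out[((size_t)r * nb + j) * n + (size_t)s * nb + i] =
                        recvbuf[((size_t)r * p + s) * blk + (size_t)i * nb + j];

    free(sendbuf);
    free(recvbuf);
}
```

The packing in Step 1 and the local transposes in Step 3 are pure memory traffic on each process, so they add to the cost of the exchange itself on every one of the repeated transposes.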

Jie


