[mpich-discuss] MPI_Alltoall problem

Fri Oct 7 17:33:10 CDT 2011

Hi, I am working on an application that makes heavy use of MPI_Alltoall to transpose a matrix. Say the matrix is 10^4 x 10^4 and is partitioned into row slabs across 32 processes. Transposing it means that each process ends up holding a column slab of the data. Unfortunately I have to do the transpose many times, say 10 million times, so the performance of MPI_Alltoall is critical. Does anyone know an alternative to calling MPI_Alltoall directly that would reduce the run time?
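
To make the layout concrete, here is a minimal sketch in C of the slab sizes and per-destination block sizes for such a transpose. It is only an illustration under assumptions: the helper names (slab_size, slab_offset, transpose_counts) are hypothetical, the slabs are assumed to be contiguous with any remainder given to the lowest ranks, and each block is assumed to be packed contiguously per destination. Note that 10^4 is not divisible by 32, so the blocks are not all the same size. The sketch makes no MPI calls; it only computes counts in matrix elements.

```c
#include <assert.h>
#include <stddef.h>

/* Number of rows (or columns) of an n x n matrix owned by `rank` when the
   n indices are split into p contiguous slabs. Assumption: the first
   n % p ranks each get one extra index. */
int slab_size(int n, int p, int rank)
{
    assert(n >= 0 && p > 0 && rank >= 0 && rank < p);
    return n / p + (rank < n % p ? 1 : 0);
}

/* First global index of the slab owned by `rank`. */
int slab_offset(int n, int p, int rank)
{
    assert(n >= 0 && p > 0 && rank >= 0 && rank < p);
    int base = n / p;
    int extra = n % p;
    return rank * base + (rank < extra ? rank : extra);
}

/* For the transpose, rank `me` sends to rank j the block made of
   (rows owned by me) x (columns owned by j). Fills counts[j] and
   displs[j] in elements, assuming the blocks are packed back to back
   in the send buffer. Returns the total number of elements sent. */
size_t transpose_counts(int n, int p, int me, int *counts, int *displs)
{
    int rows = slab_size(n, p, me);
    size_t total = 0;
    for (int j = 0; j < p; ++j) {
        counts[j] = rows * slab_size(n, p, j);
        displs[j] = (int)total;
        total += (size_t)counts[j];
    }
    return total;
}
```

With n = 10000 and p = 32, ranks 0 to 15 own 313 rows and ranks 16 to 31 own 312, so the blocks exchanged between pairs of processes differ slightly in size. Since MPI_Alltoall sends the same count to every process, this layout would need either padding to a uniform block size or per-destination counts and displacements of the kind MPI_Alltoallv takes.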

Jie