[mpich-discuss] mpi query

Dave Goodell goodell at mcs.anl.gov
Tue May 31 09:16:51 CDT 2011


On May 30, 2011, at 3:54 AM CDT, SHEETAL MALU wrote:

> Respected Sir,
> 
> When I create 8 processes across 2 servers with 8 processors each, placing processes (0,2,4,6) on server1 and the remaining processes (1,3,5,7) on server2, MPI_Allgather is faster up to 16 KB of data, and for data sizes greater than 16 KB, MPI_Gather+MPI_Bcast is faster.
> 
> If I instead place processes (0,1,2,3) on server1 and processes (4,5,6,7) on server2, then MPI_Gather+MPI_Bcast is faster than MPI_Allgather.  Sir, I cannot find the reason for this: MPI_Allgather should perform faster than MPI_Gather+MPI_Bcast, yet I get the opposite result with this process placement.
> 
> Does the protocol switch from eager to rendezvous also affect collective operations, as it does in point-to-point communication?

Yes, a similar effect occurs because of algorithm selection cutoffs in each collective.  Those cutoffs could be tuned incorrectly for your particular platform.

Most of our collectives are written under the assumption of a nearly "flat" network where communication costs are similar between any two processes in the system.  This is obviously not true on your two-node system.  Furthermore, while MPI_Allgather and MPI_Gather are currently written in this fashion, the MPI_Bcast code has been optimized to take advantage of the SMP layout of your system.  So MPI_Bcast will be less sensitive to changes in process layout, which could be why gather+bcast is faster in some cases.
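
For concreteness, the combination you are comparing amounts to something like the following (a minimal sketch; sendbuf, recvbuf, count, and nprocs stand in for whatever your test actually uses):

----8<----
/* emulating MPI_Allgather with a rooted gather followed by a broadcast */
MPI_Gather(sendbuf, count, MPI_INT,
           recvbuf, count, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(recvbuf, count * nprocs, MPI_INT, 0, MPI_COMM_WORLD);

/* versus the single collective */
MPI_Allgather(sendbuf, count, MPI_INT,
              recvbuf, count, MPI_INT, MPI_COMM_WORLD);
----8<----

Because the second step is a bcast, it benefits from the SMP-aware code path described above, while MPI_Allgather currently does not.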

Also, accurately measuring MPI collective performance is not an easy task.  If you are not absolutely sure that you know what you are doing, you might want to use benchmarks like IMB [1] or SKaMPI [2].
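
For example, a bare-bones measurement should at least synchronize all ranks before timing, average over many iterations, and report the time of the slowest rank (a sketch only; iters, count, rank, sendbuf, and recvbuf are assumed to be set up elsewhere, and the real benchmarks do quite a bit more than this):

----8<----
double t, tmax;
int i;

MPI_Barrier(MPI_COMM_WORLD);            /* start all ranks together */
t = MPI_Wtime();
for (i = 0; i < iters; i++) {
    MPI_Allgather(sendbuf, count, MPI_INT,
                  recvbuf, count, MPI_INT, MPI_COMM_WORLD);
}
t = (MPI_Wtime() - t) / iters;

/* a collective is only as fast as its slowest participant */
MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (rank == 0)
    printf("avg MPI_Allgather time: %g s\n", tmax);
----8<----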

> Please Sir, let me know the solution.

There may not be a way to fix this performance problem without hacking on the mpich2 code itself.  However, you can try to fix it by reading the collectives code in src/mpi/coll and experimenting with the relevant cutoff parameters listed in README.envvar:

----8<----
% grep 'MPICH_\(ALLGATHER\|GATHER\|BCAST\)_' README.envvar
MPICH_BCAST_MIN_PROCS
MPICH_BCAST_SHORT_MSG_SIZE
MPICH_BCAST_LONG_MSG_SIZE
MPICH_ALLGATHER_SHORT_MSG_SIZE
MPICH_ALLGATHER_LONG_MSG_SIZE
MPICH_GATHER_VSMALL_MSG_SIZE
MPICH_GATHER_INTER_SHORT_MSG_SIZE
----8<----
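
For example, with the Hydra mpiexec you can override one of these cutoffs for a single run without recompiling (hostfile and ./your_benchmark are placeholders, and 32768 is just an illustrative value, not a recommendation):

----8<----
% mpiexec -f hostfile -n 8 -genv MPICH_ALLGATHER_LONG_MSG_SIZE 32768 ./your_benchmark
----8<----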

HTH,
-Dave

[1] http://software.intel.com/en-us/articles/intel-mpi-benchmarks/
[2] http://liinwww.ira.uka.de/~skampi/


