[MPICH] Idle processes at 100% load with ch3:ssm and nemesis, but not ch3:sock

Darius Buntinas buntinas at mcs.anl.gov
Fri Feb 23 08:44:02 CST 2007


Nemesis does busy waiting by design.  By polling for messages in this 
fashion we can achieve the best communication performance.  If we were to 
use blocking synchronization operations (which would not use the CPU), 
this would incur a large overhead, especially for shared-memory 
communication, where it could more than double the one-way latency of 
small messages.
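To make the tradeoff concrete, here is a simplified sketch of the two 
wait strategies (illustrative C only, not the actual nemesis source; the 
flag and semaphore are stand-ins for the real queue machinery):

#include <stdatomic.h>
#include <semaphore.h>

/* Polling: spin on a flag the sender writes into shared memory.  The
 * waiting process burns CPU (hence the 100% load you see), but it
 * notices a new message within a cache-coherence round trip. */
static inline void wait_polling(atomic_int *flag)
{
    while (atomic_load_explicit(flag, memory_order_acquire) == 0)
        ;  /* busy wait */
}

/* Blocking: sleep in the kernel until the sender posts the semaphore.
 * The waiting process is idle, but every message now pays for a system
 * call and a scheduler wakeup on top of the data copy itself. */
static inline void wait_blocking(sem_t *sem)
{
    sem_wait(sem);  /* kernel puts this process to sleep until sem_post() */
}

For small shared-memory messages the copy itself typically takes well 
under a microsecond, so the wakeup cost of the blocking version 
dominates the latency.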

That said, we do intend to provide an optional blocking mode for 
situations where the processors are oversubscribed (i.e., more than one 
process per processor).
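One common way to implement such a mode (sketched below with a made-up 
SPIN_COUNT knob; nothing here is a committed MPICH interface) is to spin 
for a bounded number of iterations and then start yielding the 
processor so that co-scheduled processes can run:

#include <sched.h>
#include <stdatomic.h>

#define SPIN_COUNT 1000   /* hypothetical tuning knob */

static inline void wait_adaptive(atomic_int *flag)
{
    int spins = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (++spins > SPIN_COUNT) {
            sched_yield();  /* let another runnable process have the CPU */
            spins = 0;
        }
    }
}

This keeps latency close to pure polling when a message arrives quickly, 
while letting an oversubscribed node make progress instead of wasting 
whole scheduler time slices spinning.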

Darius


On Fri, 23 Feb 2007, Nate Crawford wrote:

> Hi All,
>
>  I am having a significant problem using shared memory for intra-node
> communications with MPICH2-1.0.5p3 (and some earlier versions). Some of
> our parallel programs are based on a master/slave arrangement where the
> master coordinates the slaves, but does no real work. When using sockets
> for intra-node message passing, things work as intended: the master is
> nearly always idle, and all but one of the slaves are idle during the
> sequential parts. With ch3:ssm or nemesis, all processes are always
> using 100% CPU, even when they are only waiting for the signal to
> proceed.
>
>  I suspect that the waiting processes are not using the best method for
> receiving messages, but do not know what to look for. Is this known
> behavior? I compiled MPICH2 with gcc 4.1.2 and pgf90 6.2-2 on SuSE 10.2
> (x86-64).
>
> Thanks,
> Nate
>
>



