[MPICH] Idle processes at 100% load with ch3:ssm and nemesis, but not ch3:sock
Darius Buntinas
buntinas at mcs.anl.gov
Fri Feb 23 08:44:02 CST 2007
Nemesis does busy waiting by design. By polling for messages in this
fashion we can achieve the best communication performance. If we were to
use blocking synchronization operations (which would not use the CPU),
we would incur a large overhead, especially for shared-memory
communication, where it could more than double the one-way latency of
small messages.
That said, we do intend to provide an optional blocking mode for
situations where the processors are oversubscribed (i.e., more than one
process per processor).
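
The tradeoff is easy to see in a small, self-contained sketch
(hypothetical code, not taken from MPICH2) that waits for a "message
arrived" flag both ways: spinning on it, as a polling channel
effectively does, and blocking on a condition variable, which frees the
CPU but pays a kernel round trip on every wakeup.

/* wait_demo.c -- contrast busy polling with a blocking wait.
 * Build with: gcc -O2 -pthread wait_demo.c -o wait_demo */
#include <pthread.h>
#include <stdio.h>

static volatile int msg_ready = 0;    /* "message arrived" flag */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Busy wait: lowest latency, but the core shows 100% load. */
static void busy_wait(void)
{
    while (!msg_ready)
        ;                             /* spin, re-checking the flag */
}

/* Blocking wait: the process sleeps and uses no CPU, but each wakeup
 * costs a system call and a context switch, which is where the extra
 * small-message latency over shared memory comes from. */
static void blocking_wait(void)
{
    pthread_mutex_lock(&lock);
    while (!msg_ready)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
}

static void *sender(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    msg_ready = 1;                    /* "deliver" the message */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, sender, NULL);
    blocking_wait();                  /* swap in busy_wait() to compare */
    pthread_join(t, NULL);
    puts("message received");
    return 0;
}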
Darius
On Fri, 23 Feb 2007, Nate Crawford wrote:
> Hi All,
>
> I am having a significant problem using shared memory for intra-node
> communications with MPICH2-1.0.5p3 (and some earlier versions). Some of
> our parallel programs are based on a master/slave arrangement where the
> master coordinates the slaves, but does no real work. When using sockets
> for intra-node message passing, things work as intended: the master is
> nearly always idle, and all but one of the slaves are idle during the
> sequential parts. With ch3:ssm or nemesis, all processes are always
> using 100% CPU, even when they are only waiting for the signal to
> proceed.
>
> I suspect that the waiting processes are not using the best method for
> receiving messages, but do not know what to look for. Is this known
> behavior? I compiled MPICH2 with gcc 4.1.2 and pgf90 6.2-2 on SuSE 10.2
> (x86-64).
>
> Thanks,
> Nate
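For concreteness, here is a minimal sketch (hypothetical, not Nate's
program) of the master/slave pattern he describes. The slaves sit in
MPI_Recv waiting for work; with ch3:ssm or nemesis that nominally
blocking call polls internally, so every waiting rank shows 100% CPU
even while doing no work.

/* master_slave.c -- minimal master/slave skeleton (hypothetical).
 * Build with: mpicc master_slave.c -o master_slave */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, work = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: hand a unit of work to each slave, then coordinate. */
        int i;
        for (i = 1; i < size; i++)
            MPI_Send(&work, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    } else {
        /* Slave: with a polling channel, this "blocking" receive is
         * where the busy waiting happens. */
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d received work\n", rank);
    }

    MPI_Finalize();
    return 0;
}

Until the optional blocking mode is available, rebuilding MPICH2 with
the socket channel restores the idle behavior Nate observed; assuming
the standard channel-selection option, that is
./configure --with-device=ch3:sock.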