[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Darius Buntinas buntinas at mcs.anl.gov
Fri Sep 14 09:00:18 CDT 2007


Sylvain is correct.  The shared-memory implementations (including 
Nemesis) use busy polling to wait for incoming messages, and call 
sched_yield() periodically.  What this means in the master/slave case is 
that the master process will poll for a short while, then give up the 
rest of its timeslice when nothing is received.  At this point a slave 
process runs until its timeslice runs out, then the master runs 
again until it yields.  Because the master polls for such a short while 
relative to the scheduling quantum, we saw a negligible effect on the 
slaves in our tests.
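
The poll-then-yield pattern described above can be sketched roughly as 
follows.  This is only an illustration, not the actual Nemesis code: the 
mailbox_t type, the wait_for_message() function and its parameters are 
hypothetical names, and a single atomic flag stands in for a real 
shared-memory receive queue.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared-memory receive queue: one flag that
 * a sender sets when a message is available. */
typedef struct {
    atomic_bool ready;
} mailbox_t;

/* Poll the mailbox spins_per_yield times; if nothing has arrived, give up
 * the rest of the timeslice with sched_yield() and poll again.
 * Returns the number of sched_yield() calls made before the message was
 * seen, or -1 if max_yields yields were made without seeing one. */
long wait_for_message(mailbox_t *box, int spins_per_yield, long max_yields)
{
    assert(box != NULL && spins_per_yield > 0);
    long yields = 0;
    for (;;) {
        /* Short busy-polling phase. */
        for (int i = 0; i < spins_per_yield; i++) {
            if (atomic_load_explicit(&box->ready, memory_order_acquire)) {
                return yields;
            }
        }
        if (yields >= max_yields) {
            return -1;
        }
        /* Nothing received: let another runnable process use the CPU.
         * The caller stays in the run queue; it does not block. */
        sched_yield();
        yields++;
    }
}
```

Note that the waiting process never sleeps in this scheme; it only 
offers the CPU to other runnable processes between polling bursts.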

Because the master calls yield, rather than blocking on a semaphore or 
something, the process will remain in the runnable queue, so if you have 
one master and four slaves on a single node, you'll see a load value of 
five.  I'm not sure how top calculates CPU percent usage, but I imagine 
that if it's based on the fraction of time the process spends in the 
runnable queue, it will show 100%, even though the process is not 
running all the time.  It would be interesting to see what percent usage 
top would show for N+1 processes constantly calling sched_yield() on an 
N-processor machine.

Hope that explains things a little.

-d

On 09/14/2007 05:38 AM, Reuti wrote:
> Hi,
> 
> Am 14.09.2007 um 00:41 schrieb Yusong Wang:
> 
>> I have a program which is implemented with a master/slave model, and the
>> master does very little computation. In my test, the master spent
>> most of its time waiting for the other processes to finish MPI_Gather
>> communication (confirmed with jumpshot/MPE). In several tests on
>> different multi-core chips (dual-core, quad-core, 8-core), I found the
>> master uses the same amount of CPU as the slaves, which should do all the
>> computation.
> 
> What do you mean in detail? Say you have the master process 
> running plus 4 slaves: do you see a CPU usage of 500% on a machine with 8 cores?
> 
> With this programming style, you need a specially configured machinefile 
> if you use a queuing system, as otherwise one slot will be wasted 
> on the idling master process.
> 
> -- Reuti
> 
> 
>> There are only two exceptions, where the master uses near 0%
>> CPU (one on Windows, one on Linux), which is what I expect. The tests
>> were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis,
>> mpd/smpd). I don't know if it is a software/system issue or caused by
>> different hardware. I would think this is (at least) related to
>> hardware, since with the same operating system I got different CPU usage
>> (near 0% or near 100%) for the master on different multi-core nodes of
>> our clusters.
>>
>> Are there any documents I can check for this issue?
>>
>> Thanks,
>>
>> Yusong
> 



