[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Sylvain Jeaugey sylvain.jeaugey at bull.net
Fri Sep 14 09:18:37 CDT 2007


On Fri, 14 Sep 2007, Darius Buntinas wrote:

> I'm not sure how top calculates cpu percent usage, but I imagine that if it's 
> based on the fraction of time the process spends in the runnable queue, it 
> will show 100%, even though the process is not running all the time.
I'm pretty sure it's the contrary. I once had to write such a master/slave 
program and saw 1 master running at 100% and 3 slaves running at 100% in 
top.

When I launched 4 slaves, the master dropped to 0-1% and each slave was 
still running at almost 100%.

Doing a busy loop on sched_yield is not a problem, since it is a "soft" 
polling which yields its place to anything else doing _real_ 
computation. top won't show 500% CPU usage anyway; the sum should always 
be 100% * ncpus.

Sylvain

> On 09/14/2007 05:38 AM, Reuti wrote:
>> Hi,
>> 
>> Am 14.09.2007 um 00:41 schrieb Yusong Wang:
>> 
>>> I have a program which is implemented with a master/slave model, where
>>> the master does very little computation. In my tests, the master spent
>>> most of its time waiting for the other processes to finish the MPI_Gather
>>> communication (confirmed with jumpshot/MPE). In several tests on
>>> different multi-core chips (dual-core, quad-core, 8-core), I found the
>>> master uses the same amount of CPU as the slaves, which do all the
>>> computation.
>> 
>> what do you mean in detail - you have let's say the master process running 
>> and 4 slaves and see a CPU usage of 500% on a machine with 8 cores?
>> 
>> With this programming style, you need a specially configured machinefile if 
>> you use a queuing system, as otherwise one idling slot will be wasted on 
>> the master process.
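>> A minimal sketch of such a machinefile (hostnames and slot counts are 
>> hypothetical; exact syntax depends on your MPICH launcher): the master's 
>> node is given one slot more than its compute slots, so the near-idle 
>> master does not occupy a slot meant for a slave.
>> 
>> # node01 runs the master plus one slave; node02/node03 run slaves only
>> node01:2
>> node02:1
>> node03:1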
>> 
>> -- Reuti
>> 
>> 
>>> There are only two exceptions where the master uses near 0%
>>> CPU (one on Windows, one on Linux), which is what I expect. The tests
>>> were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis,
>>> mpd/smpd). I don't know if it is a software/system issue or caused by
>>> different hardware. I would think this is (at least) related to the
>>> hardware, as with the same operating system I got different CPU usage
>>> (near 0% or near 100%) for the master on different multi-core nodes of
>>> our clusters.
>>> 
>>> Are there any documents I can check out for this issue?
>>> 
>>> Thanks,
>>> 
>>> Yusong
>> 




More information about the mpich-discuss mailing list