[MPICH] An idle communication process uses the same CPU as computation process on multi-core chips
Sylvain Jeaugey
sylvain.jeaugey at bull.net
Fri Sep 14 09:18:37 CDT 2007
On Fri, 14 Sep 2007, Darius Buntinas wrote:
> I'm not sure how top calculates cpu percent usage, but I imagine that if it's
> based on the fraction of time the process spends in the runnable queue, it
> will show 100%, even though the process is not running all the time.
I'm pretty sure of the contrary. I once had to do this kind of master/slave
setup and had 1 master running at 100% and 3 slaves running at 100% in
top.
When I launched 4 slaves, the master dropped to 0-1% and the slaves were
running at almost 100%.
Doing a busy loop on sched_yield is not a problem, since it is "soft"
polling which will give up its place to anything else doing _real_
computation. Top won't show 500% CPU usage anyway. The sum can never
exceed 100% * ncpus.
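
To illustrate what "soft" polling on sched_yield means, here is a minimal
sketch. It is not MPICH's actual progress loop: poll_with_yield, the flag
variable and the max_spins bound are hypothetical names made up for this
example. The idea is that each failed check hands the core to any other
runnable process, so the loop only burns CPU when nothing else wants it.

```c
#include <sched.h>      /* sched_yield */
#include <stdatomic.h>  /* atomic_int, atomic_load */

/* Hypothetical helper (assumption, not an MPICH function): wait until
 * *flag becomes nonzero, yielding the CPU after every failed check.
 * Returns the number of yields performed, or -1 if max_spins checks
 * failed before the flag was set. */
long poll_with_yield(atomic_int *flag, long max_spins)
{
    long spins = 0;

    while (atomic_load(flag) == 0) {
        if (spins >= max_spins)
            return -1;      /* gave up: flag never became nonzero */
        sched_yield();      /* let any other runnable process go first */
        spins++;
    }
    return spins;
}
```

A caller whose flag is set by another thread or through shared memory would
use something like poll_with_yield(&flag, 1000000) and treat -1 as a
timeout. On an otherwise idle core such a loop is rescheduled immediately,
which would explain a waiting process showing close to 100% in top.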
Sylvain
> On 09/14/2007 05:38 AM, Reuti wrote:
>> Hi,
>>
>> Am 14.09.2007 um 00:41 schrieb Yusong Wang:
>>
>>> I have a program which is implemented with a master/slave model and the
>>> master does very little computation. In my test, the master spent
>>> most of its time waiting for other processes to finish MPI_Gather
>>> communication (confirmed with jumpshot/MPE). In several tests on
>>> different multi-core chips (dual-core, quad-core, 8-core), I found the
>>> master uses the same amount of CPU as the slaves, which should do all the
>>> computation.
>>
>> what do you mean in detail - you have let's say the master process running
>> and 4 slaves and see a CPU usage of 500% on a machine with 8 cores?
>>
>> With this programming style, you need a specially configured machinefile if
>> you use a queuing system, as otherwise one idling slot will be wasted on
>> the master process.
>>
>> -- Reuti
>>
>>
>>> There are only two exceptions where the master uses near 0%
>>> CPU (one on Windows, one on Linux), which is what I expect. The tests
>>> were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis
>>> mpd/smpd). I don't know if it is a software/system issue or caused by
>>> different hardware. I would think this is (at least) related to
>>> hardware, as with the same operating system I got different CPU usage
>>> (near 0% or near 100%) for the master on different multi-core nodes of
>>> our clusters.
>>>
>>> Are there any documents I can check out for this issue?
>>>
>>> Thanks,
>>>
>>> Yusong
>>
>
>