[MPICH] An idle communication process uses the same CPU as computation processes on multi-core chips

Sylvain Jeaugey sylvain.jeaugey at bull.net
Fri Sep 14 09:48:55 CDT 2007


That's unfortunate.

Still, I wrote two small programs. A master:
----------------------
#include <sched.h>

int main() {
         while (1) {
             sched_yield();
         }
         return 0;
}
----------------------
and a slave:
----------------------
int main() {
         while (1);
         return 0;
}
----------------------

I launched 4 slaves and 1 master on a machine with two dual-core CPUs (4 
cores total). Here is the result in top:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12361 sylvain   25   0  2376  244  188 R  100  0.0   0:18.26 slave
12362 sylvain   25   0  2376  244  188 R  100  0.0   0:18.12 slave
12360 sylvain   25   0  2376  244  188 R  100  0.0   0:18.23 slave
12363 sylvain   25   0  2376  244  188 R  100  0.0   0:18.15 slave
12364 sylvain   20   0  2376  248  192 R    0  0.0   0:00.00 master
12365 sylvain   16   0  6280 1120  772 R    0  0.0   0:00.08 top

If you are seeing 66% each, I guess that your master is not 
sched_yield'ing as much as expected. Maybe you should look for environment 
variables that force a yield when no message is available, or maybe your 
master isn't so idle after all and continuously has messages to send, thus 
never yielding.

Anyway, running strings <binary|mpi library> | grep YIELD may turn up 
environment variables that control this.
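For instance (the library path below is hypothetical, and the stand-in file exists only to show the command's shape and output; the symbol names in it are made up):

```shell
# strings extracts printable character runs from a binary; grep filters them.
# Against a real installation this would be something like:
#   strings /usr/lib/libmpich.so | grep YIELD
# Demonstrated here on a stand-in file with made-up symbol names:
printf 'SOME_YIELD_KNOB\0UNRELATED_SYMBOL\0' > /tmp/fake_lib.bin
strings /tmp/fake_lib.bin | grep YIELD   # -> SOME_YIELD_KNOB
```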

Sylvain

On Fri, 14 Sep 2007, Yusong Wang wrote:

> On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
>> Yusong,
>>
>> I may be wrong for nemesis, but most shm-based MPI implementations rely on
>> busy polling, which makes them appear to use 100% CPU. It may not be a
>> problem, though, because they also frequently call sched_yield() when they
>> have nothing to receive, which means that if another task is running on
>> the same CPU, the "master" task will give all its CPU time to the
>> other task.
> Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
> happen. I launched 3 processes and each process used 66% CPU. It seems
> to me that the processes switch between the cores, since any two of them
> together exceed 100%. In another test with more cores, I launched n_core+1
> processes; 2 of the processes used 50% CPU and the rest used 100% CPU.
>
>
>>
>> So, it's not really a problem to have task 0 at 100% CPU. Just launch an
>> additional task and see if it takes the CPU cycles of the master. You
>> might also use taskset (at least on Fedora) to bind tasks to CPUs.
>>
>> Sylvain
>>
>> On Thu, 13 Sep 2007, Yusong Wang wrote:
>>
>>> Hi all,
>>>
>>> I have a program implemented with a master/slave model, in which the
>>> master does very little computation. In my tests, the master spent
>>> most of its time waiting for the other processes to finish the
>>> MPI_Gather communication (confirmed with jumpshot/MPE). In several
>>> tests on different multi-core chips (dual-core, quad-core, 8-core), I
>>> found that the master uses the same amount of CPU as the slaves, which
>>> do all the computation. There were only two exceptions where the
>>> master used near 0% CPU (one on Windows, one on Linux), which is what
>>> I expected. The tests were done on both Fedora Linux and Windows with
>>> MPICH2 (shm/nemesis, mpd/smpd). I don't know whether this is a
>>> software/system issue or caused by different hardware. I would think
>>> it is (at least) related to the hardware, since with the same
>>> operating system I got different CPU usage (near 0% or near 100%) for
>>> the master on different multi-core nodes of our clusters.
>>>
>>> Are there any documents I can check out for this issue?
>>>
>>> Thanks,
>>>
>>> Yusong
>>>
>>>
>
>




More information about the mpich-discuss mailing list