[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips
Sylvain Jeaugey
sylvain.jeaugey at bull.net
Fri Sep 14 09:48:55 CDT 2007
That's unfortunate.
Still, I wrote two programs. A master:
----------------------
#include <sched.h>

/* Master: spins, but yields the CPU on every iteration. */
int main() {
    while (1) {
        sched_yield();
    }
    return 0;
}
----------------------
and a slave :
----------------------
/* Slave: pure busy loop, never yields the CPU. */
int main() {
    while (1);
    return 0;
}
----------------------
I launched 4 slaves and 1 master on a machine with two dual-core chips
(four cores total). Here is the result in top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
12361 sylvain   25   0  2376  244  188 R  100  0.0  0:18.26 slave
12362 sylvain   25   0  2376  244  188 R  100  0.0  0:18.12 slave
12360 sylvain   25   0  2376  244  188 R  100  0.0  0:18.23 slave
12363 sylvain   25   0  2376  244  188 R  100  0.0  0:18.15 slave
12364 sylvain   20   0  2376  248  192 R    0  0.0  0:00.00 master
12365 sylvain   16   0  6280 1120  772 R    0  0.0  0:00.08 top
If you are seeing 66% each, I guess your master is not sched_yield'ing as
much as expected. You might look for environment variables that force a
yield when no message is available; or maybe your master isn't so idle
after all and has messages to send continuously, and therefore never
yields.
Anyway, running strings <binary|mpi library> | grep YIELD may turn up
environment variables that control this.
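For what it's worth, the pattern I have in mind looks roughly like the
sketch below (purely illustrative, not the actual nemesis code; the
message_ready flag is a made-up stand-in for the shared-memory receive
queue): the receive side spins, so top reports 100% CPU, but the
sched_yield() lets any other runnable task on the same core take over.
----------------------
#include <sched.h>

/* Illustrative busy-polling receive loop: it spins (hence 100% CPU in
 * top), but yields the core on every empty poll, so a computation
 * process scheduled on the same CPU can still get nearly all of the
 * cycles. */
void wait_for_message(volatile int *message_ready)
{
    while (!*message_ready) {
        sched_yield();
    }
}
----------------------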
Sylvain
On Fri, 14 Sep 2007, Yusong Wang wrote:
> On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
>> Yusong,
>>
>> I may be wrong about nemesis, but most shm-based MPI implementations rely
>> on busy polling, which makes them appear to use 100% CPU. It may not be a
>> problem though, because they also frequently call sched_yield() when they
>> have nothing to receive, which means that if another task is running on
>> the same CPU, the "master" task will give all of its CPU time to the
>> other task.
> Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
> happen. I launched 3 processes and each process uses 66% CPU. It seems
> to me the processes switch between the cores, as any two of them together
> are over 100%. In another test with more cores, I launched n_core+1
> processes: 2 of the processes use 50% CPU, and the remaining ones use 100% CPU.
>
>
>>
>> So, it's not really a problem to have task 0 at 100% CPU. Just launch an
>> additional task and see if it takes the CPU cycles of the master. You
>> might also use taskset (at least on Fedora) to bind tasks to CPUs.
>>
>> Sylvain
>>
>> On Thu, 13 Sep 2007, Yusong Wang wrote:
>>
>>> Hi all,
>>>
>>> I have a program implemented with a master/slave model, in which the
>>> master does very little computation. In my tests, the master spent
>>> most of its time waiting for the other processes to finish the MPI_Gather
>>> communication (confirmed with jumpshot/MPE). In several tests on
>>> different multi-core chips (dual-core, quad-core, 8-core), I found the
>>> master uses the same amount of CPU as the slaves, which do all the
>>> computation. There were only two exceptions in which the master used near
>>> 0% CPU (one on Windows, one on Linux), which is what I expect. The tests
>>> were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis,
>>> mpd/smpd). I don't know whether it is a software/system issue or caused
>>> by different hardware. I would think it is (at least) related to
>>> hardware, since with the same operating system I got different CPU usage
>>> (near 0% or near 100%) for the master on different multi-core nodes of
>>> our clusters.
>>>
>>> Are there any documents I can check out on this issue?
>>>
>>> Thanks,
>>>
>>> Yusong
>>>
>>>
>
>