[MPICH] An idle communication process uses the same CPU as the computation process on multi-core chips
Darius Buntinas
buntinas at mcs.anl.gov
Fri Sep 14 10:05:08 CDT 2007
Yusong,
Can you give me more information about your application? Is this an
MPICH2 application? What channel are you using? What configure flags
did you use? I believe that sched_yield() can be disabled depending on
certain configure flags.
Thanks,
-d
On 09/14/2007 09:48 AM, Sylvain Jeaugey wrote:
> That's unfortunate.
>
> Still, I did two programs. A master:
> ----------------------
> #include <sched.h>
>
> int main() {
>     while (1) {
>         sched_yield();
>     }
>     return 0;
> }
> ----------------------
> and a slave:
> ----------------------
> int main() {
>     while (1);
>     return 0;
> }
> ----------------------
>
> I launch 4 slaves and 1 master on a dual-socket, dual-core machine (4
> cores total). Here is the result in top:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12361 sylvain 25 0 2376 244 188 R 100 0.0 0:18.26 slave
> 12362 sylvain 25 0 2376 244 188 R 100 0.0 0:18.12 slave
> 12360 sylvain 25 0 2376 244 188 R 100 0.0 0:18.23 slave
> 12363 sylvain 25 0 2376 244 188 R 100 0.0 0:18.15 slave
> 12364 sylvain 20 0 2376 248 192 R 0 0.0 0:00.00 master
> 12365 sylvain 16 0 6280 1120 772 R 0 0.0 0:00.08 top
>
> If you are seeing 66% each, I guess that your master is not
> sched_yield'ing as much as expected. Maybe you should look for
> environment variables that force a yield when no message is available;
> or maybe your master isn't so idle after all and has messages to send
> continuously, thus never yielding.
>
> Anyway, strings <binary|mpi library> | grep YIELD may turn up the
> environment variables that control this.
>
> Sylvain
>
> On Fri, 14 Sep 2007, Yusong Wang wrote:
>
>> On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
>>> Yusong,
>>>
>>> I may be wrong about nemesis, but most shm-based MPI implementations
>>> rely on busy polling, which makes them appear to use 100% CPU. It may
>>> not be a problem, though, because they also call sched_yield()
>>> frequently when they have nothing to receive, which means that if
>>> another task is running on the same CPU, the "master" task will give
>>> all of its CPU time to the other task.
>> Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
>> happen. I launched 3 processes and each process uses 66% CPU. It seems
>> to me the processes switch between the cores, as any two of them
>> together exceed 100%. In another test with more cores, I launched
>> n_core+1 processes; 2 of the processes use 50% CPU and the rest use
>> 100% CPU.
>>
>>
>>>
>>> So, it's not really a problem to have task 0 at 100% CPU. Just launch
>>> an additional task and see if it takes over the master's CPU cycles.
>>> You might also use taskset (at least on Fedora) to bind tasks to CPUs.
>>>
>>> Sylvain
>>>
>>> On Thu, 13 Sep 2007, Yusong Wang wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a program which is implemented with a master/slave model and the
>>>> master just do very little computation. In my test, the master spent
>>>> most of its time to wait other process to finish MPI_Gather
>>>> communication (confirmed with jumpshot/MPE). In several tests on
>>>> different multi-core chips (dual-core, quad-core, 8-core), I found the
>>>> master use the same amount of CPU as the slaves, which should do all
>>>> the
>>>> computation. There are only two exceptions that the master use near 0%
>>>> CPU (one on Window, one on Linux), which is what I expect. The tests
>>>> were did on both Fedora Linux and Widows with MPICH2 (shm/nemesis
>>>> mpd/smpd). I don't know if it is a software/system issue or caused by
>>>> different hardware. I would think this is (at least )related with
>>>> hardware. As with the same operating system, I got different CPU usage
>>>> (near 0% or near 100%) for the master on different multi-core nodes of
>>>> our clusters.
>>>>
>>>> Is there any documents I can check out for this issue?
>>>>
>>>> Thanks,
>>>>
>>>> Yusong
>>>>
>>>>
>>
>>
>