[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Darius Buntinas buntinas at mcs.anl.gov
Fri Sep 14 12:29:16 CDT 2007


Looking at the code, it should be calling yield.  I know that in nemesis,
--enable-fast disables sched_yield() in the current release, but that's
not the case with shm.

Try doing:
   nm <binary|mpi library> | grep -i yield
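
For reference, the relevant pattern is just a busy-poll receive loop that
backs off with sched_yield() on every empty poll.  A minimal sketch (not
the actual MPICH2 source; message_ready() is a made-up stand-in for
polling the shared-memory queue):
----------------------
#include <sched.h>

/* made-up placeholder: polls the shared-memory queue for a new message */
static int message_ready(void)
{
    return 0;    /* pretend nothing ever arrives */
}

int main(void)
{
    /* spin until a message shows up, but hand the time slice back on
       every empty poll so a compute process on the same core can run */
    while (!message_ready())
        sched_yield();   /* the call that --enable-fast disables in nemesis */
    return 0;
}
----------------------
Such a loop shows up as 100% CPU in top, but the scheduler will hand the
core over to any compute-bound process that asks for it.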

-d

On 09/14/2007 10:19 AM, Yusong Wang wrote:
> Darius,
> 
> I downloaded the most recent source code for MPICH2 and configured it with:
> configure --with-device=ch3:shm --enable-fast
> 
> By the way, how do I check whether sched_yield() is disabled or not? The
> command "strings <binary|mpi library> | grep YIELD" Sylvain sent me
> doesn't work on my system.
> 
> Thanks for your help!
> 
> Yusong 
> On Fri, 2007-09-14 at 10:05 -0500, Darius Buntinas wrote:
>> Yusong,
>>
>> Can you give me more information about your application?  Is this an 
>> MPICH2 application?  What channel are you using?  What configure flags 
>> did you use?  I believe that sched_yield() can be disabled depending on 
>> certain configure flags.
>>
>> Thanks,
>> -d
>>
>> On 09/14/2007 09:48 AM, Sylvain Jeaugey wrote:
>>> That's unfortunate.
>>>
>>> Still, I wrote two programs. A master:
>>> ----------------------
>>> #include <sched.h>
>>>
>>> int main() {
>>>         while (1) {
>>>             sched_yield();   /* always give the CPU back */
>>>         }
>>>         return 0;
>>> }
>>> ----------------------
>>> and a slave:
>>> ----------------------
>>> int main() {
>>>         while (1);   /* spin forever without yielding */
>>>         return 0;
>>> }
>>> ----------------------
>>>
>>> I launch 4 slaves and 1 master on a dual-socket, dual-core machine (4
>>> cores). Here is the result in top:
>>>
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 12361 sylvain   25   0  2376  244  188 R  100  0.0   0:18.26 slave
>>> 12362 sylvain   25   0  2376  244  188 R  100  0.0   0:18.12 slave
>>> 12360 sylvain   25   0  2376  244  188 R  100  0.0   0:18.23 slave
>>> 12363 sylvain   25   0  2376  244  188 R  100  0.0   0:18.15 slave
>>> 12364 sylvain   20   0  2376  248  192 R    0  0.0   0:00.00 master
>>> 12365 sylvain   16   0  6280 1120  772 R    0  0.0   0:00.08 top
>>>
>>> If you are seeing 66% each, I guess your master is not sched_yield'ing
>>> as much as expected. Maybe you should look for environment variables
>>> that force a yield when no message is available; or maybe your master
>>> isn't so idle after all and has messages to send continuously, and thus
>>> never yields.
>>>
>>> Anyway, strings <binary|mpi library> | grep YIELD may turn up environment
>>> variables that control that.
>>>
>>> Sylvain
>>>
>>> On Fri, 14 Sep 2007, Yusong Wang wrote:
>>>
>>>> On Fri, 2007-09-14 at 09:42 +0200, Sylvain Jeaugey wrote:
>>>>> Yusong,
>>>>>
>>>>> I may be wrong about nemesis, but most shm-based MPI implementations
>>>>> rely on busy polling, which makes them appear to use 100% CPU. It may
>>>>> not be a problem though, because they also frequently call sched_yield()
>>>>> when they have nothing to receive, which means that if another task is
>>>>> running on the same CPU, the "master" task will give all of its CPU time
>>>>> to the other task.
>>>> Unfortunately, on my Dell Latitude D630 laptop (dual-core), this didn't
>>>> happen. I launched 3 processes and each process used 66% CPU. It seems
>>>> to me the processes migrate between the cores, since any two of them add
>>>> up to more than 100%. In another test with more cores, I launched
>>>> n_core+1 processes; 2 of the processes used 50% CPU and the rest used
>>>> 100% CPU.
>>>>
>>>>
>>>>> So, it's not really a problem to have task 0 at 100% CPU. Just launch an
>>>>> additional task and see if it takes the CPU cycles of the master. You
>>>>> might also use taskset (at least on Fedora) to bind tasks to CPUs.
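>>>>>
>>>>> For instance, binding from inside the program looks roughly like this (a
>>>>> sketch using sched_setaffinity(), which is what taskset uses underneath;
>>>>> the command-line CPU argument is just for illustration):
>>>>> ----------------------
>>>>> #define _GNU_SOURCE
>>>>> #include <sched.h>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>>
>>>>> int main(int argc, char **argv) {
>>>>>         cpu_set_t set;
>>>>>         int cpu = (argc > 1) ? atoi(argv[1]) : 0;
>>>>>
>>>>>         CPU_ZERO(&set);
>>>>>         CPU_SET(cpu, &set);
>>>>>         /* pid 0 means the calling process */
>>>>>         if (sched_setaffinity(0, sizeof(set), &set) != 0) {
>>>>>             perror("sched_setaffinity");
>>>>>             return 1;
>>>>>         }
>>>>>         printf("bound to CPU %d\n", cpu);
>>>>>         return 0;
>>>>> }
>>>>> ----------------------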
>>>>>
>>>>> Sylvain
>>>>>
>>>>> On Thu, 13 Sep 2007, Yusong Wang wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a program which is implemented with a master/slave model, and the
>>>>>> master does very little computation. In my test, the master spent most
>>>>>> of its time waiting for the other processes to finish the MPI_Gather
>>>>>> communication (confirmed with Jumpshot/MPE). In several tests on
>>>>>> different multi-core chips (dual-core, quad-core, 8-core), I found that
>>>>>> the master uses the same amount of CPU as the slaves, which do all the
>>>>>> computation. There are only two exceptions where the master uses near 0%
>>>>>> CPU (one on Windows, one on Linux), which is what I expect. The tests
>>>>>> were done on both Fedora Linux and Windows with MPICH2 (shm/nemesis,
>>>>>> mpd/smpd). I don't know if it is a software/system issue or caused by
>>>>>> different hardware. I would think it is (at least) related to the
>>>>>> hardware, since with the same operating system I got different CPU usage
>>>>>> (near 0% or near 100%) for the master on different multi-core nodes of
>>>>>> our clusters.
>>>>>>
>>>>>> Are there any documents I can check out on this issue?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yusong
>>>>>>
>>>>>>
>>>>
> 



