[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips

Darius Buntinas buntinas at mcs.anl.gov
Mon Sep 17 12:39:46 CDT 2007


I can verify that I saw the same problem Yusong did when starting the 
master first on a dual quad-core machine.  But assigning each slave to 
its own core (using taskset) fixed that.

Interestingly, when there are fewer than 8 slaves, top shows the master 
at 100% usage when top is in "irix mode" (where 100% means one full core), 
and at 12.5% (1/8 of the 8 cores) when not in irix mode.  When I have 8 
slaves, the usage of the master process goes to 0.

Yusong, I'm betting that if you set the cpu affinity for the slaves, 
you'll see no impact of the master on the slaves.  Can you try that?

e.g.:
   ./master &
   for i in `seq 0 3` ; do taskset -c $i ./slave & done
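
If taskset isn't convenient, each slave could also pin itself from inside 
the code with sched_setaffinity() (Linux-specific).  A minimal sketch along 
those lines; taking the core index from argv[1] is just for illustration:

   #define _GNU_SOURCE
   #include <sched.h>
   #include <stdio.h>
   #include <stdlib.h>

   int main(int argc, char **argv)
   {
       cpu_set_t mask;
       int core = (argc > 1) ? atoi(argv[1]) : 0;  /* which core to pin to */

       CPU_ZERO(&mask);
       CPU_SET(core, &mask);
       /* pid 0 means "the calling process itself" */
       if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
           perror("sched_setaffinity");
           return 1;
       }

       while (1)   /* stand-in for the slave's real computation loop */
           ;
       return 0;
   }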

-d

On 09/17/2007 02:31 AM, Sylvain Jeaugey wrote:
> This seems to be the key to the problem. When the master is launched 
> before the others, it takes a whole CPU, and this won't change until, 
> for whatever scheduling reason, it ends up sharing its CPU with a slave. 
> It then falls to 0% and we're saved.
> 
> So, to conduct your experiment, you definitely need to taskset your 
> slaves. Just launch them with
> taskset -c <cpu> ./slave (1 process per cpu)
> or use the -p option of taskset to do it after launch, and make sure that 
> each slave _will_ take one CPU. That way the master is forced to share 
> the CPU with the others and sched_yield() becomes effective.
> 
> Sylvain
> 
> On Sun, 16 Sep 2007, Yusong Wang wrote:
> 
>> I did the experiments on four types of multi-core chips (2 dual-core, 
>> 1 quad-core and 1 eight-core).  All of my tests show that the idle master 
>> process has a big impact on the other slave processes, except for the 
>> test on the quad-core, where I found that the order does matter: when 
>> the master was launched after the slave processes, there was no effect, 
>> while if the master started first, two slave processes would end up on 
>> the same core and slow down significantly compared to the others.
>>
>> Yusong
>>
>> ----- Original Message -----
>> From: Darius Buntinas <buntinas at mcs.anl.gov>
>> Date: Friday, September 14, 2007 12:55 pm
>> Subject: Re: [MPICH] An idle communication process uses the same CPU as 
>> a computation process on multi-core chips
>>
>>>
>>> It's possible that different versions of the kernel/os/top compute %cpu
>>> differently.  "CPU utilization" is really a nebulous term.  What you
>>> really want to know is whether the master is stealing significant cycles
>>> from the slaves.  A test of this would be to replace Sylvain's slave
>>> code with this:
>>>
>>> #include <stdio.h>
>>> #include <sys/time.h>
>>>
>>> int main() {
>>>     while (1) {
>>>         int i;
>>>         struct timeval t0, t1;
>>>         double usec;
>>>
>>>         gettimeofday(&t0, 0);
>>>         /* busy loop standing in for the slave's computation;
>>>            compile without optimization so it isn't removed */
>>>         for (i = 0; i < 100000000; ++i)
>>>             ;
>>>         gettimeofday(&t1, 0);
>>>
>>>         usec = (t1.tv_sec * 1e6 + t1.tv_usec) - (t0.tv_sec * 1e6 + t0.tv_usec);
>>>         printf("%8.0f\n", usec);
>>>     }
>>>     return 0;
>>> }
>>>
>>> This will repeatedly time the inner loop.  On an N core system, run N of
>>> these, and look at the times reported.  Then start the master and see if
>>> the timings change.  If the master does steal significant cycles from the
>>> slaves, then you'll see the timings reported by the slaves increase.  On
>>> my single processor laptop (fc6, 2.6.20), running one slave, I see no
>>> impact from the master.
>>>
>>> Please let me know what you find.
>>>
>>> As far as slave processes hopping around between processors goes, you
>>> can set processor affinity on the slaves
>>> ( http://www.linuxjournal.com/article/6799 has a good description).
>>>
>>> -d
>>>
>>> On 09/14/2007 12:11 PM, Bob Soliday wrote:
>>>> Sylvain Jeaugey wrote:
>>>>> That's unfortunate.
>>>>>
>>>>> Still, I wrote two programs. A master:
>>>>> ----------------------
>>>>> #include <sched.h>
>>>>>
>>>>> int main() {
>>>>>         while (1) {
>>>>>             sched_yield();
>>>>>         }
>>>>>         return 0;
>>>>> }
>>>>> ----------------------
>>>>> and a slave:
>>>>> ----------------------
>>>>> int main() {
>>>>>         while (1);
>>>>>         return 0;
>>>>> }
>>>>> ----------------------
>>>>>
>>>>> I launch 4 slaves and 1 master on a machine with two dual-core
>>>>> processors. Here is the result in top:
>>>>>
>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>> 12361 sylvain   25   0  2376  244  188 R  100  0.0   0:18.26 slave
>>>>> 12362 sylvain   25   0  2376  244  188 R  100  0.0   0:18.12 slave
>>>>> 12360 sylvain   25   0  2376  244  188 R  100  0.0   0:18.23 slave
>>>>> 12363 sylvain   25   0  2376  244  188 R  100  0.0   0:18.15 slave
>>>>> 12364 sylvain   20   0  2376  248  192 R    0  0.0   0:00.00 master
>>>>> 12365 sylvain   16   0  6280 1120  772 R    0  0.0   0:00.08 top
>>>>>
>>>>> If you are seeing 66% each, I guess that your master is not
>>>>> sched_yield'ing as much as expected. Maybe you should look at the
>>>>> environment variables that force a yield when no message is available,
>>>>> or maybe your master isn't so idle after all and has messages to send
>>>>> continuously, and thus never yields.
>>>>>
>>>>
>>>> On our FC5 nodes with 4 cores we get similar results. But on our FC7
>>>> nodes with 8 cores we don't. The kernel seems to think that all 9 jobs
>>>> require 100% and they end up jumping from one core to another. Often
>>>> the master job is left on its own core while two slaves run on another.
>>>>
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>>>> 20127 ywang25   20   0  106m  22m 4168 R   68  0.5   0:06.84 0 slave
>>>> 20131 ywang25   20   0  106m  22m 4184 R   73  0.5   0:07.26 1 slave
>>>> 20133 ywang25   20   0  106m  22m 4196 R   75  0.5   0:07.49 2 slave
>>>> 20129 ywang25   20   0  106m  22m 4176 R   84  0.5   0:08.44 3 slave
>>>> 20135 ywang25   20   0  106m  22m 4176 R   73  0.5   0:07.29 4 slave
>>>> 20132 ywang25   20   0  106m  22m 4188 R   70  0.5   0:07.04 4 slave
>>>> 20128 ywang25   20   0  106m  22m 4180 R   78  0.5   0:07.79 5 slave
>>>> 20130 ywang25   20   0  106m  22m 4180 R   74  0.5   0:07.45 6 slave
>>>> 20134 ywang25   20   0  106m  24m 6708 R   80  0.6   0:07.98 7 master
>>>>
>>>> 20135 ywang25   20   0  106m  22m 4176 R   75  0.5   0:14.75 0 slave
>>>> 20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:14.96 1 slave
>>>> 20130 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.32 2 slave
>>>> 20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:18.44 3 slave
>>>> 20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:14.36 4 slave
>>>> 20133 ywang25   20   0  106m  22m 4196 R   96  0.5   0:17.09 5 slave
>>>> 20131 ywang25   20   0  106m  22m 4184 R   78  0.5   0:15.02 6 slave
>>>> 20128 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.70 6 slave
>>>> 20134 ywang25   20   0  106m  24m 6708 R  100  0.6   0:17.97 7 master
>>>>
>>>> 20130 ywang25   20   0  106m  22m 4180 R   87  0.5   0:25.99 0 slave
>>>> 20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:22.83 0 slave
>>>> 20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:21.89 1 slave
>>>> 20133 ywang25   20   0  106m  22m 4196 R   98  0.5   0:26.94 2 slave
>>>> 20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:28.45 3 slave
>>>> 20135 ywang25   20   0  106m  22m 4176 R   74  0.5   0:22.12 4 slave
>>>> 20134 ywang25   20   0  106m  24m 6708 R   98  0.6   0:27.73 5 master
>>>> 20128 ywang25   20   0  106m  22m 4180 R   90  0.5   0:26.72 6 slave
>>>> 20131 ywang25   20   0  106m  22m 4184 R   99  0.5   0:24.96 7 slave
>>>>
>>>> 20133 ywang25   20   0 91440 5756 4852 R   87  0.1   0:44.20 0 slave
>>>> 20132 ywang25   20   0 91436 5764 4860 R   80  0.1   0:39.32 0 slave
>>>> 20129 ywang25   20   0 91440 5736 4832 R   91  0.1   0:46.84 1 slave
>>>> 20130 ywang25   20   0 91440 5748 4844 R   83  0.1   0:43.07 3 slave
>>>> 20131 ywang25   20   0 91432 5744 4840 R   84  0.1   0:41.20 4 slave
>>>> 20134 ywang25   20   0  112m  36m  11m R   96  0.9   0:47.35 5 master
>>>> 20128 ywang25   20   0 91432 5752 4844 R   93  0.1   0:45.36 5 slave
>>>> 20127 ywang25   20   0 91440 5724 4824 R   94  0.1   0:40.56 6 slave
>>>> 20135 ywang25   20   0 91440 5736 4832 R   92  0.1   0:39.75 7 slave
>>>>
>>>>
>>>>
>>>>
>>>
>>
> 

More information about the mpich-discuss mailing list