[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips
Darius Buntinas
buntinas at mcs.anl.gov
Tue Sep 25 13:04:39 CDT 2007
Hmm. Maybe things aren't as bad as I thought. It looks like Linus is
pushing for the previous yield() behavior.
http://kerneltrap.org/Linux/CFS_and_sched_yield
-d
On 09/18/2007 01:23 PM, Darius Buntinas wrote:
>
> From the discussion on lkml and the fact that they see programs that
> use sched_yield() this way as "fundamentally broken", it seems that this
> patch is only temporary, and eventually the pre-2.6.22 kernel behavior
> won't be supported.
>
> -d
>
> On 09/18/2007 12:45 PM, Bob Soliday wrote:
>> Well, I reported the bug, and it turns out they already have a patch for
>> it that will be included in a future release, making it possible to
>> emulate the old scheduler.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=295071
>>
>> http://lkml.org/lkml/2007/9/14/157
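>>
>> If that patch is the one that adds a sched_compat_yield knob (an
>> assumption on my part; I haven't verified which knob it adds), then once
>> it is in the kernel the old behavior could be restored by writing 1 to
>> /proc/sys/kernel/sched_compat_yield. A minimal sketch in C, run as root:
>>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>     /* Assumed knob name; check the actual patch before relying on it. */
>>     FILE *f = fopen("/proc/sys/kernel/sched_compat_yield", "w");
>>     if (f == NULL) {
>>         perror("fopen");   /* knob not present, or not running as root */
>>         return 1;
>>     }
>>     fputs("1\n", f);       /* 1 = emulate the old sched_yield() behavior */
>>     fclose(f);
>>     return 0;
>> }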
>>
>> --Bob
>>
>> Bob Soliday wrote:
>>> It turns out the problem is not related to the number of cores. Only
>>> the newest versions of the Fedora 7 kernel show the problem. I think
>>> it is related to the CFS scheduler in these kernels.
>>>
>>> When I run one slave and one master on the same core with
>>> kernel-2.6.21-1.3194 using Darius's slave code I see the slave task
>>> use 100% of the CPU and see the same timing values as when I run the
>>> slave on a different core.
>>>
>>> When I do the same test with kernel-2.6.22.4-65 or kernel-2.6.22.5-76,
>>> the timing values double, as the slave can only get 50% of the CPU
>>> time when on the same core.
>>>
>>> --Bob
>>>
>>> Darius Buntinas wrote:
>>>
>>>>
>>>> I can verify that I saw the same problem Yusong did when starting
>>>> the master first on a dual quadcore machine. But assigning each
>>>> slave to its own core (using taskset) fixed that.
>>>>
>>>> Interestingly, when there are fewer than 8 slaves, top shows the
>>>> master at 100% usage when top is in "irix mode" (and at 12.5%, i.e.
>>>> 1/8, when it isn't). When I have 8 slaves, the usage of the master
>>>> process drops to 0.
>>>>
>>>> Yusong, I'm betting that if you set the cpu affinity for the slaves,
>>>> you'll see no impact of the master on the slaves. Can you try that?
>>>>
>>>> e.g.,:
>>>> ./master &
>>>> for i in `seq 0 3` ; do taskset -c $i ./slave & done
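>>>>
>>>> If it's easier, the pinning can also be done from inside the slave
>>>> itself rather than via taskset. Here's a minimal sketch using Linux's
>>>> sched_setaffinity(); the pin_to_core() helper and the hard-coded core
>>>> number are just illustrative (pick one core per slave):
>>>>
>>>> #define _GNU_SOURCE
>>>> #include <sched.h>
>>>> #include <stdio.h>
>>>>
>>>> /* Restrict the calling process to a single core. */
>>>> static int pin_to_core(int core)
>>>> {
>>>>     cpu_set_t set;
>>>>     CPU_ZERO(&set);
>>>>     CPU_SET(core, &set);
>>>>     if (sched_setaffinity(0, sizeof(set), &set) != 0) { /* 0 = this process */
>>>>         perror("sched_setaffinity");
>>>>         return -1;
>>>>     }
>>>>     return 0;
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>     pin_to_core(0);   /* e.g., core 0 */
>>>>     while (1)
>>>>         ;             /* same busy loop as the slave test program */
>>>>     return 0;
>>>> }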
>>>>
>>>> -d
>>>>
>>>> On 09/17/2007 02:31 AM, Sylvain Jeaugey wrote:
>>>>
>>>>> This seems to be the key to the problem. When the master is
>>>>> launched before the others, it takes one CPU, and this won't change
>>>>> until, for whatever scheduling reason, it ends up sharing its CPU
>>>>> with a slave. It then falls to 0% and we're saved.
>>>>>
>>>>> So, to conduct your experiment, you definitely need to taskset your
>>>>> slaves. Just launch them with
>>>>> taskset -c <cpu> ./slave (1 process per cpu)
>>>>> or use the -p option of taskset to do it after launch, to ensure
>>>>> that each slave _will_ take one CPU. That way the master will be
>>>>> forced to share its CPU with the others and sched_yield() will be
>>>>> effective.
>>>>>
>>>>> Sylvain
>>>>>
>>>>> On Sun, 16 Sep 2007, Yusong Wang wrote:
>>>>>
>>>>>> I did the experiments on four types of multi-core chips (2
>>>>>> dual-core, 1 quad-core and 1 eight-core). All of my tests show that
>>>>>> the idle master process has a big impact on the other slave
>>>>>> processes, except for the quad-core test, in which I found that the
>>>>>> order does matter: when the master was launched after the slave
>>>>>> processes, there is no effect, while if the master started first,
>>>>>> two slave processes would end up on the same core, causing those
>>>>>> two to slow down significantly more than the others.
>>>>>>
>>>>>> Yusong
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: Darius Buntinas <buntinas at mcs.anl.gov>
>>>>>> Date: Friday, September 14, 2007 12:55 pm
>>>>>> Subject: Re: [MPICH] An idle communication process uses the same
>>>>>> CPU as a computation process on multi-core chips
>>>>>>
>>>>>>>
>>>>>>> It's possible that different versions of the kernel/os/top compute
>>>>>>> %cpu differently. "CPU utilization" is really a nebulous term. What
>>>>>>> you really want to know is whether the master is stealing
>>>>>>> significant cycles from the slaves. A test of this would be to
>>>>>>> replace Sylvain's slave code with this:
>>>>>>>
>>>>>>> #include <stdio.h>
>>>>>>> #include <sys/time.h>
>>>>>>>
>>>>>>> int main(void)
>>>>>>> {
>>>>>>>     while (1) {
>>>>>>>         int i;
>>>>>>>         struct timeval t0, t1;
>>>>>>>         double usec;
>>>>>>>
>>>>>>>         gettimeofday(&t0, 0);
>>>>>>>         /* empty delay loop; compile without optimization so the
>>>>>>>            compiler doesn't remove it */
>>>>>>>         for (i = 0; i < 100000000; ++i)
>>>>>>>             ;
>>>>>>>         gettimeofday(&t1, 0);
>>>>>>>
>>>>>>>         /* elapsed time for the loop, in microseconds */
>>>>>>>         usec = (t1.tv_sec * 1e6 + t1.tv_usec) -
>>>>>>>                (t0.tv_sec * 1e6 + t0.tv_usec);
>>>>>>>         printf("%8.0f\n", usec);
>>>>>>>     }
>>>>>>>     return 0;
>>>>>>> }
>>>>>>>
>>>>>>> This will repeatedly time the inner loop. On an N-core system, run
>>>>>>> N of these and look at the times reported. Then start the master
>>>>>>> and see if the timings change. If the master does steal significant
>>>>>>> cycles from the slaves, then you'll see the timings reported by the
>>>>>>> slaves increase. On my single-processor laptop (fc6, 2.6.20),
>>>>>>> running one slave, I see no impact from the master.
>>>>>>>
>>>>>>> Please let me know what you find.
>>>>>>>
>>>>>>> As for slave processes hopping around between processors, you can
>>>>>>> set processor affinity on the slaves
>>>>>>> ( http://www.linuxjournal.com/article/6799 has a good description).
>>>>>>>
>>>>>>> -d
>>>>>>>
>>>>>>> On 09/14/2007 12:11 PM, Bob Soliday wrote:
>>>>>>>
>>>>>>>> Sylvain Jeaugey wrote:
>>>>>>>>
>>>>>>>>> That's unfortunate.
>>>>>>>>>
>>>>>>>>> Still, I wrote two programs. A master:
>>>>>>>>> ----------------------
>>>>>>>>> #include <sched.h>
>>>>>>>>>
>>>>>>>>> int main(void)
>>>>>>>>> {
>>>>>>>>>     while (1) {
>>>>>>>>>         sched_yield();   /* give the CPU away, like an idle poller */
>>>>>>>>>     }
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>> ----------------------
>>>>>>>>> and a slave:
>>>>>>>>> ----------------------
>>>>>>>>> int main(void)
>>>>>>>>> {
>>>>>>>>>     while (1)
>>>>>>>>>         ;                /* pure busy loop (compute-bound) */
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>> ----------------------
>>>>>>>>>
>>>>>>>>> I launch 4 slaves and 1 master on a machine with two dual-core
>>>>>>>>> CPUs. Here is the result in top:
>>>>>>>>>
>>>>>>>>>   PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>>>>> 12361 sylvain  25   0  2376   244  188 R  100  0.0   0:18.26 slave
>>>>>>>>> 12362 sylvain  25   0  2376   244  188 R  100  0.0   0:18.12 slave
>>>>>>>>> 12360 sylvain  25   0  2376   244  188 R  100  0.0   0:18.23 slave
>>>>>>>>> 12363 sylvain  25   0  2376   244  188 R  100  0.0   0:18.15 slave
>>>>>>>>> 12364 sylvain  20   0  2376   248  192 R    0  0.0   0:00.00 master
>>>>>>>>> 12365 sylvain  16   0  6280  1120  772 R    0  0.0   0:00.08 top
>>>>>>>>>
>>>>>>>>> If you are seeing 66% each, I guess that your master is not
>>>>>>>>> sched_yield'ing as much as expected. Maybe you should look at
>>>>>>>>> environment variables to force a yield when no message is
>>>>>>>>> available; or maybe your master isn't so idle after all and has
>>>>>>>>> messages to send continuously, and thus never yields.
>>>>>>>>>
>>>>>>>>
>>>>>>>> On our FC5 nodes with 4 cores we get similar results. But on our
>>>>>>>> FC7 nodes with 8 cores we don't. The kernel seems to think that
>>>>>>>> all 9 jobs require 100% and they end up jumping from one core to
>>>>>>>> another. Often the master job is left on its own core while two
>>>>>>>> slaves run on another.
>>>>>>>>
>>>>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>>>>>>>> 20127 ywang25   20   0  106m  22m 4168 R   68  0.5   0:06.84 0 slave
>>>>>>>> 20131 ywang25   20   0  106m  22m 4184 R   73  0.5   0:07.26 1 slave
>>>>>>>> 20133 ywang25   20   0  106m  22m 4196 R   75  0.5   0:07.49 2 slave
>>>>>>>> 20129 ywang25   20   0  106m  22m 4176 R   84  0.5   0:08.44 3 slave
>>>>>>>> 20135 ywang25   20   0  106m  22m 4176 R   73  0.5   0:07.29 4 slave
>>>>>>>> 20132 ywang25   20   0  106m  22m 4188 R   70  0.5   0:07.04 4 slave
>>>>>>>> 20128 ywang25   20   0  106m  22m 4180 R   78  0.5   0:07.79 5 slave
>>>>>>>> 20130 ywang25   20   0  106m  22m 4180 R   74  0.5   0:07.45 6 slave
>>>>>>>> 20134 ywang25   20   0  106m  24m 6708 R   80  0.6   0:07.98 7 master
>>>>>>>>
>>>>>>>> 20135 ywang25   20   0  106m  22m 4176 R   75  0.5   0:14.75 0 slave
>>>>>>>> 20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:14.96 1 slave
>>>>>>>> 20130 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.32 2 slave
>>>>>>>> 20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:18.44 3 slave
>>>>>>>> 20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:14.36 4 slave
>>>>>>>> 20133 ywang25   20   0  106m  22m 4196 R   96  0.5   0:17.09 5 slave
>>>>>>>> 20131 ywang25   20   0  106m  22m 4184 R   78  0.5   0:15.02 6 slave
>>>>>>>> 20128 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.70 6 slave
>>>>>>>> 20134 ywang25   20   0  106m  24m 6708 R  100  0.6   0:17.97 7 master
>>>>>>>>
>>>>>>>> 20130 ywang25   20   0  106m  22m 4180 R   87  0.5   0:25.99 0 slave
>>>>>>>> 20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:22.83 0 slave
>>>>>>>> 20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:21.89 1 slave
>>>>>>>> 20133 ywang25   20   0  106m  22m 4196 R   98  0.5   0:26.94 2 slave
>>>>>>>> 20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:28.45 3 slave
>>>>>>>> 20135 ywang25   20   0  106m  22m 4176 R   74  0.5   0:22.12 4 slave
>>>>>>>> 20134 ywang25   20   0  106m  24m 6708 R   98  0.6   0:27.73 5 master
>>>>>>>> 20128 ywang25   20   0  106m  22m 4180 R   90  0.5   0:26.72 6 slave
>>>>>>>> 20131 ywang25   20   0  106m  22m 4184 R   99  0.5   0:24.96 7 slave
>>>>>>>>
>>>>>>>> 20133 ywang25   20   0 91440 5756 4852 R   87  0.1   0:44.20 0 slave
>>>>>>>> 20132 ywang25   20   0 91436 5764 4860 R   80  0.1   0:39.32 0 slave
>>>>>>>> 20129 ywang25   20   0 91440 5736 4832 R   91  0.1   0:46.84 1 slave
>>>>>>>> 20130 ywang25   20   0 91440 5748 4844 R   83  0.1   0:43.07 3 slave
>>>>>>>> 20131 ywang25   20   0 91432 5744 4840 R   84  0.1   0:41.20 4 slave
>>>>>>>> 20134 ywang25   20   0  112m  36m  11m R   96  0.9   0:47.35 5 master
>>>>>>>> 20128 ywang25   20   0 91432 5752 4844 R   93  0.1   0:45.36 5 slave
>>>>>>>> 20127 ywang25   20   0 91440 5724 4824 R   94  0.1   0:40.56 6 slave
>>>>>>>> 20135 ywang25   20   0 91440 5736 4832 R   92  0.1   0:39.75 7 slave
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>
>