[MPICH] An idle communication process use the same CPU as computation process on multi-core chips

Bob Soliday soliday at aps.anl.gov
Fri Sep 14 12:11:16 CDT 2007


Sylvain Jeaugey wrote:
> That's unfortunate.
> 
> Still, I did two programs. A master :
> ----------------------
> #include <sched.h>
>
> int main() {
>         while (1) {
>             sched_yield();  /* give up the CPU on every iteration */
>         }
>         return 0;
> }
> ----------------------
> and a slave :
> ----------------------
> int main() {
>         while (1);
>         return 0;
> }
> ----------------------
> 
> I launch 4 slaves and 1 master on a bi dual-core machine. Here is the 
> result in top :
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 12361 sylvain   25   0  2376  244  188 R  100  0.0   0:18.26 slave
> 12362 sylvain   25   0  2376  244  188 R  100  0.0   0:18.12 slave
> 12360 sylvain   25   0  2376  244  188 R  100  0.0   0:18.23 slave
> 12363 sylvain   25   0  2376  244  188 R  100  0.0   0:18.15 slave
> 12364 sylvain   20   0  2376  248  192 R    0  0.0   0:00.00 master
> 12365 sylvain   16   0  6280 1120  772 R    0  0.0   0:00.08 top
> 
> If you are seeing 66% each, I guess that your master is not calling
> sched_yield as often as expected. Maybe you should look at
> environment variables to force a yield when no message is available.
> Or maybe your master isn't so idle after all and continuously has
> messages to send, so it never yields.
> 

On our FC5 nodes with 4 cores we get similar results. But on our FC7 
nodes with 8 cores we don't. The kernel seems to think that all 9 jobs 
require 100% of a CPU, and they end up jumping from one core to another. 
Often the master job is left on its own core while two slaves share another.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
20127 ywang25   20   0  106m  22m 4168 R   68  0.5   0:06.84 0 slave
20131 ywang25   20   0  106m  22m 4184 R   73  0.5   0:07.26 1 slave
20133 ywang25   20   0  106m  22m 4196 R   75  0.5   0:07.49 2 slave
20129 ywang25   20   0  106m  22m 4176 R   84  0.5   0:08.44 3 slave
20135 ywang25   20   0  106m  22m 4176 R   73  0.5   0:07.29 4 slave
20132 ywang25   20   0  106m  22m 4188 R   70  0.5   0:07.04 4 slave
20128 ywang25   20   0  106m  22m 4180 R   78  0.5   0:07.79 5 slave
20130 ywang25   20   0  106m  22m 4180 R   74  0.5   0:07.45 6 slave
20134 ywang25   20   0  106m  24m 6708 R   80  0.6   0:07.98 7 master

20135 ywang25   20   0  106m  22m 4176 R   75  0.5   0:14.75 0 slave
20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:14.96 1 slave
20130 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.32 2 slave
20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:18.44 3 slave
20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:14.36 4 slave
20133 ywang25   20   0  106m  22m 4196 R   96  0.5   0:17.09 5 slave
20131 ywang25   20   0  106m  22m 4184 R   78  0.5   0:15.02 6 slave
20128 ywang25   20   0  106m  22m 4180 R   99  0.5   0:17.70 6 slave
20134 ywang25   20   0  106m  24m 6708 R  100  0.6   0:17.97 7 master

20130 ywang25   20   0  106m  22m 4180 R   87  0.5   0:25.99 0 slave
20132 ywang25   20   0  106m  22m 4188 R   79  0.5   0:22.83 0 slave
20127 ywang25   20   0  106m  22m 4168 R   75  0.5   0:21.89 1 slave
20133 ywang25   20   0  106m  22m 4196 R   98  0.5   0:26.94 2 slave
20129 ywang25   20   0  106m  22m 4176 R  100  0.5   0:28.45 3 slave
20135 ywang25   20   0  106m  22m 4176 R   74  0.5   0:22.12 4 slave
20134 ywang25   20   0  106m  24m 6708 R   98  0.6   0:27.73 5 master
20128 ywang25   20   0  106m  22m 4180 R   90  0.5   0:26.72 6 slave
20131 ywang25   20   0  106m  22m 4184 R   99  0.5   0:24.96 7 slave

20133 ywang25   20   0 91440 5756 4852 R   87  0.1   0:44.20 0 slave
20132 ywang25   20   0 91436 5764 4860 R   80  0.1   0:39.32 0 slave
20129 ywang25   20   0 91440 5736 4832 R   91  0.1   0:46.84 1 slave
20130 ywang25   20   0 91440 5748 4844 R   83  0.1   0:43.07 3 slave
20131 ywang25   20   0 91432 5744 4840 R   84  0.1   0:41.20 4 slave
20134 ywang25   20   0  112m  36m  11m R   96  0.9   0:47.35 5 master
20128 ywang25   20   0 91432 5752 4844 R   93  0.1   0:45.36 5 slave
20127 ywang25   20   0 91440 5724 4824 R   94  0.1   0:40.56 6 slave
20135 ywang25   20   0 91440 5736 4832 R   92  0.1   0:39.75 7 slave







