[MPICH] An idle communication process uses the same CPU as a computation process on multi-core chips
Bob Soliday
soliday at aps.anl.gov
Fri Sep 14 12:11:16 CDT 2007
Sylvain Jeaugey wrote:
> That's unfortunate.
>
> Still, I did two programs. A master :
> ----------------------
> #include <sched.h>   /* for sched_yield() */
>
> int main() {
>     while (1) {
>         sched_yield();   /* always give the CPU back to anyone runnable */
>     }
>     return 0;
> }
> ----------------------
> and a slave :
> ----------------------
> int main() {
>     while (1);   /* spin forever, simulating a compute-bound process */
>     return 0;
> }
> ----------------------
>
> I launch 4 slaves and 1 master on a machine with two dual-core CPUs
> (four cores total). Here is the result in top:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12361 sylvain 25 0 2376 244 188 R 100 0.0 0:18.26 slave
> 12362 sylvain 25 0 2376 244 188 R 100 0.0 0:18.12 slave
> 12360 sylvain 25 0 2376 244 188 R 100 0.0 0:18.23 slave
> 12363 sylvain 25 0 2376 244 188 R 100 0.0 0:18.15 slave
> 12364 sylvain 20 0 2376 248 192 R 0 0.0 0:00.00 master
> 12365 sylvain 16 0 6280 1120 772 R 0 0.0 0:00.08 top
>
> If you are seeing 66% each, I guess that your master is not
> sched_yield'ing as much as expected. Maybe you should look at
> environment variables that force a yield when no message is available.
> And maybe your master isn't so idle after all: it may have messages to
> send continuously and therefore never yields.
>
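Just to make sure we are talking about the same thing, here is a minimal
sketch of the kind of loop I understand you are describing: a
communication process that polls for work and calls sched_yield()
whenever nothing is pending. poll_for_message() and handle_message() are
only stand-ins for illustration, not MPICH internals:

----------------------
#include <sched.h>

/* Stand-in helpers: pretend no message ever arrives, so the
   process yields forever, just like the master above. */
static int poll_for_message(void) { return 0; }
static void handle_message(void) { }

int main() {
    while (1) {
        if (poll_for_message())
            handle_message();   /* real work: make progress on the message */
        else
            sched_yield();      /* idle: let a compute process have the core */
    }
    return 0;
}
----------------------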
On our FC5 nodes with 4 cores we get similar results, but on our FC7
nodes with 8 cores we don't. The kernel seems to think that all 9 jobs
require 100% CPU, and they end up jumping from one core to another. Often
the master job is left on its own core while two slaves run on another,
as the successive top snapshots below show.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
20127 ywang25 20 0 106m 22m 4168 R 68 0.5 0:06.84 0 slave
20131 ywang25 20 0 106m 22m 4184 R 73 0.5 0:07.26 1 slave
20133 ywang25 20 0 106m 22m 4196 R 75 0.5 0:07.49 2 slave
20129 ywang25 20 0 106m 22m 4176 R 84 0.5 0:08.44 3 slave
20135 ywang25 20 0 106m 22m 4176 R 73 0.5 0:07.29 4 slave
20132 ywang25 20 0 106m 22m 4188 R 70 0.5 0:07.04 4 slave
20128 ywang25 20 0 106m 22m 4180 R 78 0.5 0:07.79 5 slave
20130 ywang25 20 0 106m 22m 4180 R 74 0.5 0:07.45 6 slave
20134 ywang25 20 0 106m 24m 6708 R 80 0.6 0:07.98 7 master

20135 ywang25 20 0 106m 22m 4176 R 75 0.5 0:14.75 0 slave
20132 ywang25 20 0 106m 22m 4188 R 79 0.5 0:14.96 1 slave
20130 ywang25 20 0 106m 22m 4180 R 99 0.5 0:17.32 2 slave
20129 ywang25 20 0 106m 22m 4176 R 100 0.5 0:18.44 3 slave
20127 ywang25 20 0 106m 22m 4168 R 75 0.5 0:14.36 4 slave
20133 ywang25 20 0 106m 22m 4196 R 96 0.5 0:17.09 5 slave
20131 ywang25 20 0 106m 22m 4184 R 78 0.5 0:15.02 6 slave
20128 ywang25 20 0 106m 22m 4180 R 99 0.5 0:17.70 6 slave
20134 ywang25 20 0 106m 24m 6708 R 100 0.6 0:17.97 7 master

20130 ywang25 20 0 106m 22m 4180 R 87 0.5 0:25.99 0 slave
20132 ywang25 20 0 106m 22m 4188 R 79 0.5 0:22.83 0 slave
20127 ywang25 20 0 106m 22m 4168 R 75 0.5 0:21.89 1 slave
20133 ywang25 20 0 106m 22m 4196 R 98 0.5 0:26.94 2 slave
20129 ywang25 20 0 106m 22m 4176 R 100 0.5 0:28.45 3 slave
20135 ywang25 20 0 106m 22m 4176 R 74 0.5 0:22.12 4 slave
20134 ywang25 20 0 106m 24m 6708 R 98 0.6 0:27.73 5 master
20128 ywang25 20 0 106m 22m 4180 R 90 0.5 0:26.72 6 slave
20131 ywang25 20 0 106m 22m 4184 R 99 0.5 0:24.96 7 slave

20133 ywang25 20 0 91440 5756 4852 R 87 0.1 0:44.20 0 slave
20132 ywang25 20 0 91436 5764 4860 R 80 0.1 0:39.32 0 slave
20129 ywang25 20 0 91440 5736 4832 R 91 0.1 0:46.84 1 slave
20130 ywang25 20 0 91440 5748 4844 R 83 0.1 0:43.07 3 slave
20131 ywang25 20 0 91432 5744 4840 R 84 0.1 0:41.20 4 slave
20134 ywang25 20 0 112m 36m 11m R 96 0.9 0:47.35 5 master
20128 ywang25 20 0 91432 5752 4844 R 93 0.1 0:45.36 5 slave
20127 ywang25 20 0 91440 5724 4824 R 94 0.1 0:40.56 6 slave
20135 ywang25 20 0 91440 5736 4832 R 92 0.1 0:39.75 7 slave
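
As an aside (we have not tried this here), one way to keep a job from
migrating between cores is to pin it with the standard Linux
sched_setaffinity(2) call. A minimal sketch, with the target core taken
from argv[1]:

----------------------
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    cpu_set_t set;
    int core = (argc > 1) ? atoi(argv[1]) : 0;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    while (1);   /* spin so the placement is visible in top's P column */
    return 0;
}
----------------------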