Dear all,

I have an MPI application where one of the ranks enters a barrier much later than the others. When using the ch3:shm device, I noticed that the other N-1 ranks chew up all the CPU and appear to be calling sched_yield. When I start another CPU-intensive application, the N-1 ranks waiting at the barrier do not seem to yield to the application that actually needs the CPU. Is there anything I can do to force the waiting processes to give up the CPU (isn't that the whole point of sched_yield?)? Would it help to reimplement MPI_Barrier so that it does not use MPI_Sendrecv (it would still have to call MPI_Wait somewhere, though)?
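To make concrete what I mean by reimplementing the barrier, here is a minimal sketch of a linear barrier that waits by polling MPI_Test and sleeping between polls instead of blocking in MPI_Wait. The names polite_barrier and wait_politely and the 1 ms sleep interval are made up for illustration, and I am assuming MPI_Test returns quickly even when it has to drive the progress engine:

#include <unistd.h>
#include <mpi.h>

/* Poll a request with MPI_Test and sleep between polls so the CPU
   is actually released while we wait. */
static void wait_politely(MPI_Request *req)
{
    int done = 0;
    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        if (!done)
            usleep(1000); /* placeholder interval: 1 ms */
    }
}

/* Linear barrier: rank 0 collects a token from every other rank,
   then releases them all. Only the receives wait politely; the
   one-byte sends should complete eagerly. */
static void polite_barrier(MPI_Comm comm)
{
    int rank, size, i;
    char token = 0;
    MPI_Request req;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        for (i = 1; i < size; i++) {
            MPI_Irecv(&token, 1, MPI_CHAR, i, 0, comm, &req);
            wait_politely(&req);
        }
        for (i = 1; i < size; i++)
            MPI_Send(&token, 1, MPI_CHAR, i, 1, comm);
    } else {
        MPI_Send(&token, 1, MPI_CHAR, 0, 0, comm);
        MPI_Irecv(&token, 1, MPI_CHAR, 0, 1, comm, &req);
        wait_politely(&req);
    }
}

This trades barrier latency for idle CPU, which would be acceptable for my workload, but I would rather not maintain my own collective if there is a knob in MPICH2 that I am missing.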
I can reproduce the problem with MPICH2 1.0.5p3 (compiled with the Intel C compiler, version 9.1.045) running on a 4-way Opteron SMP machine (uname -a shows: Linux gezora4 2.6.5-7.283-smp #1 SMP Wed Nov 29 16:55:53 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux). The following simple program triggers the problem:
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(void)
{
    int comm_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

    printf("rank = %d\t process ID = %d\n", comm_rank, getpid());

    /* Rank 0 arrives at the barrier ten minutes late; the other
       ranks sit in MPI_Barrier in the meantime. */
    if (comm_rank == 0) {
        sleep(600);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
Thank you very much in advance.

Regards,
Sudarshan