[MPICH] MPI_Barrier on ch3:shm and sched_yield
Sudarshan Raghunathan
rdarshan at gmail.com
Mon Feb 26 11:42:03 CST 2007
Dear all,
I have an MPI application where one of the ranks enters a barrier a lot later
than the others. When using the ch3:shm device, I noticed that the other N-1
ranks chew up all the CPU and seem to be calling sched_yield. When I start
up another CPU-intensive application, the N-1 ranks waiting on the barrier
do not seem to yield to the application that actually requires the CPU. Is
there anything I can do to force the waiting processes to yield the CPU (isn't
this the whole point of sched_yield?)? Would it work if I reimplemented
MPI_Barrier to not use MPI_Sendrecv (it would still have to call MPI_Wait
somewhere, though)?
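To make the question more concrete, below is the sort of replacement I was
imagining: a plain linear fan-in/fan-out barrier where every wait is an
MPI_Test polling loop with a usleep in between, instead of a blocking
MPI_Wait. This is only a rough sketch; the polite_barrier and wait_politely
names and the 1000-microsecond sleep interval are my own choices, not
anything that exists in MPICH.

#include <unistd.h>
#include <mpi.h>

/* Poll a request with MPI_Test, sleeping between polls so the CPU is
   actually given back to other processes while we wait. */
static void wait_politely(MPI_Request *req)
{
    int done = 0;
    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        if (!done)
            usleep(1000);   /* arbitrary 1 ms polling interval */
    }
}

/* Linear barrier through rank 0: fan-in of zero-byte messages, then
   fan-out to release everyone.  Not efficient, just an illustration. */
static void polite_barrier(MPI_Comm comm)
{
    int rank, size, i;
    MPI_Request req;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == 0) {
        for (i = 1; i < size; i++) {
            MPI_Irecv(NULL, 0, MPI_BYTE, i, 0, comm, &req);
            wait_politely(&req);
        }
        for (i = 1; i < size; i++)
            MPI_Send(NULL, 0, MPI_BYTE, i, 1, comm);
    } else {
        MPI_Send(NULL, 0, MPI_BYTE, 0, 0, comm);
        MPI_Irecv(NULL, 0, MPI_BYTE, 0, 1, comm, &req);
        wait_politely(&req);
    }
}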
I am able to reproduce the problem with MPICH2 1.0.5p3 (compiled with the
Intel C compiler version 9.1.045) running on a 4-way Opteron SMP machine
(uname -a shows: Linux gezora4 2.6.5-7.283-smp #1 SMP Wed Nov 29 16:55:53
UTC 2006 x86_64 x86_64 x86_64 GNU/Linux). The following simple program
seems to trigger the problem:
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(void)
{
    int comm_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);
    printf("rank = %d\t process ID = %d\n", comm_rank, getpid());

    /* rank 0 enters the barrier ten minutes after everyone else */
    if (comm_rank == 0) {
        sleep(600);
    }
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
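For reference, I build and run it with something like the following (the
file and executable names are just what I happened to use), and then watch
the three waiting ranks sit near 100% CPU in top while rank 0 sleeps:

mpicc barrier_test.c -o barrier_test
mpiexec -n 4 ./barrier_test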
Thank you very much in advance.
Regards,
Sudarshan