[MPICH] MPI_Barrier on ch3:shm and sched_yield

Rajeev Thakur thakur at mcs.anl.gov
Mon Feb 26 14:58:57 CST 2007


Sudarshan,
                 How many MPI processes are you running on the 4-way SMP, and
how many processes does the other application have? Does the other application
get any CPU at all? sched_yield just gives up the current time slice for the
process and moves it to the end of the run queue; it gets run again when its
turn comes. In other words, the process doesn't sleep until some event
happens. It shares the CPU with other processes. 
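
To make the distinction concrete, here is a minimal sketch (my own
illustration, not MPICH source) contrasting the two waiting styles; the
function names and the pipe are just for the example:

#include <sched.h>
#include <unistd.h>

/* Spin-wait: give up the current time slice, but stay runnable.  The
   scheduler keeps giving this process the CPU whenever its turn comes. */
static void wait_spinning(volatile int *flag)
{
  while (!*flag)
    sched_yield();   /* back onto the end of the run queue, not asleep */
}

/* Event-based wait: the process really sleeps; the kernel wakes it only
   when the peer writes a byte into the pipe. */
static void wait_blocking(int pipe_read_fd)
{
  char byte;
  read(pipe_read_fd, &byte, 1);
}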
 
Rajeev
 
 


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Sudarshan Raghunathan
Sent: Monday, February 26, 2007 11:42 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] MPI_Barrier on ch3:shm and sched_yield


Dear all,

I have an MPI application where one of the ranks enters a barrier a lot later
than the others. When using the ch3:shm device, I noticed that the other N-1
ranks chew up all the CPU and seem to be calling sched_yield. When I start
up another CPU-intensive application, the N-1 ranks waiting on the barrier
do not seem to yield to the application that actually needs the CPU. Is
there anything I can do to force the waiting processes to give up the CPU
(isn't this the whole point of sched_yield?)? Would it work if I
reimplemented MPI_Barrier to not use MPI_Sendrecv (it would still have to
call MPI_Wait somewhere, though)? 
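Something like the following is what I had in mind: a rough sketch of a
"polite" barrier that polls nonblocking requests with MPI_Test and sleeps
between polls instead of spinning (tag 999 and the 1 ms interval are
arbitrary choices, and I have not tried this against the MPICH internals):

#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

/* Rough sketch of a barrier that frees the CPU while waiting: every rank
   checks in with rank 0 and then waits for a release message, polling
   with MPI_Test/MPI_Testall and sleeping between polls. */
static void polite_barrier(MPI_Comm comm)
{
  int i, rank, size, flag = 0;

  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);
  if (size == 1)
    return;

  if (rank == 0) {
    MPI_Request *reqs = malloc((size - 1) * sizeof(MPI_Request));
    for (i = 1; i < size; i++)
      MPI_Irecv(NULL, 0, MPI_BYTE, i, 999, comm, &reqs[i - 1]);
    while (!flag) {
      MPI_Testall(size - 1, reqs, &flag, MPI_STATUSES_IGNORE);
      if (!flag)
        usleep(1000);                 /* give up the CPU for ~1 ms */
    }
    for (i = 1; i < size; i++)        /* release the other ranks */
      MPI_Send(NULL, 0, MPI_BYTE, i, 999, comm);
    free(reqs);
  } else {
    MPI_Request req;
    MPI_Send(NULL, 0, MPI_BYTE, 0, 999, comm);
    MPI_Irecv(NULL, 0, MPI_BYTE, 0, 999, comm, &req);
    while (!flag) {
      MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
      if (!flag)
        usleep(1000);                 /* give up the CPU for ~1 ms */
    }
  }
}

Each MPI_Test call would still drive the progress engine (which on ch3:shm
polls shared memory), but I assume the usleep between polls is what would
actually hand the CPU back to the other application.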

I am able to reproduce the problem with MPICH2 1.0.5p3 (compiled with the
Intel C compiler version 9.1.045) running on a 4-way Opteron SMP machine
(uname -a shows: Linux gezora4 2.6.5-7.283-smp #1 SMP Wed Nov 29 16:55:53
UTC 2006 x86_64 x86_64 x86_64 GNU/Linux). The following simple program
seems to trigger the problem: 
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(void)
{
  int comm_rank;

  MPI_Init(NULL, NULL);
  MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

  printf("rank = %d\t process ID = %d\n", comm_rank, (int) getpid());

  /* Rank 0 arrives at the barrier ten minutes late; meanwhile the other
     ranks spin inside MPI_Barrier and each consume a full CPU. */
  if (comm_rank == 0) {
    sleep(600);
  }

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}
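
I build and run it with something like the following (the file name is
arbitrary):

  mpicc barrier_test.c -o barrier_test
  mpiexec -n 4 ./barrier_test

While rank 0 sleeps, the other three ranks each sit at essentially full CPU.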


Thank you very much in advance. 

Regards,
Sudarshan

