[MPICH2-dev] Caused collective abort:

Amit H Kumar AHKumar at odu.edu
Tue Nov 27 10:55:54 CST 2007



Hi MPICH2,

I am having a synchronization problem, which is causing a collective abort.

What I am trying to accomplish:
As shown below, the basic idea is to have every process send some data to
the next lower-numbered process in an overlapped fashion. I can only continue
with the computation after I have received the required data. The data can be
large at times; for simplicity I am only showing a single-valued buff in
the code.
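
To make the ring pattern concrete, here is a minimal sketch of how the two
neighbour ranks are computed. The helper names ring_source, ring_dest and
print_ring are hypothetical and exist only for this illustration; the two
modulo expressions are the ones used in the MPI calls in the code below.

```c
#include <stdio.h>

/* Hypothetical helper: the rank this process receives from, i.e. the next
   higher rank, with the highest rank wrapping around to rank 0. */
int ring_source(int rank, int size)
{
    return (rank + 1) % size;
}

/* Hypothetical helper: the rank this process sends to, i.e. the next lower
   rank, with rank 0 wrapping around to the highest rank. */
int ring_dest(int rank, int size)
{
    return (rank + size - 1) % size;
}

/* Print the receive and send neighbours of every rank in a ring. */
void print_ring(int size)
{
    for (int rank = 0; rank < size; rank++)
        printf("rank %d: receives from %d, sends to %d\n",
               rank, ring_source(rank, size), ring_dest(rank, size));
}
```

With 4 processes, for example, rank 0 receives from rank 1 and sends to
rank 3, while rank 3 receives from rank 0 and sends to rank 2.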

I am using MPI_Irecv and MPI_Isend as shown below.

My question is:
When using non-blocking communication, each sender initiates a send
(MPI_Isend), continues with its computation, and later waits with MPI_Waitall
on a non-blocking receive, as shown below. Is it okay to have the sender wait
on both its MPI_Isend and its MPI_Irecv when they were initiated at different
points in the code?

I believe I am doing something wrong here,  because everything works okay
for a smaller problem size.

Any help or feedback would be greatly appreciated.

Thank you,
Amit


/////////////////CODE

for(i=NUMBER; i>0; i--)
{
    printf("\nDoing Initialization ...\n");

    if(i==NUMBER){
        /* DO COMPUTATION */
    } else {
        /* Receive data from the higher-numbered process rank, with
           rank size-1 wrapping around to receive from rank 0 */
        err = MPI_Irecv(&buff, sizeof(double), MPI_DOUBLE, ((rank+1)%size),
                        0, MPI_COMM_WORLD, &mrequest[1]);
        MPI_Waitall(2, mrequest, mstatus);

        /* After receiving, continue with computation */
    } /* else end */

    /* Communication block */
    if(i>1){
        /* Send data to the lower-numbered process rank */
        err = MPI_Isend(&buff, sizeof(double), MPI_DOUBLE,
                        ((rank+(size-1))%size), 0, MPI_COMM_WORLD, &mrequest[0]);
    } /* if(i>1) */
} /* End of main for loop */



