[mpich-discuss] MPI_Comm_Connect bug?

Biddiscombe, John A. biddisco at cscs.ch
Thu Oct 14 13:11:05 CDT 2010


To catch a problem that occurs when MPI_Comm_connect fails, I wrapped the call with an error handler, aiming to exit gracefully.

Rank 0 detects the error, aborts, and displays the message, but the other ranks hang waiting for something to happen. I think that when rank 0 aborts, it should first signal the other ranks so they can abort as well.

Am I doing it wrong, or is this a bug?

Thanks. Snippet below.

JB

  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  int error_code = MPI_Comm_connect(this->DsmMasterHostName, MPI_INFO_NULL, 0, this->Comm, &this->InterComm);
  if (error_code == MPI_SUCCESS) {
    H5FDdsmDebug("Id = " << this->Id << " MPI_Comm_connect returned SUCCESS");
    isConnected = H5FD_DSM_SUCCESS;
  } else {
    char error_string[1024];
    int  length_of_error_string;
    MPI_Error_string(error_code, error_string, &length_of_error_string);
    H5FDdsmError("\nMPI_Comm_connect failed with error : \n" << error_string << "\n\n");
  }
  // reset to MPI_ERRORS_ARE_FATAL for normal debug purposes
  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
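For reference, below is a minimal C sketch of one coordination pattern: attach MPI_ERRORS_RETURN to the communicator actually passed to MPI_Comm_connect (errors from the call are raised on that communicator's handler), then have all ranks agree on a single outcome with an MPI_Allreduce so that no rank proceeds assuming success while another has failed. The names try_connect and port_name are illustrative and not from the code above, and whether this avoids the hang depends on MPI_Comm_connect returning on every rank in the failing case; it is a sketch of the pattern, not a confirmed fix.

  /* Sketch: collective connect attempt with an agreed-upon outcome.
   * Assumes every rank in `comm` participates in the connect call. */
  #include <mpi.h>
  #include <stdio.h>

  static int try_connect(const char *port_name, MPI_Comm comm, MPI_Comm *intercomm)
  {
    /* Return errors from operations on `comm` instead of aborting immediately. */
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

    int rc = MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, comm, intercomm);

    /* Agree on one result: if any rank failed, every rank treats the
     * connect as failed, so nobody is left waiting on the others. */
    int local_ok  = (rc == MPI_SUCCESS) ? 1 : 0;
    int global_ok = 0;
    MPI_Allreduce(&local_ok, &global_ok, 1, MPI_INT, MPI_MIN, comm);

    if (rc != MPI_SUCCESS) {
      char msg[MPI_MAX_ERROR_STRING];
      int  len = 0;
      MPI_Error_string(rc, msg, &len);
      fprintf(stderr, "MPI_Comm_connect failed: %s\n", msg);
    }

    /* Restore fatal errors for normal operation. */
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_ARE_FATAL);
    return global_ok; /* 1 only if the connect succeeded on all ranks */
  }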


-- 
John Biddiscombe,                            email:biddisco @ cscs.ch
http://www.cscs.ch/
CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland      | Fax:  +41 (91) 610.82.82
