[mpich-discuss] MPI_Comm_Connect bug?

Biddiscombe, John A. biddisco at cscs.ch
Fri Oct 15 04:34:56 CDT 2010


Dave

If I call MPI_Abort (where you suggest), then the whole app terminates. What I would like is for just the current MPI_Comm_connect to fail and the program to continue with a polite "connection failed" message. Rank 0 returns fine; how do I force the other ranks to exit and return?

Thanks

JB
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
Sent: 14 October 2010 20:20
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_Connect bug?

Where is your MPI_Abort call below?  IMO it should come immediately after your H5FDdsmError call.

Now, that said, there may be a bug in our dynamic process implementation, especially w.r.t. error handling.  I would be surprised if it was actually usable after an error occurs during connect.
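To be concrete, here is a sketch of the placement I mean, written as a standalone function rather than your H5FDdsm code (the port name and communicator are placeholders for the values your snippet uses, and plain fprintf stands in for H5FDdsmError):

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch only: connect to a server, and if the connect fails, report
 * the error and abort immediately afterwards. "port_name" and "comm"
 * are placeholders for the real values in the original snippet. */
static void connect_or_abort(const char *port_name, MPI_Comm comm,
                             MPI_Comm *intercomm)
{
    MPI_Errhandler_set(comm, MPI_ERRORS_RETURN);

    int error_code = MPI_Comm_connect((char *)port_name, MPI_INFO_NULL,
                                      0, comm, intercomm);
    if (error_code != MPI_SUCCESS) {
        char error_string[MPI_MAX_ERROR_STRING];
        int length_of_error_string;
        MPI_Error_string(error_code, error_string, &length_of_error_string);
        fprintf(stderr, "MPI_Comm_connect failed with error:\n%s\n",
                error_string);
        /* Abort immediately after reporting the error; this tears down
         * the whole job, including ranks still blocked in the connect. */
        MPI_Abort(comm, error_code);
    }

    /* reset to MPI_ERRORS_ARE_FATAL for normal debug purposes */
    MPI_Errhandler_set(comm, MPI_ERRORS_ARE_FATAL);
}
```

The point of the placement is that MPI_Abort is the only call guaranteed to take down ranks that are still blocked inside the collective MPI_Comm_connect; anything rank 0 does after returning (a broadcast, a flag) cannot reach ranks that never return from the call.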

-Dave

On Oct 14, 2010, at 1:11 PM CDT, Biddiscombe, John A. wrote:

> To try to catch a problem that occurs when MPI_Comm_connect fails, I wrapped the call in an error handler, aiming to exit gracefully.
> 
> Rank 0 detects the error, aborts, and displays the message, but the other ranks hang waiting for something to happen. I think that when rank 0 aborts, it should first signal the other ranks to abort as well.
> 
> Am I doing it wrong, or is this a bug?
> 
> Thanks. Snippet below.
> 
> JB
> 
>  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>  int error_code = MPI_Comm_connect(this->DsmMasterHostName, MPI_INFO_NULL, 0, this->Comm, &this->InterComm);
>  if (error_code == MPI_SUCCESS) {
>    H5FDdsmDebug("Id = " << this->Id << " MPI_Comm_connect returned SUCCESS");
>    isConnected = H5FD_DSM_SUCCESS;
>  } else {
>    char error_string[MPI_MAX_ERROR_STRING];
>    int length_of_error_string;
>    MPI_Error_string(error_code, error_string, &length_of_error_string);
>    H5FDdsmError("\nMPI_Comm_connect failed with error :\n" << error_string << "\n\n");
>  }
>  // reset to MPI_ERRORS_ARE_FATAL for normal debug purposes
>  MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
> 
> 
> -- 
> John Biddiscombe,                            email:biddisco @ cscs.ch
> http://www.cscs.ch/
> CSCS, Swiss National Supercomputing Centre  | Tel:  +41 (91) 610.82.07
> Via Cantonale, 6928 Manno, Switzerland      | Fax:  +41 (91) 610.82.82
> 
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
