[mpich-discuss] MPI_Comm_Connect bug?
Biddiscombe, John A.
biddisco at cscs.ch
Fri Oct 15 04:34:56 CDT 2010
Dave
If I call MPI_Abort (where you suggest), then the whole app terminates. What I would like is for just the current MPI_Comm_connect to abort, and for the program to continue with a polite "connection failed" message. Rank 0 returns fine; how do I force the other ranks to exit and return?
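For the record, something like the following might work (a sketch only; it assumes the server side tolerates a short-lived probe connection from rank 0 alone, and that a failed connect returns an error on rank 0 rather than hanging, as it does here): let rank 0 probe the port over MPI_COMM_SELF, broadcast the outcome, and only enter the collective MPI_Comm_connect once the probe has succeeded. port_name stands in for this->DsmMasterHostName:

#include <mpi.h>
#include <stdio.h>

/* Sketch: rank 0 probes the port alone first, so a bad port cannot
   strand the other ranks inside the collective MPI_Comm_connect. */
int TryConnect(char *port_name, MPI_Comm comm, MPI_Comm *intercomm)
{
  int rank, ok = 0;
  MPI_Comm_rank(comm, &rank);
  if (rank == 0) {
    MPI_Comm probe;
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
    if (MPI_Comm_connect(port_name, MPI_INFO_NULL, 0,
                         MPI_COMM_SELF, &probe) == MPI_SUCCESS) {
      MPI_Comm_disconnect(&probe);  /* throw-away probe connection */
      ok = 1;
    }
  }
  /* every rank learns the probe result before the collective call */
  MPI_Bcast(&ok, 1, MPI_INT, 0, comm);
  if (!ok) {
    if (rank == 0) fprintf(stderr, "connection failed\n");
    return 1;  /* all ranks return; none are left hanging */
  }
  return MPI_Comm_connect(port_name, MPI_INFO_NULL, 0,
                          comm, intercomm) != MPI_SUCCESS;
}

The obvious cost is that the server has to accept (and discard) an extra connection, which H5FDdsm may or may not tolerate.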
Thanks
JB
-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Dave Goodell
Sent: 14 October 2010 20:20
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] MPI_Comm_Connect bug?
Where is your MPI_Abort call below? IMO it should come immediately after your H5FDdsmError call.
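Roughly like this (a sketch only, reusing the variables from your snippet):

} else {
  char error_string[1024];
  int length_of_error_string;
  MPI_Error_string(error_code, error_string, &length_of_error_string);
  H5FDdsmError("\nMPI_Comm_connect failed with error : \n" << error_string << "\n\n");
  MPI_Abort(MPI_COMM_WORLD, error_code);  // takes down every rank, not just rank 0
}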
Now, that said, there may be a bug in our dynamic process implementation, especially w.r.t. error handling. I would be surprised if it was actually usable after an error occurs during connect.
-Dave
On Oct 14, 2010, at 1:11 PM CDT, Biddiscombe, John A. wrote:
> To try to catch a problem that occurs when MPI_Comm_connect fails, I wrapped the call in an error handler, with the aim of exiting gracefully.
>
> Rank 0 detects an error, aborts, and displays the message, but the other ranks hang waiting for something to happen. I think that when rank 0 aborts, it should first signal the other ranks to abort as well.
>
> Am I doing it wrong, or is this a bug?
>
> thanks. snippet below
>
> JB
>
> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
> int error_code = MPI_Comm_connect(this->DsmMasterHostName, MPI_INFO_NULL, 0, this->Comm, &this->InterComm);
> if (error_code == MPI_SUCCESS) {
>   H5FDdsmDebug("Id = " << this->Id << " MPI_Comm_connect returned SUCCESS");
>   isConnected = H5FD_DSM_SUCCESS;
> } else {
>   char error_string[1024];
>   int length_of_error_string;
>   MPI_Error_string(error_code, error_string, &length_of_error_string);
>   H5FDdsmError("\nMPI_Comm_connect failed with error : \n" << error_string << "\n\n");
> }
> // reset to MPI_ERRORS_ARE_FATAL for normal debug purposes
> MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_ARE_FATAL);
>
>
> --
> John Biddiscombe, email: biddisco @ cscs.ch
> http://www.cscs.ch/
> CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
> Via Cantonale, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss