[mpich-discuss] recovering from a communicator failure

Darius Buntinas buntinas at mcs.anl.gov
Wed Sep 22 10:53:25 CDT 2010


The MPI standard says that a process can call MPI_Init only once.  Unfortunately, this means that you can't do MPI_Init(); MPI_Finalize(); MPI_Init() in the same process, regardless of which thread makes the calls.

We (MPICH2 and others at the MPI Forum) are working on allowing the app to recover from failures.  The idea would be that you would only get a "fatal" error when you _really_ can't go on.

-d

On Sep 20, 2010, at 10:21 AM, Hiatt, Dave M wrote:

> I am currently calling MPI_Init in thread 1 (thread 0 is my master thread), in the main app process for what becomes node 0.  If I have a fatal MPI error, and can catch it, is terminating the initiating thread (thread 1) sufficient to allow me to execute a new MPI_Init and recover?
>  
> “People get held back by the voice inside em” – K’naan – In the Beginning
>  
> Dave M. Hiatt
> Director, Risk Analytics
> CitiMortgage
> 1000 Technology Drive
> O'Fallon, MO 63368-2240
>  
> Telephone:  636-261-1408
>  


