[mpich-discuss] error event

Anthony Chan chan at mcs.anl.gov
Thu Jun 12 15:14:29 CDT 2008


Have you looked into creating your own MPI error handler?
Instead of calling MPI_Abort(), you could call MPI_Comm_call_errhandler(),
which invokes the error handler attached to the communicator on the
calling process.

http://www.mpi-forum.org/docs/mpi-20-html/node160.htm
http://www.mpi-forum.org/docs/mpi-11-html/node148.html

A.Chan



----- "Eugenio Chiavaccini" <Eugenio.Chiavaccini at cst.com> wrote:

> Hello.
> 
> I'm dealing with error events in MPI, using a C++ implementation.
> 
> In particular, I would like to signal a sort of "error event" across
> the whole MPI cluster.
> For the moment, the only way I've found to report this emergency
> situation is to call MPI_Abort, but this is of course too drastic
> for my purpose, as the whole MPI cluster is aborted without any
> further control on the programmer's side.
> 
> 
> 
> What I would like to do is intercept an MPI error. Suppose it occurs
> in one executable (say node 0). Node 0 then catches it through the
> exception mechanism (this is quite easy: just set the
> MPI_ERRORS_THROW_EXCEPTION standard handler, or another appropriate
> one). Node 0 then communicates the error to the other executables and
> machines in the cluster, so that they reach the same emergency
> situation too, possibly throwing the same MPI::Exception.
> 
> 
> 
> Is anyone aware of possible strategies or solutions?
> 
> 
> 
> Suggestions are really welcome.
> 
> Thanks a lot
> 
> Eugenio



