[mpich-discuss] error event
Anthony Chan
chan at mcs.anl.gov
Thu Jun 12 15:14:29 CDT 2008
Have you looked into creating your own MPI error handler?
Instead of calling MPI_Abort(), you can call MPI_Comm_call_errhandler(),
which invokes the error handler attached to the communicator.
http://www.mpi-forum.org/docs/mpi-20-html/node160.htm
http://www.mpi-forum.org/docs/mpi-11-html/node148.html
A.Chan
----- "Eugenio Chiavaccini" <Eugenio.Chiavaccini at cst.com> wrote:
> Hello.
>
> I'm dealing with error events in MPI, using a C++ implementation.
>
> In particular, I would like to signal a sort of "error event" across
> the whole MPI cluster.
> For the moment, the only way I've found to report this emergency
> situation is to call MPI_Abort(), but this is of course too
> drastic for my purpose, as the whole MPI cluster is aborted without any
> additional control from the programmer's side.
>
>
>
> What I would like to do is intercept an MPI error. Suppose it happens
> in one executable (say node 0). Node 0 then catches it
> with an exception mechanism (this is quite easy: just set the standard
> MPI::ERRORS_THROW_EXCEPTIONS handler, or another appropriate
> one). Node 0 then communicates the error to the other
> executables and machines belonging to the cluster, so that they also
> reach the same emergency situation, possibly throwing the same
> MPI::Exception.
>
>
>
> Is anyone aware of possible strategies or solutions?
>
>
>
> Suggestions are really welcome.
>
> Thanks a lot
>
> Eugenio