[mpich-discuss] error event

Eugenio Chiavaccini Eugenio.Chiavaccini at cst.com
Thu Jun 12 04:48:06 CDT 2008


Hello.

I'm dealing with error events in MPI, using a C++ implementation.

In particular, I would like to signal a sort of "error event" across the whole MPI cluster.
For the moment, the only way I've found to report this emergency situation is to call MPI_Abort, but this is of course too drastic for my purpose, as the whole MPI cluster is aborted without any further control on the programmer's side.



What I would like to do is to intercept an MPI error. Suppose it happens in one executable (say node 0). Node 0 then catches it with an exception mechanism (this is quite easy: just set the predefined MPI::ERRORS_THROW_EXCEPTIONS handler, or another appropriate one). Node 0 then communicates the error to the other executables and machines belonging to the cluster, so that they also reach the same emergency situation, possibly throwing the same MPI::Exception.



Is anyone aware of possible strategies or solutions?



Suggestions are very welcome.

Thanks a lot,

Eugenio
