[MPICH] Error handler

Blankenship, David David.Blankenship at Kla-Tencor.com
Tue Apr 3 14:24:04 CDT 2007


I am new to MPICH, and I have a lot of questions about error handling,
but I will start with just one easy one.

I am up and running with MPICH and C++ on Red Hat Enterprise 4. I have a
fairly simple application where the master process divides the work and
sends it out to each of the workers. The workers do their part of the
work independently, and then the master assembles the results into a
report.

Eventually, I will want to be able to handle failures in the worker
processes by resubmitting the work to another worker to try to get my
job complete. For now, I would like to just catch the error and report
the problem in my application output.
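
To make the resubmission idea concrete, here is a minimal sketch with no MPI
calls in it. It assumes a failed worker shows up as a C++ exception, which is
exactly the part that is not working yet. The names Worker, Outcome,
run_with_resubmit and always_fails are hypothetical and exist only for this
illustration; the Worker callable stands in for whatever send/receive code
actually talks to a worker process.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for "send this work item to a worker and wait for
// the answer". In a real program the MPI send/receive would live here.
using Worker = std::function<int(int work_item)>;

struct Outcome {
    bool ok = false;                  // true once some worker returned a result
    int result = 0;                   // meaningful only when ok is true
    std::vector<std::string> errors;  // one entry per failed attempt, for the report
};

// Hypothetical test worker that always fails, to exercise the error path.
int always_fails(int /*work_item*/) {
    throw std::runtime_error("worker exited");
}

// Try the work item on each worker in turn, starting at index `first`,
// until one succeeds. Each failure is recorded instead of ending the job.
Outcome run_with_resubmit(const std::vector<Worker>& workers,
                          std::size_t first, int work_item) {
    assert(!workers.empty() && "need at least one worker");
    Outcome out;
    for (std::size_t attempt = 0; attempt < workers.size(); ++attempt) {
        const std::size_t rank = (first + attempt) % workers.size();
        try {
            out.result = workers[rank](work_item);
            out.ok = true;
            return out;
        } catch (const std::exception& e) {
            out.errors.push_back("worker " + std::to_string(rank) + ": " + e.what());
        }
    }
    return out;  // every worker failed; ok stays false
}
```

With a worker list of { always_fails, a callable that doubles its input },
calling run_with_resubmit(workers, 0, 21) records one error for worker 0 and
then returns the result from worker 1. If every worker fails, ok stays false
and the errors vector holds one line per attempt, which is what would go into
the report.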

When I run the application and make one of my workers exit, the whole
job stops with the message "caused collective abort of all ranks." I
then replaced the default error handler with the
ERRORS_THROW_EXCEPTIONS error handler, but I still get the same result.
My MPICH initialization looks like:

MPI::Init( argC, argV );
MPI::COMM_WORLD.Set_errhandler( MPI::ERRORS_THROW_EXCEPTIONS );

I have also tried:

MPI_Errhandler_set( MPI_COMM_WORLD, MPI::ERRORS_THROW_EXCEPTIONS ); 

with the same results.

All I want to do right now is to catch the error, add the error to my
results and exit cleanly. 

What might I be doing wrong here? (I suppose that I could be testing
this incorrectly.)
Is there a way to force MPICH to generate errors for testing?

Is there some documentation or articles about error handling with MPICH
that might answer some of my other questions?

Thanks,

David



