<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Error handler</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
</HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=071273620-03042007><FONT face=Arial
color=#0000ff size=2>The current version of MPICH2 cannot recover from a
catastrophic error such as the death of a process because of a
segmentation fault. Simpler errors such as incorrect parameters to functions can
be caught.</FONT></SPAN><SPAN class=071273620-03042007> </SPAN><SPAN
class=071273620-03042007><FONT face=Arial color=#0000ff size=2>We plan to
support fault tolerance sometime in the future. </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=071273620-03042007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=071273620-03042007><FONT face=Arial
color=#0000ff size=2>Rajeev</FONT></SPAN></DIV><BR>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> owner-mpich-discuss@mcs.anl.gov
[mailto:owner-mpich-discuss@mcs.anl.gov] <B>On Behalf Of </B>Blankenship,
David<BR><B>Sent:</B> Tuesday, April 03, 2007 2:24 PM<BR><B>To:</B>
mpich-discuss@mcs.anl.gov<BR><B>Subject:</B> [MPICH] Error
handler<BR></FONT><BR></DIV>
<P><FONT face=Arial size=2>I am new to MPICH, and I have a lot of questions
about error handling, but I will start with just one easy one.</FONT> </P>
<P><FONT face=Arial size=2>I am up and running with MPICH and C++ on Red Hat
Enterprise 4. I have a fairly simple application where the master process
divides the work and sends it out to each of the workers. The workers do their
part of the work independently, and then the master assembles the results into
a report.</FONT></P>
<P><FONT face=Arial size=2>Eventually, I will want to be able to handle failures
in the worker processes by resubmitting the work to another worker so that the
job can still complete. For now, I would like to just catch the error and report
the problem in my application output.</FONT></P>
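<P><FONT face=Arial size=2>The resubmission idea can be sketched as plain
master-side bookkeeping, without any MPI calls. This is only a rough sketch
under stated assumptions: TaskResult, run_with_resubmit and the run_on_worker
callback are hypothetical names, not part of MPI or MPICH, and the callback
simply stands in for "send the work, wait for the result, and notice a
failure". How a failed worker would actually be detected is not shown.</FONT></P>

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical result record for one unit of work (not part of MPI or MPICH).
struct TaskResult {
    int task_id;
    int worker;        // worker that completed the task, or -1 if none did
    bool ok;           // true once some worker has completed the task
    std::string note;  // failure text to include in the report
};

// Hypothetical master-side bookkeeping: try each task on successive workers
// until one succeeds or every worker has been tried once.
// `run_on_worker` is an assumed callback that returns true on success and
// false when the given worker failed to produce a result.
std::vector<TaskResult> run_with_resubmit(
    int num_tasks, int num_workers,
    const std::function<bool(int task, int worker)>& run_on_worker) {
    std::vector<TaskResult> results;
    for (int task = 0; task < num_tasks; ++task) {
        TaskResult r{task, -1, false, std::string()};
        for (int attempt = 0; attempt < num_workers && !r.ok; ++attempt) {
            // Start at a different worker for each task, then move on to the
            // next worker on every failed attempt.
            int worker = (task + attempt) % num_workers;
            if (run_on_worker(task, worker)) {
                r.worker = worker;
                r.ok = true;
            } else {
                r.note += "worker " + std::to_string(worker) + " failed; ";
            }
        }
        results.push_back(r);
    }
    return results;
}
```

<P><FONT face=Arial size=2>For example, with 4 tasks, 3 workers and a callback
that always fails for worker 1, task 1 is first tried on worker 1, recorded as
"worker 1 failed; ", and then completed by worker 2. A task that no worker can
complete is left with ok set to false and worker set to -1, so it can still be
listed in the report.</FONT></P>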
<P><FONT face=Arial size=2>When I run the application and have one of my
workers exit, the run ends with the message "caused collective abort of all
ranks." I then replaced the default error handler with the
MPI::ERRORS_THROW_EXCEPTIONS error handler, but I still get the same result.
My MPICH initialization looks like:</FONT></P>
<P><FONT face=Arial size=2>MPI::Init( argC, argV );</FONT> <BR><FONT
face=Arial size=2>MPI::COMM_WORLD.Set_errhandler( MPI::ERRORS_THROW_EXCEPTIONS
);</FONT> </P>
<P><FONT face=Arial size=2>I have also tried:</FONT> </P>
<P><FONT face=Arial size=2>MPI_Errhandler_set( MPI_COMM_WORLD,
MPI::ERRORS_THROW_EXCEPTIONS ); </FONT></P>
<P><FONT face=Arial size=2>with the same results.</FONT> </P>
<P><FONT face=Arial size=2>All I want to do right now is to catch the error,
add the error to my results and exit cleanly. </FONT></P>
<P><FONT face=Arial size=2>What might I be doing wrong here? (I suppose that I
could be testing this incorrectly.)</FONT> <BR><FONT face=Arial size=2>Is
there a way to force MPICH to generate errors for testing?</FONT> </P>
<P><FONT face=Arial size=2>Is there some documentation or articles about error
handling with MPICH that might answer some of my other questions?</FONT> </P>
<P><FONT face=Arial size=2>Thanks,</FONT> </P>
<P><FONT face=Arial size=2>David</FONT> </P><BR></BLOCKQUOTE></BODY></HTML>