<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18812"></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=232593016-13102009>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
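class=232593016-13102009> Have you tried using the MPI error handlers
(MPI_ERRORS_RETURN, or a custom handler created with
MPI_Comm_create_errhandler())? With the default handler,
MPI_ERRORS_ARE_FATAL, any MPI error aborts the whole job.</SPAN></FONT></DIV>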
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=232593016-13102009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=232593016-13102009>Regards,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=232593016-13102009>Jayesh</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN
class=232593016-13102009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left>
<HR tabIndex=-1>
</DIV>
<DIV dir=ltr align=left><FONT size=2 face=Tahoma><B>From:</B> abhishek pandey
[mailto:hipandey@gmail.com] <BR><B>Sent:</B> Tuesday, October 13, 2009 11:02
AM<BR><B>To:</B> Jayesh Krishna<BR><B>Subject:</B> Re: [mpich-discuss] If one
process of Cluster crashes<BR></FONT><BR></DIV>
<DIV></DIV>Hi Jayesh,<BR><BR>Thanks for the reply.<BR><BR>This is an
application/network error. I am running several instances of my application on
different machines for a very long time, so it is possible that one process
crashes or that a machine loses network connectivity. When that happens, the
whole cluster currently goes down. I want the other processes to keep running
even if one or more processes fail.<BR><BR>Is there any way I can handle this
situation?
<BR><BR>Thanks,<BR>Abhishek<BR><BR>
<DIV class=gmail_quote>On Tue, Oct 13, 2009 at 8:20 PM, Jayesh Krishna <SPAN
dir=ltr>&lt;<A
href="mailto:jayesh@mcs.anl.gov">jayesh@mcs.anl.gov</A>&gt;</SPAN> wrote:<BR>
<BLOCKQUOTE
style="BORDER-LEFT: rgb(204,204,204) 1px solid; MARGIN: 0pt 0pt 0pt 0.8ex; PADDING-LEFT: 1ex"
class=gmail_quote>
<DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2
face=Arial><SPAN>Hi,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN> We
are currently working on adding fault tolerance to MPICH2, so in a couple of
months we might have something that you can work with.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2 face=Arial><SPAN> On a
side note, what kind of process crash do you see? Is this an application
error (which you should fix anyway)? Is it due to an internal MPICH2 error?
Please give us more details.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2
face=Arial><SPAN></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2
face=Arial><SPAN>Regards,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT color=#0000ff size=2
face=Arial><SPAN>Jayesh</SPAN></FONT></DIV><BR>
<DIV dir=ltr lang=en-us align=left>
<HR>
<FONT size=2 face=Tahoma><B>From:</B> <A
href="mailto:mpich-discuss-bounces@mcs.anl.gov"
target=_blank>mpich-discuss-bounces@mcs.anl.gov</A> [mailto:<A
href="mailto:mpich-discuss-bounces@mcs.anl.gov"
target=_blank>mpich-discuss-bounces@mcs.anl.gov</A>] <B>On Behalf Of
</B>abhishek pandey<BR><B>Sent:</B> Tuesday, October 13, 2009 7:23
AM<BR><B>To:</B> <A href="mailto:mpich-discuss@mcs.anl.gov"
target=_blank>mpich-discuss@mcs.anl.gov</A><BR><B>Subject:</B> [mpich-discuss]
If one process of Cluster crashes<BR></FONT><BR></DIV>
<DIV>
<DIV></DIV>
<DIV class=h5>
<DIV></DIV>Hi,<BR><BR>I am using MPICH2 on Windows, and sometimes one of the
processes in the cluster crashes. Is there any way to handle this? I do not
want to start the cluster all over again.<BR>As far as I know, if one process
of the cluster goes down for any reason, the whole cluster goes down with it.
<BR><BR><BR>Thanks,<BR>Abhishek.<BR></DIV></DIV></DIV></BLOCKQUOTE></DIV><BR></BODY></HTML>