[mpich-discuss] If one process of Cluster crashes

Jayesh Krishna jayesh at mcs.anl.gov
Tue Oct 13 09:50:15 CDT 2009

 We are currently working on adding fault tolerance to MPICH2, so in a
couple of months we might have something that you can work with.
 On a side note, what kind of process crash do you see? Is it an
application error (which you should fix anyway)? Is it due to an internal
MPICH2 error? Please provide us with more details.


From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of abhishek pandey
Sent: Tuesday, October 13, 2009 7:23 AM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] If one process of Cluster crashes


I am using MPICH2 on Windows, and sometimes one process in the cluster
crashes. Is there any way to handle this? I do not want to start the
cluster all over again.
As far as I know, if one process of the cluster goes down for any reason,
the whole cluster goes down as well.

