[mpich-discuss] Questions on fault tolerance implementations of MPICH2-1.3.1
王睿
wangraying at gmail.com
Sun Dec 5 06:38:11 CST 2010
Hi,
I have some questions about the fault tolerance support in MPICH2-1.3.1:
1> Can the newest version of MPICH2 detect a process failure? If so, how do
the other processes get notified (from a programmer's point of view)?
2> Does MPICH2-1.3.1 support user-defined error handlers? If not, how can a
program do recovery work after a process failure?
3> If one process is killed, this does not affect the other processes'
Send/Recv, but the MPI environment seems to wait for the dead process. How can
I get the whole job to exit normally instead of pressing 'Ctrl+C'?
Best Regards,
--
Rui Wang
Institute of Computing Technology, CAS, Beijing, P.R.China