[mpich-discuss] Questions on fault tolerance implementations of MPICH2-1.3.1

王睿 wangraying at gmail.com
Sun Dec 5 06:38:11 CST 2010


Hi,

I have some questions about the fault tolerance support in MPICH2-1.3.1:

1> Can the newest version of MPICH2 detect a process failure? If so, how do
the other processes get notified (from a programmer's point of view)?

2> Does MPICH2-1.3.1 support user-defined error handlers? If not, how can
recovery work be done after a process failure?

3> If one process is killed, the other processes' Send/Recv calls are not
affected, but the MPI environment seems to wait for the dead process. How can
I get the whole job to exit normally instead of having to press 'Ctrl+C'?

Best Regards,
-- 
Rui Wang
Institute of Computing Technology, CAS, Beijing, P.R. China