Hi,

I have some questions about fault tolerance support in MPICH2-1.3.1.

1> Can the newest version of MPICH2 detect a process failure? If so, how do the other processes get notified, from a programmer's point of view?

2> Does MPICH2-1.3.1 support user-defined error handlers? If not, how can recovery work be done after a process failure?

3> If one process is killed, the other processes' Send/Recv calls are not affected, but the MPI environment seems to wait for the dead process. How can I get the whole job to exit normally instead of pressing Ctrl+C?

Best Regards,
-- 
Rui Wang
Institute of Computing Technology, CAS, Beijing, P.R.China