[MPICH] process failure on a node

SUNDAR J jsundar at iitk.ac.in
Thu Mar 9 21:55:22 CST 2006


Suppose I am running a program in which process 0 sends a message to all
other processes. If one of the nodes fails to respond properly (say it is
switched off, or it hangs at runtime), is there any way to get around the
problem? Is there a way for the main process to detect that the message it
sent to one of the processes has failed, and to redirect it to some other
process? And what if the main process itself fails? Will that bring the
whole program down?



