[mpich-discuss] Fault Tolerant MPICH2 >= 1.3.2
Rob Stewart
R.Stewart at hw.ac.uk
Thu Sep 8 11:17:20 CDT 2011
Thanks Jayesh,
On 08/09/11 17:13, Jayesh Krishna wrote:
> Darius,
> Can you help him out ?
In this post:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-January/008791.html
Jayesh described the notion that, from MPICH2 1.3.2 onwards on Linux
systems, a node failure would not result in the termination of an MPI job.
I have just compiled MPICH2 with the intention of running a simple MPI
program on our 32-node cluster to test the behaviour I think Jayesh is
describing.
Do you have a simple unit test C file, or a simple example that I can
use to test the continuation of jobs in the face of node failure?
Regards,
--
Rob Stewart
Computer Science
Heriot Watt University
Edinburgh
T: 0131 4514196
E: rs46 at hw.ac.uk