[mpich-discuss] Fault Tolerant MPICH2 >= 1.3.2
    Rob Stewart 
    R.Stewart at hw.ac.uk
       
    Thu Sep  8 11:17:20 CDT 2011
    
    
  
Thanks Jayesh,
On 08/09/11 17:13, Jayesh Krishna wrote:
> Darius,
>   Can you help him out ?
Jayesh described here:
http://lists.mcs.anl.gov/pipermail/mpich-discuss/2011-January/008791.html
the notion that, from MPICH2 1.3.2 onwards on Linux systems, a node
failure would not result in the termination of an MPI job.
I have just compiled MPICH2 with the intention of running a simple MPI 
program on our 32-node cluster to test the behaviour I think Jayesh is 
describing.
Do you have a simple unit test C file, or a simple example that I can 
use to test the continuation of jobs in the face of node failure?
Regards,
-- 
Rob Stewart
Computer Science
Heriot Watt University
Edinburgh
T: 0131 4514196
E: rs46 at hw.ac.uk
    
    