[mpich-discuss] questions about -disable-auto-cleanup

Darius Buntinas buntinas at mcs.anl.gov
Tue Jan 10 11:26:41 CST 2012


Hi Bin,

Currently, fault tolerance only works with the nemesis channel using the tcp netmod, so it is not supported with ch3:sock.  Notification of failed spawned processes isn't supported in 1.4.1.  The MPICH_ATTR_FAILED_PROCESSES attribute is only defined on MPI_COMM_WORLD.  I'm not sure what would happen if a spawned process fails.  It may be mistakenly reported as a failure in MPI_COMM_WORLD.
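
For reference, here is a minimal sketch of a build and launch that match the supported setup described above. It assumes the ch3:nemesis:tcp device string is accepted by configure in this release (nemesis with tcp is, as far as I know, already the default, so spelling it out is only for clarity). The process count and the application name ./my_ft_app are hypothetical placeholders. The snippet only prints the two commands so they can be checked before being run by hand.

```shell
# Sketch only: the process count and program name below are placeholders.
DEVICE="ch3:nemesis:tcp"   # nemesis channel with the tcp netmod (assumed device string)
NPROCS=4                   # hypothetical number of processes
APP="./my_ft_app"          # hypothetical application binary

# Configure step, to be run from the MPICH2 source directory.
CONFIGURE_CMD="./configure --with-device=${DEVICE}"

# Launch step: -disable-auto-cleanup asks mpiexec not to kill the
# surviving processes when one process in the job fails.
RUN_CMD="mpiexec -disable-auto-cleanup -n ${NPROCS} ${APP}"

# Dry run: print the commands instead of executing them.
echo "$CONFIGURE_CMD"
echo "$RUN_CMD"
```

A build configured with --with-device=ch3:sock would take the same mpiexec flag, but per the note above the fault-tolerance behavior is not expected to work there.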

The svn trunk has the failed-process query mechanisms from the MPI-3 fault-tolerance proposal, which allow one to query for failed processes in a particular communicator.  Unfortunately, notification of failed spawned processes isn't supported there yet either.  It is high on our list of things to implement, though.

-d


On Jan 9, 2012, at 8:10 PM, Bin Jia wrote:

> Hi,
> 
> I am doing experiments with mpich2-1.4.1 fault tolerance. It seems that -disable-auto-cleanup does not work when configured with --with-device=ch3:sock. Is it not supported, or am I doing something wrong?
> Also, how can I query which process created by MPI_Comm_spawn has failed? Can MPICH_ATTR_FAILED_PROCESSES only be queried on MPI_COMM_WORLD?
> 
> Thanks
> - Bin
