[mpich-discuss] questions about -disable-auto-cleanup
Darius Buntinas
buntinas at mcs.anl.gov
Tue Jan 10 11:26:41 CST 2012
Hi Bin,
Currently, fault tolerance only works with the ch3:nemesis device using the tcp netmod, so it isn't supported with ch3:sock. Notification of failed spawned processes isn't supported in 1.4.1. The MPICH_ATTR_FAILED_PROCESSES attribute is only defined on MPI_COMM_WORLD. I'm not sure what would happen if a spawned process fails; it may mistakenly be reported in MPI_COMM_WORLD.
The svn trunk has the failed-process query mechanisms from the MPI-3 fault-tolerance proposal, which allow one to query for failed processes in a particular communicator. Unfortunately, notification of failed spawned processes isn't supported there yet either, though it's high on our list of things to implement.
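For reference, a rough sketch of a setup that matches the constraints above. It assumes an MPICH2 1.4.1 source tree and the Hydra mpiexec; the program name ./app and the process count are placeholders, not anything taken from this thread.

```shell
# Build MPICH2 with the nemesis channel and the tcp netmod
# (fault tolerance is not available with ch3:sock).
./configure --with-device=ch3:nemesis:tcp
make && make install

# Launch without automatic cleanup so the surviving ranks keep
# running after a process fails. ./app is a hypothetical program.
mpiexec -disable-auto-cleanup -n 4 ./app
```

The application would typically also need to set the MPI_ERRORS_RETURN error handler on its communicators, so that a failure surfaces as an error code instead of aborting the job.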
-d
On Jan 9, 2012, at 8:10 PM, Bin Jia wrote:
> Hi,
>
> I am doing experiments with mpich2-1.4.1 fault tolerance. It seems that -disable-auto-cleanup does not work when configured with --with-device=ch3:sock. Is it not supported or am I doing something wrong?
> Also, how can I query which process created by MPI_Comm_spawn has failed? Can MPICH_ATTR_FAILED_PROCESSES only be queried on MPI_COMM_WORLD?
>
> Thanks
> - Bin