[mpich2-commits] r7799 - mpich2/trunk
balaji at mcs.anl.gov
Thu Jan 20 22:16:00 CST 2011
Author: balaji
Date: 2011-01-20 22:16:00 -0600 (Thu, 20 Jan 2011)
New Revision: 7799
Modified:
mpich2/trunk/README.vin
Log:
Added a note in the README that the MPICH_ATTR_FAILED_PROCESSES
attribute is available only on MPI_COMM_WORLD.
Modified: mpich2/trunk/README.vin
===================================================================
--- mpich2/trunk/README.vin 2011-01-20 23:47:13 UTC (rev 7798)
+++ mpich2/trunk/README.vin 2011-01-21 04:16:00 UTC (rev 7799)
@@ -768,10 +768,10 @@
returning MPI_SUCCESS on a given process means that the part of the
collective performed by that process has been successful.
- MPICH2 release specific note: There is currently a bug in MPICH2
- that might cause the collective operation to return
- MPI_SUCCESS on a process even if data is corrupted on that
- process.
+ MPICH2 release specific note: There is currently a bug in
+ MPICH2 that might cause the collective operation to return
+ MPI_SUCCESS on a process even if data is corrupted on that
+ process.
- PROCESS MANAGER: If used with the hydra process manager, hydra will
detect failed processes and notify the MPICH2 library. Users can
@@ -780,14 +780,18 @@
The attribute value is an integer array containing the ranks of the
failed processes. The array is terminated by MPI_PROC_NULL.
- MPICH2 release specific note: The user needs to declare the
- following extern within the application in order to use the
- attribute (this ideally should be added to mpi.h, but has not
- been done so, to preserve ABI compatibility in the 1.3.x
- release series):
+ MPICH2 release specific note: The user needs to declare the
+ following extern within the application in order to use the
+ attribute (this ideally should be added to mpi.h, but has not
+ been done so, to preserve ABI compatibility in the 1.3.x
+ release series):
extern int MPICH_ATTR_FAILED_PROCESSES;
+ MPICH2 release specific note: The MPICH_ATTR_FAILED_PROCESSES
+ attribute is currently only defined on MPI_COMM_WORLD, but not
+ on other communicators.
+
Note that hydra by default will abort the entire application when
any process terminates before calling MPI_Finalize. In order to
allow an application to continue running despite failed processes,
@@ -805,10 +809,10 @@
signal handler.
In future releases, the plan is to provide a call such as
- MPIX_Failure_notification that will allow the user to register a
- callback function that will be called on process failures. This
- mechanism has not been added yet to preserve ABI compatibility in
- the 1.3.x release series.
+ MPIX_Failure_notify that will allow the user to register a callback
+ function that will be called on process failures. This mechanism
+ has not been added yet to preserve ABI compatibility in the 1.3.x
+ release series.
Checkpoint and Restart
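
For readers of the README note above, here is a minimal sketch of walking the failed-process array, which the README describes as an integer array of ranks terminated by MPI_PROC_NULL. The helper names (count_failed_ranks, rank_has_failed) and the SKETCH_PROC_NULL stand-in are hypothetical and not part of MPICH2. The retrieval through MPI_Comm_get_attr shown in the trailing comment is an assumption about how the attribute would be queried, and is not something this commit confirms.

```c
#include <stddef.h>

/* Stand-in terminator so the sketch compiles without mpi.h.
 * ASSUMPTION: real code would pass MPI_PROC_NULL from mpi.h instead. */
#define SKETCH_PROC_NULL (-1)

/* Count the ranks that precede the terminator in a failed-process array.
 * A NULL array is treated as "no failures reported". */
static size_t count_failed_ranks(const int *ranks, int terminator)
{
    size_t n = 0;

    if (ranks == NULL)
        return 0;
    while (ranks[n] != terminator)
        n++;
    return n;
}

/* Return 1 if 'rank' appears before the terminator, 0 otherwise. */
static int rank_has_failed(const int *ranks, int terminator, int rank)
{
    size_t i;

    if (ranks == NULL)
        return 0;
    for (i = 0; ranks[i] != terminator; i++) {
        if (ranks[i] == rank)
            return 1;
    }
    return 0;
}

/* With MPI available (not compiled here), the array might be obtained
 * roughly as follows. ASSUMPTION: the attribute value is delivered as a
 * pointer to the int array; check the README for your release.
 *
 *   extern int MPICH_ATTR_FAILED_PROCESSES;
 *   int *failed = NULL, flag = 0;
 *   MPI_Comm_get_attr(MPI_COMM_WORLD, MPICH_ATTR_FAILED_PROCESSES,
 *                     &failed, &flag);
 *   if (flag)
 *       n = count_failed_ranks(failed, MPI_PROC_NULL);
 */
```

Per the note added in this revision, such a query would only be meaningful on MPI_COMM_WORLD, not on other communicators.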