[mpich-discuss] Fault tolerance on collectives
Jed Brown
jedbrown at mcs.anl.gov
Tue Feb 28 06:54:05 CST 2012
On Tue, Feb 28, 2012 at 06:10, Anatoly G <anatolyrishon at gmail.com> wrote:
> Can you please answer some more specific questions:
>
> - May I get MPI_SUCCESS from a collective operation if that operation is
> called after one of the communicator's processes has failed? (The failure
> happened before the call.)
>
MPI_Scan seems like a natural example. Low-rank processes could complete
even if high-rank processes failed without calling the collective.
The same holds for MPI_Reduce: non-root ranks could complete even though
others were down.
MPI_Allreduce is different because there is a data dependency: every rank's
result depends on every rank's contribution, so all processes would have to
enter the call before any process can successfully complete (although some
could fail before it completes).