[mpich-discuss] Fault tolerance on collectives

Jed Brown jedbrown at mcs.anl.gov
Tue Feb 28 06:54:05 CST 2012


On Tue, Feb 28, 2012 at 06:10, Anatoly G <anatolyrishon at gmail.com> wrote:

> Can you please answer some more specific questions:
>
>    - May I get MPI_SUCCESS from a collective operation if it is called
>    after one of the communicator's processes has failed (the failure
>    happened before the call)?
>

MPI_Scan seems like a natural example. Low-rank processes could complete
even if high-rank processes failed without calling the collective.

The same holds for MPI_Reduce: non-root ranks could complete even though
other ranks were down.

MPI_Allreduce is different because every rank's result depends on every
rank's contribution, so all processes would have to enter the call before
any process can successfully complete (although some could fail before it
completes).
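
To make the dependency argument concrete, here is a minimal sketch in plain
C (no MPI calls) that models only which ranks' inputs each rank needs. The
names coll_kind, needs_input_from, and may_complete are hypothetical and
exist only for this illustration. It assumes the failed ranks died before
entering the collective, and it describes the best case permitted by the
data flow: a real implementation may use an algorithm (for example a
reduction tree) that adds dependencies, so a rank this model allows to
complete could still block or return an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three collectives discussed above. */
typedef enum { COLL_SCAN, COLL_REDUCE, COLL_ALLREDUCE } coll_kind;

/*
 * Does `rank` need `other` to have entered the collective?
 * This encodes the minimum data dependency only (an assumption of this
 * sketch), not the behavior of any particular MPI implementation.
 */
bool needs_input_from(coll_kind kind, int rank, int other, int root)
{
    switch (kind) {
    case COLL_SCAN:
        /* Inclusive prefix: rank r combines the inputs of ranks 0..r. */
        return other <= rank;
    case COLL_REDUCE:
        /* Only the root needs everyone; other ranks just contribute. */
        return rank == root || other == rank;
    case COLL_ALLREDUCE:
        /* Every rank's result depends on every rank's input. */
        return true;
    }
    return true; /* unknown kind: be conservative */
}

/*
 * Could `rank` complete the collective, given failed[i] == true for
 * every rank i that died before entering the call?
 */
bool may_complete(coll_kind kind, int rank, int root,
                  const bool *failed, int nranks)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    for (int other = 0; other < nranks; ++other) {
        if (failed[other] && needs_input_from(kind, rank, other, root))
            return false;
    }
    return true;
}
```

With four ranks and rank 3 failed, may_complete reports that ranks 0
through 2 can finish an MPI_Scan, that the non-root ranks 1 and 2 can finish
an MPI_Reduce rooted at 0 while the root itself cannot, and that no rank can
finish an MPI_Allreduce.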