[mpich-discuss] disable-auto-cleanup send/receive example

Darius Buntinas buntinas at mcs.anl.gov
Thu Nov 3 08:13:14 CDT 2011


The MPICH library detects and handles a failed process only while a communication operation is being performed, so the Iprobe is there just to give the library an opportunity to do that.  The sleep is there to make sure the process manager has had an opportunity to notify the library.  You might be able to get rid of the sleep, but it all depends on the timing of events.

-d


On Nov 2, 2011, at 8:14 PM, Stewart, Robert wrote:

> On 11/02/2011 07:24 PM, Pavan Balaji wrote:
> 
> Hi Darius, Pavan,
> 
> > In your program, if I added a
> >     sleep(1);
> >     MPI_Iprobe(0, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
> > in the error case of the receive, it did run as you expected. 
> > There's still a race, it's just less likely to hit in your example.
> 
> You're right, that did seem to work. May I ask what the need for the `sleep(1)' is? I'm now trying another architecture, passing a message round a ring, which is slightly more unpredictable than my previous master/slave example.
> 
> Your suggestion does work for that too, but a lot more slowly once a remote process has been killed. Is there any way to avoid the sleep(1)? And can you explain why the MPI_Iprobe has any effect on fault-tolerant behaviour? It does, as you have shown, but I don't know why.
> 
> Pavan,
> 
> > You cannot just "refresh" MPI_COMM_WORLD by resizing it, as the process
> > ranks will be completely messed up if you do that. What you really want
> > to do is create a new communicator with the remaining "alive" processes.
> 
> That's interesting, this is something I'd like to try. I can't find any compilable examples that make use of MPIX_Group_comm_create; would you be able to provide one? I tried, but wasn't sure how to create/manipulate the MPI_Group for the 2nd parameter. Are there any code examples?
> 
> 
> thanks,
> 
> --
> Rob Stewart


