[mpich-discuss] Reply: Re: Can MPICH2 handle the fault that some processes die irregularly

ejoywx ejoywx at 163.com
Thu Jan 6 05:11:32 CST 2011


Thanks to Darius for the idea. I found that with blocking communication, if one process fails, all the other processes die.
Today I tried programming with nonblocking communication instead. In my test programs, when one process failed and died, the other process stayed alive and was able to carry on with its work normally.
Thanks to the mpich-discuss mailing list!


At 2011-01-06 03:40:32,"Darius Buntinas" <buntinas at mcs.anl.gov> wrote:

>
>The latest release has some support for tolerating communication failures, such as those due to a failed process. However, it doesn't do a good job of detecting failed processes, so you can get a process that hangs in recv waiting for a message from a failed process.  We are working on improving detection of and tolerance to failed processes.  The next release should include many improvements.
>
>In addition to setting an error handler, you'll need to tell the process manager not to terminate the job when a process fails.  If you're using the hydra process manager (which is the default in the latest release), you can give the -disable-auto-cleanup option to mpiexec.
>
>-d
>
>On Jan 4, 2011, at 7:41 PM, ejoywx wrote:
>
>> Dear Sir,
>> 
>> Sorry to trouble you!
>> 
>> Maybe I am to ask this question. But for me, the question "Can MPICH2 handle the fault that some processes die irregularly" is very important. In our computer cluster, I find that if a process dies on some node, or a node is shut down, all processes on the cluster die. We attempted to register an error handler to deal with such faults, but unfortunately we failed.
>> 
>> I admit that I do not know MPICH2, but I hope I am able to get help from you!  "Can MPICH2 handle the fault that some processes die irregularly?"
>> 
>> I look forward to receiving your e-mail. Thanks.
>> 
>> ejoywx
>> 2011-01-05
>> 
>> 
>> 
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>