[mpich-discuss] Question/problem with MPI mvapich hydra.

Pavan Balaji balaji at mcs.anl.gov
Sat Oct 15 10:25:33 CDT 2011


On 10/15/2011 09:03 AM, Anatoly G wrote:
> The problem is that, on the master side, I need to detect which one of
> the slaves failed, delete it from my distribution list, and continue to
> work with only the live slaves. The questions are:
> 1) What should I do in order to recognize which slave is dead?

The signal handler that Darius mentioned should work. It's just that if
you are using SIGUSR1, you cannot simply overwrite the handler that
MPICH2 has set. You need to chain them, i.e., install your own signal
handler while saving the old one, do whatever you need in your handler,
and then call the old signal handler once you are done.
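
A minimal sketch of the chaining described above, assuming POSIX
sigaction() is available. The names failure_seen, chained_handler,
install_chained_handler and library_handler are hypothetical and chosen
only for illustration; library_handler is a stand-in for whatever
handler was already installed, not MPICH2's actual handler. In a real
program, install_chained_handler(SIGUSR1) would presumably be called
after MPI_Init, so that the previously installed handler is the one
saved.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Action that was in place before ours was installed (saved for chaining). */
static struct sigaction old_action;

/* Set by our handler; the master loop can poll it (hypothetical name). */
static volatile sig_atomic_t failure_seen = 0;

/* Stand-in for a previously installed handler; for demonstration only. */
static volatile sig_atomic_t library_calls = 0;
static void library_handler(int signo)
{
    (void)signo;
    library_calls++;
}

/* Our handler: do our own work first, then call the old handler. */
static void chained_handler(int signo, siginfo_t *info, void *context)
{
    failure_seen = 1;  /* keep the work here async-signal-safe */

    if (old_action.sa_flags & SA_SIGINFO) {
        /* Old handler uses the three-argument form. */
        if (old_action.sa_sigaction != NULL)
            old_action.sa_sigaction(signo, info, context);
    } else if (old_action.sa_handler != SIG_DFL &&
               old_action.sa_handler != SIG_IGN) {
        /* Old handler uses the one-argument form. */
        old_action.sa_handler(signo);
    }
}

/* Install our handler for signo and remember the old one.
 * Returns 0 on success, -1 on error (as sigaction() does). */
static int install_chained_handler(int signo)
{
    struct sigaction new_action;

    memset(&new_action, 0, sizeof(new_action));
    new_action.sa_sigaction = chained_handler;
    new_action.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&new_action.sa_mask);

    return sigaction(signo, &new_action, &old_action);
}
```

The handler only sets a flag, so the actual bookkeeping (for example,
working out which slave is gone and removing it from the distribution
list) would happen later in the master's main loop, outside of signal
context.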

> 2) How can I get the slave's failure status, i.e., some info about the failure?

I'll let Darius answer this.

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

