[mpich-discuss] Question/problem with MPI mvapich hydra.
Pavan Balaji
balaji at mcs.anl.gov
Sat Oct 15 10:25:33 CDT 2011
On 10/15/2011 09:03 AM, Anatoly G wrote:
> The problem is that, on the master side, I need to detect which of the slaves
> failed, delete it from my distribution list, and continue working with
> only the live slaves. My questions are:
> 1) What should I do to recognize which slave died?
The signal handler that Darius mentioned should work. The catch is that if
you are using SIGUSR1, you cannot simply overwrite the handler that MPICH2
installs. You need to chain them: install your own handler while saving the
old one, do whatever you need in your handler, and then call the old
handler once you are done.
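The chaining described above can be sketched with POSIX `sigaction`, which hands back the previously installed action so your handler can forward to it. This is a minimal sketch under stated assumptions: it must run after the MPI library has installed its own SIGUSR1 handler (presumably after `MPI_Init`), the names `chained_usr1_handler`, `install_chained_usr1_handler` and the `usr1_seen` flag are hypothetical, and it does not show how MPICH2 itself uses SIGUSR1 or how to tell which slave failed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Action that was installed before ours (assumed to be the MPI library's). */
static struct sigaction old_usr1_action;

/* Hypothetical application flag; only async-signal-safe work in handlers. */
static volatile sig_atomic_t usr1_seen = 0;

static void chained_usr1_handler(int sig, siginfo_t *info, void *ctx)
{
    /* 1. Application-specific work: just record that the signal arrived. */
    usr1_seen++;

    /* 2. Forward to the previously installed handler, if it is a function.
     *    SIG_DFL and SIG_IGN are not callable, so they are skipped. */
    if (old_usr1_action.sa_flags & SA_SIGINFO) {
        if (old_usr1_action.sa_sigaction != NULL)
            old_usr1_action.sa_sigaction(sig, info, ctx);
    } else if (old_usr1_action.sa_handler != SIG_DFL &&
               old_usr1_action.sa_handler != SIG_IGN &&
               old_usr1_action.sa_handler != NULL) {
        old_usr1_action.sa_handler(sig);
    }
}

/* Install our handler for SIGUSR1 and save the old action for chaining.
 * Returns 0 on success, -1 on failure (errno set by sigaction). */
static int install_chained_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = chained_usr1_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, &old_usr1_action);
}
```

Call `install_chained_usr1_handler()` once, after the library's handler is in place, and inspect `usr1_seen` from the main loop rather than doing real work inside the handler. If the previous disposition was SIG_DFL or SIG_IGN, the sketch deliberately does not forward the signal, so SIGUSR1's default terminate action is not reproduced.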
> 2) How can I get a slave's failure status, i.e., some information about the failure?
I'll let Darius answer this.
-- Pavan
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji