[mpich-discuss] Process termination in Hydra

Thu Mar 10 21:14:12 CST 2011

Hi Rohit, just saw your repost on this.

I'd rather not guess about Hydra's signal handling (I know too little
about it to be of help).

However, in much more general terms, and just in case it helps, I can
tell you this:

> How well Hydra handles abrupt termination of processes and its spawned
> sub-processes?

I wouldn't know, but in my experience with my own programs it has
always seemed pretty robust. By "my own programs" I mostly mean MPI
C/C++ apps with no custom signal handlers and no subprocesses of their
own (of course Hydra does both; what I mean is: no signal() or fork()
calls in my own code).

In such scenarios, most abrupt deaths are due either to segfaults or
to me hitting ctrl-c. Neither seems to leave zombie processes behind,
nor any live orphaned subprocesses. The few times I've seen
any orphans were always riskier cases where my own code was spawning
some (not via MPI calls, but by means of my own --possibly somewhat
clumsy-- forks/execs/etc).
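
In case it is useful, here is a minimal sketch of how a program that
spawns its own children can avoid leaving them behind when it gets
INT or TERM. It is plain Python rather than MPI code, and spawn(),
reap_children() and install_cleanup() are names I made up for
illustration; it says nothing about what Hydra itself does. Note that
it cannot help if the parent segfaults or receives SIGKILL, since no
cleanup code runs in those cases.

```python
import atexit
import signal
import subprocess
import sys

# Hypothetical registry of every child this program starts by hand.
_children = []


def spawn(argv):
    """Start a child and remember it so it can be cleaned up later."""
    proc = subprocess.Popen(argv)
    _children.append(proc)
    return proc


def reap_children(grace_seconds=5.0):
    """Ask every live child to stop, then wait so none is left behind."""
    for proc in _children:
        if proc.poll() is None:
            proc.terminate()  # sends SIGTERM on POSIX
    for proc in _children:
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL, only if SIGTERM was not enough
            proc.wait()


def _exit_on_signal(signum, frame):
    # Turn the signal into a normal interpreter exit so atexit hooks run.
    sys.exit(128 + signum)


def install_cleanup():
    """Run reap_children() on normal exit and on SIGINT/SIGTERM."""
    atexit.register(reap_children)
    signal.signal(signal.SIGINT, _exit_on_signal)
    signal.signal(signal.SIGTERM, _exit_on_signal)


if __name__ == "__main__":
    install_cleanup()
    spawn([sys.executable, "-c", "import time; time.sleep(60)"])
    signal.pause()  # block until a signal arrives (POSIX only)
```

The point of the design is simply that the parent keeps track of what
it started and waits for it, instead of exiting first.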

> There could be various conditions process may be terminating:

Are you aware that those are all flavors of the same concept? (signals!)

> -          Ctrl-c
http://en.wikipedia.org/wiki/SIGINT
> -          Segv
http://en.wikipedia.org/wiki/SIGSEGV
> -          Kill -9
http://en.wikipedia.org/wiki/SIGKILL

A few tips, possibly obvious but just in case:

* The KILL signal (see man 7 signal on Linux; on some other systems
the list of signals is in man 3 signal) is not the same thing as the
kill program (man 1 kill). You can use the kill program to send any
signal, not just that one.

* The KILL signal (aka "kill -9") should not be used lightly, but only
as a last resort if nothing else works. Try something milder first,
like INTerrupt (aka Ctrl-c, or "kill -INT") or TERMinate (aka
"kill -TERM"); these give the target process a chance to clean up
before dying, while SIGKILL never does, because it cannot be caught,
blocked or ignored.

Cheers
Nicolás