[mpich-discuss] Process termination in Hydra

Jain, Rohit Rohit_Jain at mentor.com
Fri Mar 11 01:36:34 CST 2011


Hi Nicolas, 
Please see my responses below:


> How well does Hydra handle abrupt termination of processes and their
> spawned sub-processes?

I wouldn't know, but in my experience with my own programs it has
always seemed pretty robust. In particular I mostly mean MPI C/C++
apps with no custom signal handlers and no subprocesses of their own
(of course Hydra does both; what I meant is: no signal() or fork()
calls in my own code).

Under such scenarios, most abrupt deaths are either due to segfaults
or to me hitting ctrl-c. Neither seems to leave zombie processes
behind, nor any live orphaned subprocesses. The few times I've seen
any orphans were always riskier cases where my own code was spawning
some (not via MPI calls, but by means of my own --possibly somewhat
clumsy-- forks/execs/etc).

Rohit: My application forks sub-processes. I see that the parent application process quits, but the forked sub-processes are left behind. Another example is using valgrind (the memcheck-spawned process): I often see sub-processes left around if I kill the application midway. That makes me think that Hydra isn't cleaning up sub-processes properly.
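
For an application that forks its own helpers, one common way to avoid orphans is to start each helper in its own process group and forward termination signals to that group. The following is a minimal sketch of that idea in Python, not a description of what Hydra does internally; the names spawn_helper, terminate_helpers, install_handlers and the children list are hypothetical, and a POSIX system is assumed.

```python
import os
import signal
import subprocess
import sys

children = []  # hypothetical registry of helpers started by this application


def spawn_helper(argv):
    # start_new_session=True makes the helper a session and process-group
    # leader, so its pid doubles as the process-group id. One killpg() then
    # reaches the helper and anything it forks in turn.
    proc = subprocess.Popen(argv, start_new_session=True)
    children.append(proc)
    return proc


def terminate_helpers(signum=signal.SIGTERM, grace_seconds=5):
    # First ask every still-running helper group to exit.
    for proc in children:
        if proc.poll() is None:
            try:
                os.killpg(proc.pid, signum)
            except ProcessLookupError:
                pass  # the group vanished in the meantime
    # Then reap each helper, escalating to SIGKILL only after the grace period.
    for proc in children:
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()


def on_signal(signum, frame):
    # Clean up helpers, then exit with the conventional 128 + signal status.
    terminate_helpers()
    sys.exit(128 + signum)


def install_handlers():
    # Only trappable signals can be handled; SIGKILL never reaches this code.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, on_signal)
```

Because the helpers live in their own sessions, a Ctrl-c typed at the terminal no longer reaches them directly, which is why the handler forwards the signal explicitly. If the parent itself is killed with SIGKILL, none of this cleanup runs.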




> There are various conditions under which a process may terminate:

Are you aware that those are all flavors of the same concept? (signals!)

Rohit: Yes, they are. But different tools may trap and handle them in different ways, and I'm not sure how Hydra deals with them. For example, Hydra may handle Ctrl-c better than the other forms.


> -          Ctrl-c
http://en.wikipedia.org/wiki/SIGINT
> -          Segv
http://en.wikipedia.org/wiki/SIGSEGV
> -          Kill -9
http://en.wikipedia.org/wiki/SIGKILL


A few tips, possibly obvious but just in case:

* The KILL signal (see man 3 signal) is not the same thing as the kill
program (man 1 kill). You can use the kill program to send any signal,
not just that one.

* The KILL signal (aka "kill -9") should not be used lightly, but as
a last resort if nothing else works. Try something milder first, like
INTerrupt (aka Ctrl-c), or TERMinate (aka "kill -TERM"); these give
the target process a chance to clean up before dying, while SIGKILL
never does.

Rohit: I understand that. I was just giving examples of ways an application might terminate that Hydra may need to handle. 'kill -9', being the hard kill, leaves no chance to trap it and take action.
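
The difference between a trappable signal and SIGKILL can be seen with a small experiment. This is only an illustrative sketch, assuming a POSIX system; CHILD and run_and_signal are hypothetical names. A child process installs a SIGTERM handler that exits cleanly, and the parent then sends it a signal with os.kill, the same system call the kill program uses.

```python
import os
import signal
import subprocess
import sys
import textwrap

# Child program: installs a SIGTERM handler that exits with status 0,
# announces that it is ready, then sleeps.
CHILD = textwrap.dedent("""
    import signal, sys, time
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    print("ready", flush=True)
    time.sleep(60)
""")


def run_and_signal(signum):
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()      # wait until the handler is installed
    os.kill(proc.pid, signum)   # any signal can be sent, not just KILL
    proc.wait()
    proc.stdout.close()
    # A negative return code means the child was killed by that signal.
    return proc.returncode


if __name__ == "__main__":
    print("SIGTERM ->", run_and_signal(signal.SIGTERM))
    print("SIGKILL ->", run_and_signal(signal.SIGKILL))
```

With SIGTERM the handler runs and the child exits with status 0; with SIGKILL the handler never runs and the return code is the negated signal number.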


