[mpich-discuss] Hydra SIGTSTP (Ctrl+Z) handling
Yauheni Zelenko
zelenko at cadence.com
Tue Nov 30 20:00:22 CST 2010
Unfortunately, our application is a legacy one, and MPI is only one of its modes of operation. Also, SIGUSR1 is already used for other purposes.
Why can't signals be trapped in Hydra itself, without being relayed to the launchers (child processes, if I'm correct) in the default way? Then it's only a matter of transferring the signals to the proxies via a socket.
Eugene.
________________________________________
From: Pavan Balaji [balaji at mcs.anl.gov]
Sent: Tuesday, November 30, 2010 5:18 PM
To: mpich-discuss at mcs.anl.gov
Cc: Yauheni Zelenko
Subject: Re: [mpich-discuss] Hydra SIGTSTP (Ctrl+Z) handling
On 11/30/2010 12:31 PM, Yauheni Zelenko wrote:
> However, the same problem exists for rsh.
Yes -- as I mentioned, we cannot fix this for all bootstrap servers,
unless we change the design drastically in some way -- but it's not
clear what that way will be. See explanation below.
> As far as I know (but I may be mistaken), Hydra communicates with
> proxies (which launch the actual application processes) via sockets. I
> think it is (at least theoretically) possible to catch the signal in
> Hydra and send the related information via a socket. After receiving
> it, the proxy will relay the signal to the application processes. In
> this case signal handling will be independent of the actual launcher.
Correct. In fact, after 1.3.1 was released, I did rework STDOUT/STDERR
to use the control socket without relying on the launcher. I'm currently
working on doing the same with STDIN. But that'll still leave us with
the problem with the launcher closing its sockets on a SIGTSTP signal --
we cannot stop this part. At the very least, all of the proxy debug and
error messages will get lost because of these closed sockets.
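To make the idea under discussion concrete, here is a minimal sketch in Python of forwarding a signal over a control socket and having the receiving side relay it to its child processes. It is not Hydra code: the 4-byte wire format and the names forward_signal, relay_one_signal, install_forwarder and demo are assumptions made up for illustration. It also does nothing about the launcher closing its own sockets on SIGTSTP, which is the part that cannot be fixed this way.

```python
import os
import signal
import socket
import struct
import subprocess
import sys

# Hypothetical wire format (not Hydra's): one unsigned 32-bit signal number.
MSG = struct.Struct("!I")


def forward_signal(ctrl_sock, signum):
    """Launcher side: send the signal number over the control socket."""
    ctrl_sock.sendall(MSG.pack(signum))


def relay_one_signal(ctrl_sock, pids):
    """Proxy side: read one message and deliver that signal to each pid."""
    buf = b""
    while len(buf) < MSG.size:
        chunk = ctrl_sock.recv(MSG.size - len(buf))
        if not chunk:
            raise ConnectionError("control socket closed")
        buf += chunk
    (signum,) = MSG.unpack(buf)
    for pid in pids:
        os.kill(pid, signum)
    return signum


def install_forwarder(ctrl_sock, signum=signal.SIGTSTP):
    """Launcher side: trap signum and forward it instead of acting on it.

    Must be called from the main thread; returns the previous handler.
    """
    def handler(received, _frame):
        forward_signal(ctrl_sock, received)
    return signal.signal(signum, handler)


def demo():
    """Forward SIGTERM to one child through a local socket pair.

    SIGTERM is used here only because its effect is easy to observe;
    the same path would carry SIGTSTP or SIGCONT.
    """
    launcher_end, proxy_end = socket.socketpair()
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        forward_signal(launcher_end, signal.SIGTERM)
        relay_one_signal(proxy_end, [child.pid])
        # On POSIX the return code is the negated signal number.
        return child.wait(timeout=10)
    finally:
        if child.poll() is None:
            child.kill()
        launcher_end.close()
        proxy_end.close()
```

In a real launcher, install_forwarder would be called once at startup with the already-connected control socket, and the proxy would call relay_one_signal from its event loop whenever that socket becomes readable.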
Is there any chance of explaining to the application developers that
catching SIGTSTP in an MPI application is a very bad idea and they
shouldn't be doing that?
-- Pavan
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji