[mpich-discuss] Hydra SIGTSTP (Ctrl+Z) handling

Yauheni Zelenko zelenko at cadence.com
Mon Nov 29 19:18:36 CST 2010


Hi!

The problem is that our application (in non-MPI mode) supports SIGTSTP, so customers will expect the same support from the distributed version too.

Is it possible to trap the signal inside Hydra and propagate it from Hydra to the remote processes by some other means, for example over a socket?
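
A minimal sketch of that idea, not anything Hydra provides: the launcher side traps SIGTSTP/SIGCONT and writes a one-byte message to a socket, and a helper on the remote side reads the message and delivers the signal to the local application process. The wire codes and the names forward_signal, install_forwarder and deliver_one are hypothetical, chosen only for illustration.

```python
import os
import signal
import socket

# Hypothetical one-byte wire codes; these are NOT part of any Hydra protocol.
WIRE_CODES = {signal.SIGTSTP: b"T", signal.SIGCONT: b"C"}
SIGNALS_BY_CODE = {code: sig for sig, code in WIRE_CODES.items()}


def forward_signal(sock, signum):
    """Launcher side: send the trapped signal over the socket as one byte."""
    sock.sendall(WIRE_CODES[signal.Signals(signum)])


def install_forwarder(sock, signums=(signal.SIGTSTP, signal.SIGCONT)):
    """Launcher side: trap the signals so they are forwarded, not acted on."""
    def handler(signum, frame):
        forward_signal(sock, signum)

    for signum in signums:
        signal.signal(signum, handler)


def deliver_one(sock, pid, kill=os.kill):
    """Remote side: read one forwarded signal and deliver it to `pid`.

    Returns the delivered signal, or None if the peer closed the socket.
    `kill` is injectable so the function can be exercised without really
    signalling a process.
    """
    message = sock.recv(1)
    if not message:
        return None
    signum = SIGNALS_BY_CODE[message]
    kill(pid, signum)
    return signum
```

In a real setup the two ends would be separate processes on different hosts connected over TCP; socket.socketpair() is enough to try the round trip locally. SIGCONT is forwarded as well, since a job stopped this way also needs a way to be resumed.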

Eugene.
________________________________________
From: Pavan Balaji [balaji at mcs.anl.gov]
Sent: Monday, November 29, 2010 5:14 PM
To: mpich-discuss at mcs.anl.gov
Cc: Yauheni Zelenko
Subject: Re: [mpich-discuss] Hydra SIGTSTP (Ctrl+Z) handling

On 11/29/2010 06:15 PM, Yauheni Zelenko wrote:
> Hydra is inconsistent in its SIGTSTP (Ctrl-Z) handling. It works
> when processes are started on the same host as mpiexec and does not
> work when mpiexec starts processes remotely.

Not all signals are supported by Hydra. SIGTSTP is one of the
unsupported signals. The reason you are seeing inconsistent behavior is
because the signal handling depends on the launcher. When you launch
locally, the "fork" launcher is used, and when you launch remotely, the
"ssh" launcher is used. SSH goes all crazy when it sees a SIGTSTP, and
that is outside Hydra's control (Hydra cannot stop other processes from
running their own signal handlers).

If you are looking to checkpoint the running MPI application, you
should: (1) configure MPICH2 with checkpointing support, and (2) send
the SIGUSR1 signal to mpiexec (or give the -ckpoint-interval option to
mpiexec).
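
As a rough sketch of the two options above, assuming an MPICH2 build configured with checkpointing support: the first helper sends SIGUSR1 to an already running mpiexec, and the second builds an mpiexec command line that includes -ckpoint-interval. The helper names are hypothetical, the interval value is passed through unchanged, and any further checkpoint options a given installation needs are not shown here.

```python
import os
import signal


def request_checkpoint(mpiexec_pid, kill=os.kill):
    """Ask a running mpiexec to checkpoint by sending it SIGUSR1.

    `kill` is injectable so the helper can be tried without a real process.
    """
    kill(mpiexec_pid, signal.SIGUSR1)


def mpiexec_command(program_args, nprocs, ckpoint_interval=None):
    """Build an mpiexec argument list, optionally with -ckpoint-interval."""
    cmd = ["mpiexec", "-n", str(nprocs)]
    if ckpoint_interval is not None:
        cmd += ["-ckpoint-interval", str(ckpoint_interval)]
    return cmd + list(program_args)
```

The resulting list can be handed to subprocess.Popen, whose pid attribute is then the value to pass to request_checkpoint.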

  -- Pavan

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji

