[MPICH] MPICH2 suspend

Darius Buntinas buntinas at mcs.anl.gov
Thu May 18 03:54:23 CDT 2006


I believe that if you want to signal a single process of a job, you can
simply send that process a signal.  I don't think there are any signals
being caught in the MPI process itself (the manager may be catching
signals but not the process).  However, MPD only provides mechanisms for
signalling all processes of a job (by signalling mpiexec, or using
mpdsigjob), not for signalling individual processes.
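
For illustration only, here is a minimal Python sketch of signalling one
process directly with SIGSTOP and SIGCONT.  It is not part of MPICH2 or MPD.
The helper names (suspend, resume, demo) are made up for this example, and a
sleeping child process stands in for a single application process.  In a real
job you would have to look up the pid on the node yourself (for example with
ps).  The sketch says nothing about how the other processes of the job behave
while one of them is stopped.

```python
import os
import signal
import subprocess
import sys


def suspend(pid):
    """Stop one process. SIGSTOP cannot be caught or ignored by the target."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Continue a process that was previously stopped."""
    os.kill(pid, signal.SIGCONT)


def demo():
    """Stop and continue a stand-in child; return (was_stopped, still_alive)."""
    # Hypothetical stand-in for one application process of a job.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        suspend(child.pid)
        # WUNTRACED makes waitpid report a stopped (not only an exited) child.
        # This works here only because the target is our own child process.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        resume(child.pid)
        # poll() returns None while the child has not exited.
        still_alive = child.poll() is None
        return was_stopped, still_alive
    finally:
        # Clean up the stand-in process.
        child.kill()
        child.wait()
```

Calling demo() on a POSIX system should report that the child was stopped and
was still alive after being continued.  For a process that is not your child,
only the os.kill calls apply, and you need permission to signal it.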

-d

On Wed, 17 May 2006, Jason Crane wrote:

> Hi,  Thanks for the information.  Is the behavior defined if only a
> single application process is signaled directly (SIGSTOP, SIGCONT),
> without signaling the other application processes or the mpiexec?
> Thanks, -Jason
>
> On Wed, 17 May 2006, Rusty Lusk wrote:
>
> > All I can tell you is about mpd, not smpd.  In that case the only
> > suspension mechanism currently implemented is to issue a SIGSTOP signal
> > to the mpiexec process.  This signal is caught and results in a SIGSTOP
> > signal being sent to all the application processes.  Then the mpiexec
> > process (but not the mpd's or the manager processes) is suspended.  It
> > and the application processes can then be continued by sending a SIGCONT
> > signal to mpiexec.  The signals (which also include SIGKILL) can be sent
> > via keyboard commands or any other mechanisms for delivering signals.
> >
> >
> >
> > From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
> > Subject: [MPICH] MPICH2 suspend
> > Date: Wed, 17 May 2006 15:50:17 -0700
> >
> >> Hi,
> >>
> >> The MPICH2 user's guide documentation (section 7.1) indicates that it is
> >> possible to suspend and continue MPICH2 jobs, at least under mpd.  I'm
> >> interested in trying this under smpd process management from Sun's Grid
> >> Engine and would like to know if there are any limitations or
> >> requirements for job suspension to work correctly without issuing a
> >> ctrl-z to the mpiexec process.  In particular, is it possible to suspend
> >> the processes on a single arbitrary node within the job, or is it
> >> necessary to signal all processes in the job simultaneously?
> >>
> >> thanks for any help,
> >> -Jason
> >>
> >>
> >
>
>



