[MPICH] MPICH2 suspend

Rusty Lusk lusk at mcs.anl.gov
Wed May 17 22:21:18 CDT 2006


I can only speak for mpd, not smpd.  Under mpd, the only suspension
mechanism currently implemented is to send a stop signal to the mpiexec
process.  Because mpiexec has to catch the signal, it must be the
catchable SIGTSTP (what ctrl-z generates) rather than SIGSTOP, which
cannot be caught.  mpiexec then has a stop signal sent to all the
application processes, after which the mpiexec process itself (but not
the mpds or the manager processes) is suspended.  It
and the application processes can then be continued by sending a SIGCONT
signal to mpiexec.  The signals (which also include SIGKILL) can be sent
via keyboard commands or any other mechanism for delivering signals.
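
As one example of "any other mechanism", here is a minimal Python sketch
that stops and continues a process by PID without the keyboard.  It is
not part of MPICH2: the helper names send_signal, wait_for_state_change
and demo are made up for illustration, and the demo targets a stand-in
child process rather than a real mpiexec.  The signal is a parameter; the
demo uses SIGSTOP only because it always stops the stand-in child.

```python
import os
import signal
import subprocess
import sys


def send_signal(pid: int, sig: signal.Signals) -> None:
    """Deliver `sig` to the process with this PID (like `kill -SIG pid`)."""
    os.kill(pid, sig)


def wait_for_state_change(pid: int) -> int:
    """Block until our child `pid` stops, continues, or exits.

    Returns the raw wait status. This only works for direct children of
    the calling process, which is enough for the demo below.
    """
    _, status = os.waitpid(pid, os.WUNTRACED | os.WCONTINUED)
    return status


def demo() -> list:
    """Stop and then continue a stand-in child, recording what happened."""
    # Hypothetical stand-in for the mpiexec process: a child that sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    events = []
    try:
        # Suspend the child and confirm the kernel reports it as stopped.
        send_signal(child.pid, signal.SIGSTOP)
        status = wait_for_state_change(child.pid)
        events.append("stopped" if os.WIFSTOPPED(status) else "other")

        # Resume it and confirm the kernel reports it as continued.
        send_signal(child.pid, signal.SIGCONT)
        status = wait_for_state_change(child.pid)
        events.append("continued" if os.WIFCONTINUED(status) else "other")
    finally:
        # Clean up the stand-in child regardless of what happened above.
        child.kill()
        child.wait()
    return events


if __name__ == "__main__":
    print(demo())
```

For a real job you would call send_signal with the PID of the mpiexec
process and let mpiexec pass the request on to the application processes;
wait_for_state_change would not apply unless mpiexec is your own child.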



From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
Subject: [MPICH] MPICH2 suspend
Date: Wed, 17 May 2006 15:50:17 -0700

> Hi,
> 
> The MPICH2 user's guide documentation (section 7.1) indicates that it is
> possible to suspend and continue MPICH2 jobs, at least under mpd.  I'm
> interested in trying this under smpd process management from Sun's Grid
> Engine and would like to know if there are any limitations or
> requirements for job suspension to work correctly without issuing a
> ctrl-z to the mpiexec process.  In particular, is it possible to suspend
> the processes on a single arbitrary node within the job, or is it
> necessary to signal all processes in the job simultaneously?
> 
> thanks for any help,
> -Jason
> 
> 



