[MPICH] MPICH2 suspend

Jason Crane jasonc at mrsc.ucsf.edu
Thu May 18 01:33:06 CDT 2006


Hi,  Thanks for the information.  Is the behavior defined if only a 
single application process is signaled directly (SIGSTOP, SIGCONT), 
without signaling the other application processes or the mpiexec? 
Thanks, -Jason

On Wed, 17 May 2006, Rusty Lusk wrote:

> All I can tell you is about mpd, not smpd.  In that case the only
> suspension mechanism currently implemented is to issue a SIGSTOP signal
> to the mpiexec process.  This signal is caught and results in a SIGSTOP
> signal being sent to all the application processes.  Then the mpiexec
> process (but not the mpd's or the manager processes) is suspended.  It
> and the application processes can then be continued by sending a SIGCONT
> signal to mpiexec.  The signals (which also include SIGKILL) can be sent
> via keyboard commands or any other mechanisms for delivering signals.
>
>
>
> From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
> Subject: [MPICH] MPICH2 suspend
> Date: Wed, 17 May 2006 15:50:17 -0700
>
>> Hi,
>>
>> The MPICH2 user's guide documentation (section 7.1) indicates that it is
>> possible to suspend and continue MPICH2 jobs, at least under mpd.  I'm
>> interested in trying this under smpd process management from Sun's Grid
>> Engine and would like to know if there are any limitations or
>> requirements for job suspension to work correctly without issuing a
>> ctrl-z to the mpiexec process.  In particular, is it possible to suspend
>> the processes on a single arbitrary node within the job, or is it
>> necessary to signal all processes in the job simultaneously?
>>
>> thanks for any help,
>> -Jason
>>
>>
>
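
For reference, the mpd mechanism described in the quoted reply (signal
mpiexec, which then relays the stop to the application processes) could be
driven from a script roughly as follows.  This is only a sketch under
assumptions: the names suspend_job, resume_job and mpiexec_pid are
hypothetical, the caller is assumed to already know the PID of the mpiexec
process, and the code does nothing more than deliver the signal.  Any
forwarding to the application processes is done by mpiexec under mpd as
described above; nothing here is known to apply to smpd.

```python
import os
import signal


def suspend_job(mpiexec_pid: int) -> None:
    """Deliver SIGSTOP to the given process (assumed to be mpiexec).

    Under mpd, per the quoted reply, mpiexec is expected to pass the
    stop on to the application processes; this function only sends
    the signal to the one PID it is given.
    """
    os.kill(mpiexec_pid, signal.SIGSTOP)


def resume_job(mpiexec_pid: int) -> None:
    """Deliver SIGCONT to the given process (assumed to be mpiexec)."""
    os.kill(mpiexec_pid, signal.SIGCONT)
```

The same two calls with the PID of a single application process would
correspond to the case asked about at the top of this message, whose
behavior is the open question.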



