[MPICH] MPICH2 suspend

Reuti reuti at staff.uni-marburg.de
Thu May 18 01:04:54 CDT 2006


Hi,

On 18.05.2006 at 05:21, Rusty Lusk wrote:

> All I can tell you is about mpd, not smpd.  In that case the only
> suspension mechanism currently implemented is to issue a SIGSTOP signal
> to the mpiexec process.  This signal is caught and results in a SIGSTOP

AFAIK the two signals SIGSTOP and SIGKILL are the ones which can't be  
trapped. Do you mean SIGINT instead?
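
A quick way to check this, as a minimal sketch assuming Python 3 on a
POSIX system (the helper name can_trap is made up for illustration and
has nothing to do with MPICH2 itself):

```python
import signal


def can_trap(signum):
    """Return True if a handler can be installed for signum.

    Must be called from the main thread, since Python only allows
    signal handlers to be installed there.
    """
    previous = signal.getsignal(signum)
    try:
        # Try to install a do-nothing handler for the signal.
        signal.signal(signum, lambda sig, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGKILL and SIGSTOP.
        return False
    # Put back whatever was installed before, if Python knows it.
    if previous is not None:
        signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    for name in ("SIGSTOP", "SIGKILL", "SIGTSTP", "SIGINT"):
        print(name, can_trap(getattr(signal, name)))
```

On such a system the first two should report False and the last two
True; SIGTSTP is the signal a terminal normally sends for ctrl-z.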

-- Reuti


> signal being sent to all the application processes.  Then the mpiexec
> process (but not the mpd's or the manager processes) is suspended.  It
> and the application processes can then be continued by sending a SIGCONT
> signal to mpiexec.  The signals (which also include SIGKILL) can be sent
> via keyboard commands or any other mechanisms for delivering signals.
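
For illustration only, a minimal sketch of delivering SIGSTOP and
SIGCONT to a process from a program rather than the keyboard.  It
assumes Python 3 on Linux, and the child here is a plain sleeping
Python process standing in for mpiexec, not a real MPI job; the
function name suspend_and_resume is hypothetical.

```python
import os
import signal
import subprocess
import sys


def suspend_and_resume():
    """Stop a child with SIGSTOP, resume it with SIGCONT.

    Returns (stopped, continued) as reported by waitpid().
    """
    # Stand-in for a long-running job; not an actual mpiexec.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # Suspend the child and wait until the kernel reports the stop.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # Resume it and wait until the kernel reports the continue.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the stand-in process.
        child.kill()
        child.wait()
    return stopped, continued


if __name__ == "__main__":
    print(suspend_and_resume())
```

From a shell the same thing would be kill -STOP <pid> followed by
kill -CONT <pid>, with <pid> being the process to suspend.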
>
>
>
> From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
> Subject: [MPICH] MPICH2 suspend
> Date: Wed, 17 May 2006 15:50:17 -0700
>
>> Hi,
>>
>> The MPICH2 user's guide documentation (section 7.1) indicates that it is
>> possible to suspend and continue MPICH2 jobs, at least under mpd.  I'm
>> interested in trying this under smpd process management from Sun's Grid
>> Engine and would like to know if there are any limitations or
>> requirements for job suspension to work correctly without issuing a
>> ctrl-z to the mpiexec process.  In particular, is it possible to suspend
>> the processes on a single arbitrary node within the job, or is it
>> necessary to signal all processes in the job simultaneously?
>>
>> thanks for any help,
>> -Jason
>>
>>
