[MPICH] MPICH2 suspend
Jason Crane
jasonc at mrsc.ucsf.edu
Thu May 18 01:33:06 CDT 2006
Hi,

Thanks for the information. Is the behavior defined if only a
single application process is signaled directly (SIGSTOP, SIGCONT),
without signaling the other application processes or the mpiexec?

Thanks,
-Jason

On Wed, 17 May 2006, Rusty Lusk wrote:
> All I can tell you is about mpd, not smpd. In that case the only
> suspension mechanism currently implemented is to issue a SIGSTOP signal
> to the mpiexec process. This signal is caught and results in a SIGSTOP
> signal being sent to all the application processes. Then the mpiexec
> process (but not the mpd's or the manager processes) is suspended. It
> and the application processes can then be continued by sending a SIGCONT
> signal to mpiexec. The signals (which also include SIGKILL) can be sent
> via keyboard commands or any other mechanisms for delivering signals.
>
>
>
> From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
> Subject: [MPICH] MPICH2 suspend
> Date: Wed, 17 May 2006 15:50:17 -0700
>
>> Hi,
>>
>> The MPICH2 user's guide documentation (section 7.1) indicates that it is
>> possible to suspend and continue MPICH2 jobs, at least under mpd. I'm
>> interested in trying this under smpd process management from Sun's Grid
>> Engine and would like to know if there are any limitations or
>> requirements for job suspension to work correctly without issuing a
>> ctrl-z to the mpiexec process. In particular, is it possible to suspend
>> the processes on a single arbitrary node within the job, or is it
>> necessary to signal all processes in the job simultaneously?
>>
>> thanks for any help,
>> -Jason
>>
>>
>
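
For readers who want to script the mechanism Rusty Lusk describes for mpd (signal mpiexec, which then passes the suspension on to the application processes), a minimal sketch in Python might look like the following. The function names `suspend_job` and `continue_job` are hypothetical, and the PID of the mpiexec process is assumed to be known already. The code only delivers the signals; it says nothing about how smpd reacts, or about what happens when a single application process is signaled on its own.

```python
import os
import signal


def suspend_job(mpiexec_pid: int) -> None:
    """Send SIGSTOP to the given PID (assumed to be the mpiexec process)."""
    os.kill(mpiexec_pid, signal.SIGSTOP)


def continue_job(mpiexec_pid: int) -> None:
    """Send SIGCONT to the given PID to let a suspended job continue."""
    os.kill(mpiexec_pid, signal.SIGCONT)
```

Calling `suspend_job(pid)` followed later by `continue_job(pid)` has the same effect as running `kill -STOP pid` and `kill -CONT pid` from a shell on the node where mpiexec is running.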