[MPICH] MPICH2 suspend
Reuti
reuti at staff.uni-marburg.de
Thu May 18 01:04:54 CDT 2006
Hi,
On 18.05.2006 at 05:21, Rusty Lusk wrote:
> All I can tell you is about mpd, not smpd. In that case the only
> suspension mechanism currently implemented is to issue a SIGSTOP signal
> to the mpiexec process. This signal is caught and results in a SIGSTOP
AFAIK SIGSTOP and SIGKILL are exactly the two signals which can't be
trapped, so mpiexec could not catch a SIGSTOP. Do you mean SIGINT instead?
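
A quick way to check this, as a minimal sketch in Python (the helper
name can_trap is my own, nothing from MPICH2): it tries to install a
handler for each signal. On Linux I would expect it to report False for
SIGSTOP and SIGKILL, and True for SIGINT and for SIGTSTP, which is what
ctrl-z sends from the terminal by default.

```python
import signal


def can_trap(signum):
    """Return True if a handler can be installed for signum."""
    try:
        previous = signal.signal(signum, lambda s, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGSTOP and SIGKILL.
        return False
    # Put the earlier disposition back so the check has no side effects.
    if previous is not None:
        signal.signal(signum, previous)
    return True


for name in ("SIGSTOP", "SIGKILL", "SIGTSTP", "SIGINT", "SIGCONT"):
    print(name, can_trap(getattr(signal, name)))
```

This must run in the main thread, since Python only allows handlers to
be installed from there.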
-- Reuti
> signal being sent to all the application processes. Then the mpiexec
> process (but not the mpds or the manager processes) is suspended. It
> and the application processes can then be continued by sending a SIGCONT
> signal to mpiexec. The signals (which also include SIGKILL) can be sent
> via keyboard commands or any other mechanism for delivering signals.
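
For reference, the plain stop/continue cycle on one process can be
sketched as below. This is only an illustration on a throwaway child
process, assuming a POSIX system. It is not how mpiexec or mpd forward
signals, and the function name suspend_and_resume is made up.

```python
import os
import signal
import subprocess
import sys


def suspend_and_resume():
    """Stop and continue a sleeping child; return (stopped, continued)."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        # SIGSTOP cannot be caught, so the kernel suspends the child.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # SIGCONT resumes it; WCONTINUED lets waitpid report that.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the helper process.
        child.kill()
        child.wait()
    return stopped, continued


if __name__ == "__main__":
    print(suspend_and_resume())
```

Here the signals go directly to the process that should stop. In the
mpd case described above, the signal goes to mpiexec, which passes it
on to the application processes.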
>
>
>
> From: "Jason Crane" <jasonc at mrsc.ucsf.edu>
> Subject: [MPICH] MPICH2 suspend
> Date: Wed, 17 May 2006 15:50:17 -0700
>
>> Hi,
>>
>> The MPICH2 user's guide documentation (section 7.1) indicates that it is
>> possible to suspend and continue MPICH2 jobs, at least under mpd. I'm
>> interested in trying this under smpd process management from Sun's Grid
>> Engine and would like to know if there are any limitations or
>> requirements for job suspension to work correctly without issuing a
>> ctrl-z to the mpiexec process. In particular, is it possible to suspend
>> the processes on a single arbitrary node within the job, or is it
>> necessary to signal all processes in the job simultaneously?
>>
>> thanks for any help,
>> -Jason
>>
>>