[mpich-discuss] how to send a SIGUSR1 signal to mpiexec using BLCR?

Wei Jiang jiangwei at cse.ohio-state.edu
Tue Nov 29 22:48:45 CST 2011


Hi Pavan,

Yes, it is enabled. I can checkpoint the app automatically every 20 seconds
with the option -ckpoint-interval 20, and that works well.
But when I tried to trigger a checkpoint manually by sending the SIGUSR1
signal, nothing happened, which seems strange. I don't know what the problem
might be.
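
One way to narrow down whether SIGUSR1 is being ignored is to look at how
the target process has the signal configured. The sketch below is only an
illustration and assumes a Linux system where /proc/<pid>/status exposes the
SigIgn and SigCgt masks; the helper names mask_has_signal and
signal_disposition are made up for this example and are not part of MPICH2
or BLCR.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if bit (signo - 1) is set in a hex signal mask as printed in
 * /proc/<pid>/status, otherwise 0. */
static int mask_has_signal(const char *hexmask, int signo)
{
    unsigned long long mask = strtoull(hexmask, NULL, 16);
    return (int)((mask >> (signo - 1)) & 1ULL);
}

/* Report how process `pid` currently treats `signo` (Linux-only assumption).
 * Returns 'I' if ignored, 'C' if caught by a handler, 'D' if neither
 * (default action), or '?' if the status file could not be read. */
static char signal_disposition(pid_t pid, int signo)
{
    char path[64], line[256];
    int ignored = 0, caught = 0, seen = 0;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (!fp)
        return '?';
    while (fgets(line, sizeof line, fp)) {
        if (strncmp(line, "SigIgn:", 7) == 0) {
            ignored = mask_has_signal(line + 7, signo);
            seen++;
        } else if (strncmp(line, "SigCgt:", 7) == 0) {
            caught = mask_has_signal(line + 7, signo);
            seen++;
        }
    }
    fclose(fp);
    if (seen != 2)
        return '?';
    return ignored ? 'I' : (caught ? 'C' : 'D');
}
```

Calling signal_disposition(pid, SIGUSR1) with the PID of the running mpiexec
would return 'C' if a handler is installed for the signal and 'I' if it is
being ignored. A result of 'D' would mean the default action applies, which
for SIGUSR1 terminates the process.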

I am using mpich2-1.4.1p. Any ideas?

Thanks~

On Tue, Nov 29, 2011 at 11:27 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

>
> Please keep mpich-discuss cc'ed.
>
> Can you make sure checkpointing is in fact enabled? (see the README +
> check the output of mpiexec -info). There was a problem where it was not
> being enabled by default and additional configure options had to be passed.
> This has been fixed, but might not be in the version you are using.
>
>  -- Pavan
>
>
> On 11/29/2011 10:58 PM, Wei Jiang wrote:
>
>> Hi Pavan,
>>
>> Thanks for your reply.
>>
>> I tried that, but nothing happened. I also tried inserting a call like
>> "system("pkill -USR1 mpiexec");" after a synchronization point in the MPI
>> code, but no checkpoint was taken either.
>>
>> Is it possible that the SIGUSR1 signal was ignored? When I tried a hard
>> kill with the -KILL option, mpiexec was killed as I expected.
>>
>> Or what could be the problem? Was I missing something?
>>
>> Thanks very much!
>>
>> On Mon, Nov 28, 2011 at 11:29 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>>
>>    On 11/29/2011 11:12 AM, Wei Jiang wrote:
>>
>>        I was using BLCR in mpich2 to checkpoint/restart my mpi program.
>>        How can
>>        I request a checkpoint manually?
>>
>>
>>    You can run "pkill -USR1 mpiexec" from a different terminal.
>>
>>      -- Pavan
>>
>>    --
>>    Pavan Balaji
>>    http://www.mcs.anl.gov/~balaji
>>
>>
>>
>>
>> --
>> -- Wei
>>
>>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>



-- 
-- Wei