[mpich-discuss] how to send a SIGUSR1 signal to mpiexec using BLCR?

Pavan Balaji balaji at mcs.anl.gov
Tue Nov 29 22:27:58 CST 2011


Please keep mpich-discuss cc'ed.

Can you make sure checkpointing is in fact enabled? (see the README + 
check the output of mpiexec -info). There was a problem where it was not 
being enabled by default and additional configure options had to be 
passed. This has been fixed, but might not be in the version you are using.
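
A minimal sketch of that check, assuming a Hydra-based mpiexec whose
-info output contains a line about checkpointing libraries (the exact
wording may differ between versions). The helper name has_ckpt_lib is
hypothetical and not part of MPICH.

```shell
#!/bin/sh
# has_ckpt_lib: read `mpiexec -info` output on stdin and succeed only if
# a line mentioning checkpointing also names the given library.
# The name and the matching rule are assumptions made for illustration.
has_ckpt_lib() {
    awk -v lib="${1:-blcr}" '
        tolower($0) ~ /checkpoint/ && index(tolower($0), tolower(lib)) { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Only query mpiexec when it is actually on PATH.
if command -v mpiexec >/dev/null 2>&1; then
    if mpiexec -info 2>/dev/null | has_ckpt_lib blcr; then
        echo "mpiexec reports BLCR checkpointing support"
    else
        echo "mpiexec does not report BLCR checkpointing support"
    fi
fi
```

If BLCR is not reported, the build probably needs to be reconfigured.
The README of MPICH2 releases from this period names
--enable-checkpointing and --with-hydra-ckpointlib=blcr as the configure
options; verify against the README shipped with your version.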

  -- Pavan

On 11/29/2011 10:58 PM, Wei Jiang wrote:
> Hi Pavan,
>
> Thanks for your reply.
>
> I tried that, but nothing happened. I also tried inserting a call like
> "system("pkill -USR1 mpiexec");" after a synchronization point in the MPI
> code, but no checkpoint was taken either.
>
> Is it possible that the SIGUSR1 signal was ignored? I ask because when I
> tried a hard kill with the -KILL option, mpiexec was killed as expected.
>
> Or what could be the problem? Was I missing something?
>
> Thanks very much!
>
> On Mon, Nov 28, 2011 at 11:29 PM, Pavan Balaji <balaji at mcs.anl.gov>
> wrote:
>
>
>     On 11/29/2011 11:12 AM, Wei Jiang wrote:
>
>         I was using BLCR in mpich2 to checkpoint/restart my mpi program.
>         How can I request a checkpoint manually?
>
>
>     You can run "pkill -USR1 mpiexec" from a different terminal.
>
>       -- Pavan
>
>     --
>     Pavan Balaji
>     http://www.mcs.anl.gov/~balaji
>
>
>
>
> --
> -- Wei
>
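
On the question of whether SIGUSR1 was ignored: on Linux, the SigIgn
field of /proc/<pid>/status is a hexadecimal bitmask of the signals a
process ignores, so it can be inspected directly. A sketch, assuming
Linux with /proc mounted and SIGUSR1 numbered 10 (true on x86 and ARM;
some other architectures use a different number). sig_ignored is a
hypothetical helper name.

```shell
#!/bin/sh
# sig_ignored PID [SIGNUM]
# Returns 0 if the process ignores the signal, 1 if it does not,
# and 2 if /proc/PID/status cannot be read.
# SIGNUM defaults to 10, which is SIGUSR1 on Linux x86 and ARM (assumption).
sig_ignored() {
    pid="$1"
    signum="${2:-10}"
    # SigIgn is a hex bitmask; bit (N-1) is set when signal N is ignored.
    mask=$(awk '/^SigIgn:/ { print $2 }' "/proc/$pid/status" 2>/dev/null) || return 2
    [ -n "$mask" ] || return 2
    [ $(( 0x$mask & (1 << ($signum - 1)) )) -ne 0 ]
}

# Example use (pgrep -o picks the oldest matching process):
#   sig_ignored "$(pgrep -o mpiexec)" && echo "SIGUSR1 is ignored"
```

This only detects a signal whose disposition is "ignore". A process that
installs a handler for SIGUSR1 but then takes no action, for example
because checkpointing support was not compiled in, will not show up here.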

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

