[mpich-discuss] How to get checkpoint-file

Darius Buntinas buntinas at mcs.anl.gov
Fri May 21 10:35:15 CDT 2010


Try:
     mpiexec -n 11 -ckpoint-prefix /tmp/app.ckpoint

Because your checkpoints are stored on the local filesystem, make sure 
you're using the same nodes.  You'll also need to make sure that you're 
starting the same ranks on the same nodes, but if the nodes are 
specified in the same order, hydra should do that.
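
One way to check the node order before restarting is to compare the host
list used for the original run with the one used for the restart.  The
sketch below is only an illustration: same_host_order and normalize_hosts
are hypothetical helpers, not part of MPICH2 or hydra, and it assumes
each host file simply lists one node per line, with optional '#' comments.

```shell
# Hypothetical helpers (not part of MPICH2 or hydra).
# Assumption: a host file lists one node per line; '#' starts a comment.

normalize_hosts() {
    # Strip comments, trim surrounding whitespace, drop empty lines.
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

same_host_order() {
    # Succeeds (exit 0) only if both files name the same nodes in the
    # same order; returns 2 if either file cannot be read.
    a=$(normalize_hosts "$1") || return 2
    b=$(normalize_hosts "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, with two hypothetical files hosts.run and hosts.restart:
     same_host_order hosts.run hosts.restart || echo "node order differs"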

Can you put the checkpoint files on a shared filesystem?  That would 
tell us whether this was the source of the problem.

-d

On 05/21/2010 10:15 AM, Bagus Jati Santoso wrote:
> OK..
>
> Thank you all. It works.. :)
>
> Maybe it would be better if the configure help in MPICH2-1.3a2
> explained this option:
> --with-hydra-ckpointlib=blcr
> If I'm not mistaken, that option does not appear when I run
> ./configure --help in the MPICH2-1.3a2 installation.
>
> Okay, the checkpoint is running now :).
> I'm checkpointing at an interval of 3 seconds.
> Then, in the middle of execution, I press Ctrl-Z / Ctrl-C to
> interrupt mpiexec.
> After the interrupt succeeds, I try to restart the process by
> running:
> mpiexec -n 11 -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -n 11
>
> And I don't get anything on the screen. Is anything wrong?
>
> Thank you for your attention and answers.
>
> Best regards,
> Bagus
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

