[mpich-discuss] How to get checkpoint-file
Darius Buntinas
buntinas at mcs.anl.gov
Fri May 21 10:35:15 CDT 2010
Try:
mpiexec -n 11 -ckpoint-prefix /tmp/app.ckpoint
Because your checkpoints are stored on the local filesystem, make sure
you're using the same nodes. You'll also need to make sure that you're
starting the same ranks on the same nodes, but if the nodes are
specified in the same order, hydra should do that.
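
One way to keep the nodes and their order identical between the original run and the restart is to pass the same host file both times. The following is only a sketch: the host file name `hosts`, the application name `./app`, and the helper functions are hypothetical, and the options shown (`-f`, `-ckpointlib`, `-ckpoint-prefix`, `-ckpoint-interval`) should be checked against the Hydra version you have installed. Some later Hydra releases may also expect a checkpoint number on restart, so consult the documentation for your version. The script only prints the two command lines; it does not run mpiexec.

```shell
#!/bin/sh
# Sketch: build matching launch and restart command lines so that both use
# the same host file (same nodes, same order) and the same checkpoint prefix.
# HOSTFILE and APP are placeholders, not values taken from this thread.
HOSTFILE=hosts
PREFIX=/tmp/app.ckpoint
NP=11
APP=./app

launch_cmd() {
    # Initial run: checkpoint every 3 seconds, as described in the quoted message.
    echo "mpiexec -ckpointlib blcr -ckpoint-prefix $PREFIX -ckpoint-interval 3 -f $HOSTFILE -n $NP $APP"
}

restart_cmd() {
    # Restart: same library, prefix, host file and process count; no executable.
    echo "mpiexec -ckpointlib blcr -ckpoint-prefix $PREFIX -f $HOSTFILE -n $NP"
}

# Print both command lines for inspection (dry run).
launch_cmd
restart_cmd
```

Because both commands are generated from the same variables, the restart cannot silently drift to a different host list or prefix.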
Can you put the checkpoint files on a shared filesystem? That would
tell us whether this was the source of the problem.
-d
On 05/21/2010 10:15 AM, Bagus Jati Santoso wrote:
> OK..
>
> Thank you all. It works.. :)
>
> Maybe it would be better if the configure help in MPICH2-1.3a2
> explained this option:
> --with-hydra-ckpointlib=blcr
> If I'm not wrong, that option does not appear when I type
> ./configure --help in the MPICH2-1.3a2 installation.
>
> Okay, checkpointing is running now :).
> I'm checkpointing at an interval of 3 seconds.
> Then, in the middle of execution, I press Ctrl-Z / Ctrl-C to
> interrupt mpiexec.
> After the interrupt succeeds, I run this to restart the process:
> mpiexec -n 11 -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -n 11
>
> And I don't get anything on the screen. Is anything wrong?
>
> Thank you for your attention and answers.
>
> Best regards,
> Bagus
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss