[mpich-discuss] How to specify the "--save-all" option when using blcr to checkpoint apps in mpich2-1.4.1p1?

Pavan Balaji balaji at mcs.anl.gov
Mon Nov 28 22:28:03 CST 2011


Hi Wei,

Are you writing the checkpoint images to a shared file system that all of
the nodes can access? If the restart still hangs in that case, could you
file a bug report?

https://trac.mcs.anl.gov/projects/mpich2/newticket
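
For the shared file system question above, here is a minimal Python sketch
of one way to check where a checkpoint directory lives. It parses mount-table
text in the format of Linux /proc/mounts and reports the file system type of
the mount that contains a given path. The helper names, the set of file
system types treated as shared, the sample mount table, and the
/home/wei/ckpts path are all illustrative assumptions; none of them come
from MPICH2 or BLCR.

```python
import os

# File system types often used for cluster-wide shared storage.
# Illustrative assumption only; adjust for the site in question.
SHARED_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "panfs", "pvfs2", "cifs"}

# Hypothetical mount table in /proc/mounts format:
# device, mount point, file system type, options, dump, pass.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/sda1 / ext3 rw 0 0
/dev/sda2 /tmp ext3 rw 0 0
fileserver:/export/home /home nfs rw 0 0
"""


def fs_type_for(path, mounts_text):
    """Return the file system type of the deepest mount containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = os.path.normpath(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins
            # because later mounts shadow earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_shared(path, mounts_text):
    """True if path sits on one of the assumed shared file system types."""
    return fs_type_for(path, mounts_text) in SHARED_FS_TYPES


if __name__ == "__main__":
    for ckpt_dir in ("/home/wei/ckpts", "/tmp/ckpts"):
        print(ckpt_dir, fs_type_for(ckpt_dir, SAMPLE_MOUNTS),
              is_shared(ckpt_dir, SAMPLE_MOUNTS))
```

On a Linux node, the contents of /proc/mounts could be passed as mounts_text
in place of the sample. Running the check on each node used for checkpoint
and restart is one way to see whether the checkpoint directory sits on
network storage or on a node-local disk.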

  -- Pavan

On 11/28/2011 11:08 PM, Wei Jiang wrote:
> Hi,
>
> I am using the BLCR library with MPICH2 to checkpoint and restart my
> applications. This works well when I restart the applications on the
> same set of nodes.
> But when I restart on a different node (or set of nodes), the restart
> process just hangs.
>
> I looked at the BLCR documentation, which says that the "--save-all"
> flag should be specified when a different node (or set of nodes) is
> used to re-run the saved applications.
>
> I was wondering whether MPICH2 provides such a "--save-all" option to
> pass to the BLCR calls when I use mpiexec. If so, how should I specify
> it?
>
> Thanks very much!
>
> Let me know if you need more information.
>
> Thanks~
>
> --
> -- Wei

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

