[mpich-discuss] How to get checkpoint-file
Pavan Balaji
balaji at mcs.anl.gov
Tue May 18 00:38:01 CDT 2010
Bagus,
The /tmp/app.ckpoint directory needs to be created on the compute nodes,
not the node where you are running mpiexec from.
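
As a rough sketch of one way to do that, assuming passwordless ssh to each
compute node and a plain-text host list with one hostname per line (the
function name make_ckpoint_dirs, the REMOTE_SH variable, and the file name
hosts.txt are all made up for illustration, not part of MPICH2 or BLCR):

```shell
#!/bin/sh
# Hypothetical helper: create the checkpoint directory on every compute node.
# Assumes passwordless ssh and a host file with one hostname per line.
# REMOTE_SH defaults to ssh; set REMOTE_SH=echo for a dry run that only
# prints the command that would be sent to each node.
make_ckpoint_dirs() {
    hostfile=$1
    dir=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines in the host file.
        case $host in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so ssh cannot consume the rest of the host file.
        ${REMOTE_SH:-ssh} "$host" mkdir -p "$dir" < /dev/null \
            || echo "could not create $dir on $host" >&2
    done < "$hostfile"
}
```

With the nodes listed in hosts.txt, it would be called as
"make_ckpoint_dirs hosts.txt /tmp/app.ckpoint" before running mpiexec.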
-- Pavan
On 05/18/2010 12:21 AM, Bagus Jati Santoso wrote:
> Dear Darius,
>
> Yes, the /tmp/app.ckpoint directory already exists, but I didn't find
> anything in it, either while the program was running or after it had
> finished.
> The execution nevertheless always prints 'requesting checkpoint... checkpoint
> completed'.
>
> Please give me a suggestion. I think it would be great if MPICH2 can
> communicate with BLCR :).
> Thank you
>
> Best regards,
> Bagus
>
> On Mon, May 17, 2010 at 11:57 PM, Darius Buntinas <buntinas at mcs.anl.gov
> <mailto:buntinas at mcs.anl.gov>> wrote:
>
> It seems that you did things correctly. Did you recompile your
> application (cg) with the new mpich2? Then make sure the
> /tmp/app.ckpoint directory exists (not just /tmp but /tmp/app.ckpoint).
>
> Note that this version of MPICH2 is still an alpha version, and
> we're still working the bugs out of it. We appreciate that you're
> giving this a try.
>
> -d
>
>
> On 05/16/2010 03:43 AM, Bagus Jati Santoso wrote:
>
> Hello all,
>
> I have successfully compiled blcr-0.8.2, and the target is in
> /mirror/blcr.
>
> Then I installed MPICH2 1.3a2, which supports BLCR, and the target is in
> /mirror/mpich2blcr.
> I configured it with:
> ./configure --prefix=/mirror/mpich2blcr --with-blcr=/mirror/blcr
> --with-blcr-include=/mirror/blcr/include
> --with-blcr-lib=/mirror/blcr/lib
> After that:
> make
> sudo make install
>
> And it shows no error.
>
> My cluster consists of 11 computers running Debian.
>
> Then I compiled my CG program and executed it
> with:
> mpiexec -ckpointlib blcr -ckpoint-interval 4 -ckpoint-prefix
> /tmp/app.ckpoint ./cg bcsstk18.mtx
>
> It seems that the checkpoint process succeeds, since I see
> 'requested
> checkpointing... checkpointing completed' every 4 seconds.
>
> But why can't I find the checkpoint file in /tmp/app.ckpoint?
> Are all of the steps above correct?
>
> Thanks for your answers.
>
> Best regards,
> Bagus
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov <mailto:mpich-discuss at mcs.anl.gov>
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji