[mpich-discuss] How to get checkpoint-file

Pavan Balaji balaji at mcs.anl.gov
Tue May 18 00:38:01 CDT 2010


Bagus,

The /tmp/app.ckpoint directory needs to be created on the compute nodes, 
not the node where you are running mpiexec from.
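
One way to do that is sketched below. It is a rough sketch, not a tested recipe: it assumes passwordless ssh to every compute node and a hostfile with one hostname per line. The function name make_ckpt_dirs, the REMOTE variable, and the hostfile are all made up for illustration and are not part of MPICH2 or BLCR.

```shell
#!/bin/sh
# Hypothetical helper: create the checkpoint directory on each compute node.
# Assumptions (not from the thread): passwordless ssh works, and the
# hostfile lists one compute-node hostname per line.
CKPT_DIR=/tmp/app.ckpoint

make_ckpt_dirs() {
    hostfile=$1
    while IFS= read -r host; do
        # Skip blank lines in the hostfile.
        [ -n "$host" ] || continue
        # REMOTE defaults to ssh; set REMOTE=echo for a dry run.
        # stdin is redirected so ssh does not swallow the rest of the hostfile.
        ${REMOTE:-ssh} "$host" "mkdir -p $CKPT_DIR" </dev/null
    done < "$hostfile"
}
```

Running it with REMOTE=echo first prints the commands that would be sent to each node without executing anything, which is a cheap way to check the hostfile before touching the cluster.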

  -- Pavan

On 05/18/2010 12:21 AM, Bagus Jati Santoso wrote:
> Dear Darius,
> 
> Yes, the /tmp/app.ckpoint directory already exists. But I didn't find 
> anything in it, either while the program was running or after it 
> finished.
> The execution always prints 'requesting checkpoint... checkpoint 
> completed'.
> 
> Please give me a suggestion. I think it would be great if MPICH2 can 
> communicate with BLCR :).
> Thank you
> 
> Best regards,
> Bagus
> 
> On Mon, May 17, 2010 at 11:57 PM, Darius Buntinas <buntinas at mcs.anl.gov 
> <mailto:buntinas at mcs.anl.gov>> wrote:
> 
>     It seems that you did things correctly.  Did you recompile your
>     application (cg) with the new mpich2?  Then make sure the
>     /tmp/app.ckpoint directory exists (not just /tmp but /tmp/app.ckpoint).
> 
>     Note that this version of MPICH2 is still an alpha version, and
>     we're still working the bugs out of it.  We appreciate that you're
>     giving this a try.
> 
>     -d
> 
> 
>     On 05/16/2010 03:43 AM, Bagus Jati Santoso wrote:
> 
>         Hello all,
> 
>         I have successfully compiled blcr-0.8.2 and installed it in
>         /mirror/blcr.
> 
>         Then I installed mpich2-1.3a2, which supports BLCR, in
>         /mirror/mpich2blcr.
>         I configured it with:
>         ./configure --prefix=/mirror/mpich2blcr --with-blcr=/mirror/blcr
>         --with-blcr-include=/mirror/blcr/include
>         --with-blcr-lib=/mirror/blcr/lib
>         After that:
>         make
>         sudo make install
> 
>         It showed no errors.
> 
>         My cluster consists of 11 computers running Debian.
> 
>         Then I compiled my CG program and ran it with:
>         mpiexec -ckpointlib blcr -ckpoint-interval 4 -ckpoint-prefix
>         /tmp/app.ckpoint ./cg bcsstk18.mtx
> 
>         The checkpointing appears to succeed, since I see
>         'requested
>         checkpointing... checkpointing completed' every 4 seconds.
> 
>         But why can't I find the checkpoint file in /tmp/app.ckpoint?
>         Is my procedure above correct?
> 
>         Thanks for your answers.
> 
>         Best regards,
>         Bagus
> 
> 
> 
>         _______________________________________________
>         mpich-discuss mailing list
>         mpich-discuss at mcs.anl.gov <mailto:mpich-discuss at mcs.anl.gov>
>         https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> 
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

