[mpich-discuss] How to get checkpoint-file

Darius Buntinas buntinas at mcs.anl.gov
Mon May 24 14:14:51 CDT 2010


Hi Bagus,

You're right, specifying a shared directory would lead to overwritten 
files.  I'm not sure how we missed that :-).  Anyway, I fixed this for 
the next release, but you can download the patch here if you're 
interested (there's a link to download at the bottom of the page):

https://trac.mcs.anl.gov/projects/mpich2/changeset/6716
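
To make the overwrite problem concrete, here is a minimal sketch in Python. It is not what the patch does. The function names and the rank-suffix naming scheme are hypothetical and only illustrate why a single shared path collides while a per-process path does not.

```python
from collections import Counter


def shared_name(prefix, rank):
    # Every rank resolves to the same path, so on a shared file system
    # each process overwrites the file written by the others.
    return prefix


def per_rank_name(prefix, rank):
    # Hypothetical scheme (an assumption, not MPICH2's actual naming):
    # append the rank so each process gets its own file.
    return f"{prefix}.{rank}"


def count_collisions(namer, prefix, nprocs):
    # Count how many writes would land on a path another rank also uses.
    names = Counter(namer(prefix, rank) for rank in range(nprocs))
    return sum(count - 1 for count in names.values())


if __name__ == "__main__":
    prefix = "/mirror/app.ckpoint"
    print(count_collisions(shared_name, prefix, 11))    # 10
    print(count_collisions(per_rank_name, prefix, 11))  # 0
```

With 11 processes and one shared path, 10 of the 11 writes clobber an existing file. With a per-rank suffix there are none.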

I'm not sure why you're getting those errors on restart.  Can you try 
this on a single node and see what happens?

-d


On 05/21/2010 11:47 AM, Bagus Jati Santoso wrote:
> I tried to restart with:
> mpiexec -n 11 ckpoint-prefix /tmp/app.ckpoint
>
> And got this:
> system msg for write_line failure : Broken pipe
> Error: mpid_nem_ckpt.c:92 "ckpt_restart failed"
> Other MPI error, error stack:
> ckpt_restart(168)........:
> MPIDI_PG_SetConnInfo(632): PMI_KVS_Put returned -1
> [cli_0]: write_line error; fd=12 buf=:cmd=put kvsname=kvs_3354_0 key=P0-businesscard value=description#ndsl1$port#42965$ifname#192.168.1.1$
> ....
> system msg for write_line failure : Broken pipe
> Error: mpid_nem_ckpt.c:92 "ckpt_restart failed"
> Other MPI error, error stack:
> ckpt_restart(168)........:
> MPIDI_PG_SetConnInfo(632): PMI_KVS_Put returned -1
> [cli_0]: write_line error; fd=12 buf=:cmd=put kvsname=kvs_3354_0 key=P0-businesscard value=description#ndsl1$port#42965$ifname#192.168.1.11$
>
> Sorry Mr. Darius,
> What if I put the checkpoint prefix in a directory on the shared FS?
> For example:
> mpiexec -n 11 -ckpoint-interval 5 -ckpoint-prefix /mirror/app.ckpoint ./cg
> (my shared FS is located in /mirror)
>
> Will the checkpoint files be written successfully? All 11 nodes/CPUs will
> write their checkpoint files using the same filename, context, on the
> shared FS.
>
> Thank you for your answers.
>
> Best regards,
> Bagus

