[mpich-discuss] Coordinated Checkpoint without making checkpoint images

Darius Buntinas buntinas at mcs.anl.gov
Tue Nov 1 14:20:47 CDT 2011


Hi Mehdi,

You'll need to modify the MPIDI_nem_ckpt_finish() function in mpid_nem_ckpt.c.  This function is called when the checkpoint protocol has completed and we're ready to take a checkpoint of the process.  The checkpoint is taken between the sem_post and the sem_wait: the BLCR checkpoint thread (i.e., the ckpt_cb function) waits on ckpt_sem before taking the checkpoint.  If you're doing this without BLCR, you'll also need to modify the mechanism that initiates the checkpoint.  Normally the checkpoint thread sets MPIDI_nem_ckpt_start_checkpoint = TRUE at rank 0, so you'll need to do that yourself.
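
To make the sem_post/sem_wait handshake concrete, here is a minimal standalone sketch using POSIX threads and unnamed semaphores (so it assumes Linux).  It is not the MPICH2 source: only the name ckpt_sem comes from the description above, while done_sem, write_image, finish_round() and run_rounds() are hypothetical names made up for illustration.  With write_image set to 0, each round still goes through the handshake but no image is written, which is roughly the behaviour being asked about.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

struct ckpt_stats { int coordinations; int images; };

static sem_t ckpt_sem;     /* name taken from the message above */
static sem_t done_sem;     /* hypothetical: posted when the checkpoint step is over */
static int rounds_total;   /* hypothetical: number of checkpoint rounds to simulate */
static int write_image;    /* hypothetical switch: 0 = coordinate only, no image */
static int images;         /* counts images that would have been written */

/* Stand-in for the checkpoint thread (ckpt_cb in the message). */
static void *ckpt_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < rounds_total; i++) {
        sem_wait(&ckpt_sem);      /* block until the protocol has completed */
        if (write_image)
            images++;             /* real code would request the image here */
        sem_post(&done_sem);      /* let the main thread continue */
    }
    return NULL;
}

/* Stand-in for the end of MPIDI_nem_ckpt_finish(). */
static void finish_round(void)
{
    sem_post(&ckpt_sem);          /* wake the checkpoint thread */
    sem_wait(&done_sem);          /* the checkpoint step happens in between */
}

/* Run `rounds` simulated checkpoint rounds and report what happened. */
struct ckpt_stats run_rounds(int rounds, int with_image)
{
    struct ckpt_stats stats = { 0, 0 };
    pthread_t tid;
    int rc;

    rounds_total = rounds;
    write_image = with_image;
    images = 0;

    rc = sem_init(&ckpt_sem, 0, 0);
    assert(rc == 0);
    rc = sem_init(&done_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, ckpt_thread, NULL);
    assert(rc == 0);

    for (int i = 0; i < rounds; i++) {
        finish_round();
        stats.coordinations++;
    }

    pthread_join(tid, NULL);
    sem_destroy(&ckpt_sem);
    sem_destroy(&done_sem);

    stats.images = images;
    return stats;
}
```

Calling run_rounds(3, 0) from a main() and building with -pthread simulates three coordination rounds with no image written.  In the real code the equivalent change would go wherever ckpt_cb asks BLCR for the checkpoint, which this sketch does not attempt to reproduce.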

-d


On Oct 27, 2011, at 2:33 PM, Mohammed El Mehdi DIOURI wrote:

> Hi,
> 
> I was wondering whether, with mpich2, we can run coordinated checkpointing without creating the checkpoint images.
> I mean, at each checkpoint interval, we would only exchange the messages that carry out the corresponding coordination?
> 
> Thanks for your help,
> 
> Mehdi.


