[mpich-discuss] Checkpoint and Restart; How to modify mpich2-mechanism

Darius Buntinas buntinas at mcs.anl.gov
Wed Jun 16 11:17:31 CDT 2010


The checkpoint is initiated (and the checkpoint file is created) by the 
hydra proxy.

This is done in HYDT_ckpoint_blcr_suspend():
     src/pm/hydra/tools/ckpoint/blcr/ckpoint_blcr.c

-d

On 06/12/2010 09:28 AM, Bagus Jati Santoso wrote:
> Dear all,
>
> I have a question.
> I have a cluster of 11 machines running Debian 5.0, MPICH2-1.3a2, and
> BLCR-0.8.2.
>
> I want to run some simulations and measure the performance.
> Here is my scenario:
> When a node has finished taking its checkpoint, it sends its checkpoint
> file to the next 3 computers.
> Once the sending is finished, each node holds 3 checkpoint files from
> the 3 previous computers and then XORs these 3 checkpoint files.
>
> I know how to send a node's checkpoint file to other computers.
> But where should I put the source code for this mechanism? Or does
> anyone have a suggestion for how to implement it?
>
> I would appreciate an answer.
>
> Thank you very much for your answer and attention.
>
>
> Best regards,
>
> Bagus
>
>
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
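
The XOR step in the scenario above can be sketched in plain C, independent
of MPICH2 and BLCR. This is only an illustration under stated assumptions:
xor_files() is a hypothetical helper (it is not part of hydra or BLCR), the
file names are placeholders, and files of different lengths are treated as
if the shorter ones were padded with zero bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an MPICH2 or BLCR function.
 * XORs n_in input files byte by byte into out_path. Shorter inputs are
 * treated as zero-padded, so the output is as long as the longest input.
 * Returns 0 on success, -1 on any error. */
int xor_files(const char **in_paths, int n_in, const char *out_path)
{
    enum { CHUNK = 65536 };
    int rc = -1;
    FILE *out = NULL;
    FILE **in = NULL;
    unsigned char *acc = NULL, *buf = NULL;

    if (n_in <= 0)
        return -1;
    in = calloc((size_t)n_in, sizeof *in);
    acc = malloc(CHUNK);
    buf = malloc(CHUNK);
    if (!in || !acc || !buf)
        goto done;

    for (int i = 0; i < n_in; i++) {
        in[i] = fopen(in_paths[i], "rb");
        if (!in[i])
            goto done;
    }
    out = fopen(out_path, "wb");
    if (!out)
        goto done;

    for (;;) {
        size_t longest = 0;
        memset(acc, 0, CHUNK);          /* XOR identity: all zero bytes */
        for (int i = 0; i < n_in; i++) {
            size_t got = fread(buf, 1, CHUNK, in[i]);
            if (got < CHUNK && ferror(in[i]))
                goto done;
            for (size_t k = 0; k < got; k++)
                acc[k] ^= buf[k];
            if (got > longest)
                longest = got;
        }
        if (longest == 0)               /* every input is at end of file */
            break;
        if (fwrite(acc, 1, longest, out) != longest)
            goto done;
    }
    rc = 0;

done:
    if (in) {
        for (int i = 0; i < n_in; i++)
            if (in[i])
                fclose(in[i]);
    }
    if (out && fclose(out) != 0)
        rc = -1;
    free(in);
    free(acc);
    free(buf);
    return rc;
}
```

A node would call it with the three received files, for example
xor_files(paths, 3, "parity.ckpt") where paths holds three placeholder
file names. Because XOR is its own inverse, XORing the result with any two
of the three inputs reproduces the third, up to the zero padding. The
original lengths would have to be recorded separately to strip that padding.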

