[mpich-discuss] Checkpoint and Restart; How to modify mpich2-mechanism

Bagus Jati Santoso bagus.jati at gmail.com
Sat Jun 12 11:28:50 CDT 2010


Dear all,

I have a question.
I have a cluster of 11 machines running Debian 5.0, MPICH2-1.3a2, and
BLCR-0.8.2.

I want to run a simulation and measure its performance.
Here is my scenario:
When a node finishes taking its checkpoint, it sends its checkpoint file to
the next 3 computers.
Once that sending is finished, each node holds 3 checkpoint files from the 3
previous computers, and it then computes the XOR of these 3 checkpoint files.
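To make the scenario concrete, here is a minimal sketch of the two pieces of logic involved, written in Python for brevity. It assumes the nodes are numbered 0 to N-1 in a ring (like MPI ranks) and that checkpoint files are read as raw bytes. The function names are hypothetical and are not part of MPICH2 or BLCR. Because checkpoint files usually differ in size, the sketch zero-pads shorter files before XORing; this padding is my assumption, and the original file sizes would have to be stored separately for recovery.

```python
from functools import reduce


def ring_targets(rank: int, size: int, k: int = 3) -> list[int]:
    """Hypothetical helper: the next k nodes (wrapping around) that receive this node's checkpoint."""
    return [(rank + i) % size for i in range(1, k + 1)]


def ring_sources(rank: int, size: int, k: int = 3) -> list[int]:
    """Hypothetical helper: the previous k nodes (wrapping around) whose checkpoints this node receives."""
    return [(rank - i) % size for i in range(1, k + 1)]


def xor_checkpoints(files: list[bytes]) -> bytes:
    """XOR several checkpoint images byte by byte.

    Assumption: shorter files are zero-padded to the length of the longest one,
    so the caller must remember each original length to recover a file later.
    """
    length = max(len(f) for f in files)
    padded = [f.ljust(length, b"\x00") for f in files]
    # zip(*padded) yields one tuple of ints per byte position across all files
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*padded))
```

With 11 nodes, `ring_targets(9, 11)` gives `[10, 0, 1]` and `ring_sources(0, 11)` gives `[10, 9, 8]`. If one of the three files is lost, XORing the parity with the two surviving files and truncating to the lost file's original length rebuilds it.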

I know how to send a node's checkpoint file to another computer.
But where should I put the source code for this mechanism? Or does anyone
have a solution for how to implement it?

I would really appreciate an answer.

Thank you very much for your answer and attention.


Best regards,

Bagus

