[mpich-discuss] Checkpoint and Restart; How to modify mpich2-mechanism
Bagus Jati Santoso
bagus.jati at gmail.com
Sat Jun 12 11:28:50 CDT 2010
Dear all,
I have a question.
I have a cluster of 11 machines running Debian 5.0, MPICH2-1.3a2, and
BLCR-0.8.2.
I want to run some simulations and measure the performance.
Here is my scenario:
When a node has finished taking its checkpoint, it sends its checkpoint file
to the next 3 computers.
Once the sending is finished, each node holds 3 checkpoint files from the
3 previous computers, and it then computes the XOR of these 3 checkpoint files.
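
To make the scenario concrete, here is a rough sketch in Python of the two
pieces described above: picking the 3 neighbours in a ring of nodes, and
XOR-ing the received checkpoint files. It is not MPICH2 or BLCR code. The
function names (next_ranks, prev_ranks, xor_files), the ring ordering by rank,
and the zero-padding of shorter files are all assumptions made for
illustration.

```python
def next_ranks(rank, size, k=3):
    """Ranks of the k nodes after `rank`, wrapping around (assumed ring order)."""
    return [(rank + i) % size for i in range(1, k + 1)]


def prev_ranks(rank, size, k=3):
    """Ranks of the k nodes before `rank`, i.e. the senders this node hears from."""
    return [(rank - i) % size for i in range(1, k + 1)]


def xor_files(in_paths, out_path, chunk_size=1 << 20):
    """XOR the input files byte by byte into out_path.

    Assumption: files of different lengths are treated as if the shorter
    ones were padded with zero bytes at the end.
    """
    files = [open(p, "rb") for p in in_paths]
    try:
        with open(out_path, "wb") as out:
            while True:
                chunks = [f.read(chunk_size) for f in files]
                n = max(len(c) for c in chunks)
                if n == 0:  # every file is exhausted
                    break
                acc = 0
                for c in chunks:
                    # Pad on the right so byte positions stay aligned.
                    acc ^= int.from_bytes(c.ljust(n, b"\0"), "big")
                out.write(acc.to_bytes(n, "big"))
    finally:
        for f in files:
            f.close()
```

With 11 nodes, next_ranks(9, 11) gives [10, 0, 1], so node 9 would send to
nodes 10, 0 and 1. Reading in chunks keeps memory use bounded, which matters
if the checkpoint files are large.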
I know how to send a node's checkpoint file to another computer.
But where should I put the source code for this mechanism? Or does anyone
have a solution for how to implement it?
I would be grateful for any answer.
Thank you very much for your answer and attention.
Best regards,
Bagus