Hi Pavan,<br><br>Yes, it is enabled. I can checkpoint the application automatically every 20 seconds with the option -ckpoint-interval 20, and that works well.<br>I then tried to trigger a checkpoint manually by sending the SIGUSR1 signal, but nothing happened. That seems odd to me, and I don't know what the problem might be.<br>
<br>I am using mpich2-1.4.1p. Any ideas?<br><br>Thanks~<br><br><div class="gmail_quote">On Tue, Nov 29, 2011 at 11:27 PM, Pavan Balaji <span dir="ltr"><<a href="mailto:balaji@mcs.anl.gov">balaji@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br>
Please keep mpich-discuss cc'ed.<br>
<br>
Can you make sure checkpointing is in fact enabled? (See the README and check the output of mpiexec -info.) There was a problem where it was not enabled by default and additional configure options had to be passed. This has been fixed, but the fix might not be in the version you are using.<br>
<font color="#888888">
<br>
-- Pavan</font><div class="im"><br>
<br>
On 11/29/2011 10:58 PM, Wei Jiang wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Hi Pavan,<br>
<br>
Thanks for your reply.<br>
<br>
I tried that, but nothing happened. I also tried inserting a call like<br>
"system("pkill -USR1 mpiexec");" after a synchronization point in the MPI<br>
code, but no checkpoint was taken either.<br>
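A minimal sketch, assuming a POSIX system, of how the exit status of such a system() call could be checked. pkill exits with 0 when it signalled at least one process and with 1 when no process matched, so a status of 1 would suggest that no mpiexec is running on the node where the calling rank executes (mpiexec may well be on a different host). The helper name run_cmd_status is hypothetical.<br>

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command via system() and return its
 * exit code, or -1 if the command could not be run or did not exit
 * normally (for example, because it was killed by a signal). */
int run_cmd_status(const char *cmd)
{
    /* system(NULL) only asks whether a shell exists, so reject it here. */
    assert(cmd != NULL);

    int rc = system(cmd);
    if (rc == -1)
        return -1;              /* fork or wait failed */
    if (!WIFEXITED(rc))
        return -1;              /* abnormal termination */
    return WEXITSTATUS(rc);     /* for pkill: 0 = matched, 1 = no match */
}
```

In the MPI code this could be called on a single rank, for example run_cmd_status("pkill -USR1 mpiexec"), and the result printed to see whether pkill found an mpiexec process at all.<br>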
<br>
Is it possible that the SIGUSR1 signal was ignored? When I tried<br>
a hard kill with the -KILL option, mpiexec was killed as I expected.<br>
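Note that SIGKILL can never be caught or ignored, so the -KILL test succeeding says nothing about how SIGUSR1 is handled. A minimal sketch of the difference, assuming a POSIX system: it queries and changes the SIGUSR1 disposition of the calling process only and does not inspect mpiexec. All function names are hypothetical.<br>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set to 1 by the handler when SIGUSR1 is actually delivered. */
volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Install a handler for SIGUSR1; returns 0 on success, -1 on error. */
int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Make this process ignore SIGUSR1; returns 0 on success, -1 on error. */
int ignore_usr1(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Returns 1 if SIGUSR1 is currently ignored here, 0 if not, -1 on error. */
int sigusr1_is_ignored(void)
{
    struct sigaction sa;
    if (sigaction(SIGUSR1, NULL, &sa) != 0)
        return -1;
    return sa.sa_handler == SIG_IGN;
}

/* Send SIGUSR1 to one process; returns 0 on success, else the errno value.
 * pid must be positive: zero and negative values address process groups. */
int send_usr1(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, SIGUSR1) != 0)
        return errno;   /* e.g. ESRCH (no such process) or EPERM */
    return 0;
}
```

While the signal is ignored, send_usr1(getpid()) still returns 0 but got_usr1 stays 0; once the handler is installed, the same call sets got_usr1 to 1. A successful kill() therefore only shows that the signal was sent, not that the target acted on it.<br>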
<br>
Or what could be the problem? Was I missing something?<br>
<br>
Thanks very much!<br>
<br>
On Mon, Nov 28, 2011 at 11:29 PM, Pavan Balaji <<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a><br></div><div class="im">
<mailto:<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a>>> wrote:<br>
<br>
<br>
On 11/29/2011 11:12 AM, Wei Jiang wrote:<br>
<br>
I am using BLCR in MPICH2 to checkpoint/restart my MPI program.<br>
How can I request a checkpoint manually?<br>
<br>
<br>
You can run "pkill -USR1 mpiexec" from a different terminal.<br>
<br>
-- Pavan<br>
<br>
--<br>
Pavan Balaji<br>
<a href="http://www.mcs.anl.gov/%7Ebalaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>
<br>
<br>
<br>
<br>
--<br>
-- Wei<br>
<br>
</div></blockquote><div><div></div><div class="h5">
<br>
-- <br>
Pavan Balaji<br>
<a href="http://www.mcs.anl.gov/%7Ebalaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>-- Wei<br><br>