[mpich-discuss] MPICH2 Checkpointing Error with BLCR

Manisha Chauhan manisha.chauhan at yahoo.co.in
Fri Sep 21 23:07:32 CDT 2012


Hi,

I am working on checkpointing my MPI application. I have installed both Hydra and BLCR. I have also checked "mpiexec --info", and it shows blcr as the checkpointing library, but I am still not able to checkpoint my application.

The first checkpoint request prints "requesting checkpoint" and returns with "checkpoint completed", but the context file is empty. The next checkpoint attempt ends with the following error.
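
For reference, the "context file is empty" symptom can be confirmed with a small script. This is only a minimal sketch: it assumes the checkpoint files are written into a single directory, and the function name find_empty_files and the default path "." are hypothetical, not part of MPICH2 or BLCR.

```python
from pathlib import Path


def find_empty_files(checkpoint_dir):
    """Return the sorted names of zero-byte regular files in checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    import sys

    # Hypothetical default; pass the real checkpoint directory as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_empty_files(target):
        print(f"empty context file: {name}")
```

Running it against the checkpoint directory after each checkpoint request lists any file that was created but never filled.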


MPICH2 version: 1.4.1

[proxy:0:0 at tom-laptop] requesting checkpoint
[proxy:0:0 at tom-laptop] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.
[proxy:0:0 at tom-laptop] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:902): checkpoint suspend failed
[proxy:0:0 at tom-laptop] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at tom-laptop] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec at tom-laptop] control_cb (./pm/pmiserv/pmiserv_cb.c:201): assert (!closed) failed
[mpiexec at tom-laptop] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at tom-laptop] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec at tom-laptop] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion

Can you please help me find out what the issue is?

Regards

Manisha