[mpich2-dev] Checkpointing failed

Bo Fang flyree at gmail.com
Tue Nov 29 19:04:47 CST 2011


Hi,

I am working on a course project that evaluates MPICH2 with BLCR, but I am
having problems running my benchmarks with checkpointing (ckpoint) enabled.
Whenever the second checkpoint is requested, an error occurs, regardless of
the checkpoint interval I specify or which benchmark is running.

Here is the error message:

--------------------------------------------------------------------------------------
[proxy:0:0 at bo-laptop] requesting checkpoint
[proxy:0:0 at bo-laptop] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:111): Previous checkpoint has not completed.
[proxy:0:0 at bo-laptop] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:947): checkpoint suspend failed
[proxy:0:0 at bo-laptop] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at bo-laptop] main (./pm/pmiserv/pmip.c:225): demux engine error waiting for event
[mpiexec at bo-laptop] control_cb (./pm/pmiserv/pmiserv_cb.c:215): assert (!closed) failed
[mpiexec at bo-laptop] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at bo-laptop] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec at bo-laptop] main (./ui/mpich/mpiexec.c:420): process manager error waiting for completion
------------------------------------------------------------------------------------

The error occurs when the second checkpoint is requested. It seems that the
first checkpoint has not completed by the time the second one is requested,
but I cannot find anything in the code that explains why the first checkpoint
would still be incomplete. The checkpoint file written by the first
checkpoint is very large (over 150 MB).
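One possible reading of the log, offered only as an assumption, is a timing overlap: if writing a large checkpoint image (over 150 MB here) takes longer than the checkpoint interval, the next request arrives while the first is still running and is rejected by the check reported at ckpoint.c:111. The toy Python model below illustrates that assumption. `CheckpointGuard`, `fixed_interval`, and `wait_for_completion` are hypothetical names; this is not Hydra's implementation and it does not call BLCR.

```python
from dataclasses import dataclass, field


class CheckpointInProgress(Exception):
    """Raised when a checkpoint is requested while a previous one is still running."""


@dataclass
class CheckpointGuard:
    # Hypothetical model of a "previous checkpoint has not completed" check.
    busy_until: float = 0.0               # time at which the running checkpoint finishes
    completed: list = field(default_factory=list)  # start times of accepted checkpoints

    def request(self, now: float, duration: float) -> None:
        if now < self.busy_until:
            raise CheckpointInProgress(f"t={now}: previous checkpoint has not completed")
        self.busy_until = now + duration
        self.completed.append(now)


def fixed_interval(interval: float, duration: float, count: int) -> list:
    """Request a checkpoint every `interval` seconds, failing if one overlaps the previous."""
    guard = CheckpointGuard()
    for i in range(count):
        guard.request(i * interval, duration)
    return guard.completed


def wait_for_completion(interval: float, duration: float, count: int) -> list:
    """Issue each request `interval` after the previous one, deferred until it has finished."""
    guard = CheckpointGuard()
    now = 0.0
    for _ in range(count):
        now = max(now, guard.busy_until)  # wait for the running checkpoint to finish
        guard.request(now, duration)
        now += interval
    return guard.completed
```

With an assumed 25 s checkpoint and a 10 s interval, `fixed_interval(10, 25, 3)` raises on the second request, much like the log above, whereas `wait_for_completion(10, 25, 3)` accepts checkpoints at t = 0, 25, and 50. Whether Hydra's periodic checkpointing behaves this way is not confirmed here; the model only shows why an interval shorter than the checkpoint write time would produce this kind of error.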

Thank you very much for your help.

Bo Fang