[mpich2-dev] regarding checkpointing using BLCR in MPICH2

Darius Buntinas buntinas at mcs.anl.gov
Mon Dec 12 15:30:32 CST 2011


Yup, there was a bug in the async progress code.  Try applying this patch; it should fix the async progress issue.

http://trac.mcs.anl.gov/projects/mpich2/changeset/9333

-d

On Dec 12, 2011, at 2:28 PM, Hao Yang wrote:

> Hi, all:
> 
> I am seeing checkpoint failures when I use MPICH with BLCR on my machine.
> 
> We tried inserting MPI_Iprobe calls into the program, but the second checkpoint still failed.
> 
> Alternatively, we tried setting MPICH_ASYNC_PROGRESS to enable an MPI progress thread, which may help the processes execute the checkpoint algorithm. We first built MPICH with --enable-async-progress and then ran "mpiexec -env MPICH_ASYNC_PROGRESS 1 -n 5 ./mpitest", but we got the error "Assertion failed in file async.c at line 52: !mpi_errno. internal ABORT - process 0".
> 
> Does anyone know how to fix this? Thank you. 


