[mpich2-dev] regarding checkpointing using BLCR in MPICH2
Darius Buntinas
buntinas at mcs.anl.gov
Mon Dec 12 15:30:32 CST 2011
Yup, there was a bug in the async progress code. Try applying this patch; it should fix the async progress issue.
http://trac.mcs.anl.gov/projects/mpich2/changeset/9333
-d
On Dec 12, 2011, at 2:28 PM, Hao Yang wrote:
> Hi, all:
>
> I am seeing checkpoint failures when I use MPICH with BLCR on my machine.
>
> We tried inserting MPI_Iprobe calls into the program, but the second checkpoint still failed.
>
> Alternatively, we tried setting MPICH_ASYNC_PROGRESS to enable an MPI progress thread, which may help the processes execute the checkpoint algorithm. We first built MPICH with --enable-async-progress and then ran "mpiexec -env MPICH_ASYNC_PROGRESS 1 -n 5 ./mpitest". But we got the error "Assertion failed in file async.c at line 52: !mpi_errno. internal ABORT - process 0".
>
> Does anyone know how to fix this? Thank you.
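
For reference, the MPI_Iprobe approach described in the quoted message might look roughly like the sketch below: a compute loop that periodically enters the MPI library so it gets a chance to make progress. This is only an illustration, not the poster's actual program. The helper name poke_progress, the loop bounds, and the polling interval are all hypothetical. As the message reports, this approach did not resolve the second-checkpoint failure in their case.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper: enter the MPI library so it can make progress.
 * The probe result is ignored; only the call itself matters here. */
static void poke_progress(void)
{
    int flag;
    MPI_Status status;

    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
}

int main(int argc, char **argv)
{
    const long iterations = 100000000L; /* illustrative workload size */
    const long poll_every = 1000000L;   /* illustrative polling interval */
    double sum = 0.0;
    long i;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < iterations; i++) {
        sum += (double)i * 1e-9;        /* stand-in for real computation */
        if (i % poll_every == 0)
            poke_progress();            /* periodic call into MPI */
    }

    printf("rank %d: sum = %f\n", rank, sum);

    MPI_Finalize();
    return 0;
}
```

A program like this would be built with mpicc and launched with mpiexec in the same way as the ./mpitest run quoted above.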