Hi, all:<br><span class="Apple-style-span" style="border-collapse:separate;color:rgb(0,0,0);font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium"><span class="Apple-style-span" style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px"><span class="Apple-converted-space"><br>
I </span>found that checkpointing fails when I use MPICH with BLCR on my machine. <br><br>We tried inserting MPI_Iprobe calls into the program, but the second checkpoint still failed. <br><br></span></span><span class="Apple-style-span" style="border-collapse:separate;color:rgb(0,0,0);font-family:'Times New Roman';font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-size:medium"><span class="Apple-style-span" style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px">Alternatively, we tried setting MPICH_ASYNC_PROGRESS to enable an MPI progress thread, which may help the processes execute the checkpoint algorithm. We first built MPICH with --enable-async-progress and then ran "mpiexec -env MPICH_ASYNC_PROGRESS 1 -n 5 ./mpitest". But we got the error "Assertion failed in file async.c at line 52: !mpi_errno. internal ABORT - process 0". <br>
<br>Does anyone know how to fix this? Thank you. <br></span></span>