[mpich-discuss] Trouble with checkpoint
Darius Buntinas
buntinas at mcs.anl.gov
Mon Oct 24 16:41:09 CDT 2011
Hmm strange. Did you do a make clean first? I.e.:
make clean
make
make install
Also make sure you recompile your app (maybe even do a make clean for the app too).
-d
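
For context on the "sem_wait() failed Interrupted system call" lines in the error stack quoted below: sem_wait() returns -1 with errno set to EINTR when a signal handler interrupts the wait, and a common remedy is to retry the call. The patch mentioned in this thread is not reproduced here, so the following C code is only a minimal sketch of that general retry idiom, not the MPICH fix. The names sem_wait_retry, demo_sem, on_alarm and demo_interrupted_wait are hypothetical and exist only for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper, not an MPICH function: block on a semaphore and
 * retry whenever a signal handler interrupts the wait (errno == EINTR).
 * Any other failure is returned to the caller with errno intact. */
static int sem_wait_retry(sem_t *sem)
{
    int ret;
    do {
        ret = sem_wait(sem);
    } while (ret == -1 && errno == EINTR);
    return ret;
}

static sem_t demo_sem;

/* The handler posts the semaphore; sem_post() is async-signal-safe. */
static void on_alarm(int signo)
{
    (void)signo;
    sem_post(&demo_sem);
}

/* Demo: a timer signal arrives while the caller is blocked in sem_wait().
 * The handler is installed without SA_RESTART, so the wait may fail with
 * EINTR; the retry loop hides that from the caller. Returns 0 on success. */
static int demo_interrupted_wait(void)
{
    struct sigaction sa;
    struct itimerval timer;
    int ret;

    if (sem_init(&demo_sem, 0, 0) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;        /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50000;  /* fire once after 50 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    ret = sem_wait_retry(&demo_sem);
    assert(ret == 0 || errno != EINTR);  /* EINTR never escapes the wrapper */
    sem_destroy(&demo_sem);
    return ret;
}
```

Calling demo_interrupted_wait() from a main() should return 0 whether or not the timer signal lands while the process is blocked in sem_wait(). On older glibc versions the program may need to be linked with -pthread.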
On Oct 22, 2011, at 3:28 PM, Fernando Luz wrote:
> Hi Darius,
>
> I applied the patch, but I still get the same errors.
>
> Do you need any files or information about my system?
>
> Regards
>
> Fernando Luz
>
> ----- Original Message -----
> From: "Darius Buntinas" <buntinas at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Sent: Friday, October 21, 2011 16:20:49
> Subject: Re: [mpich-discuss] Trouble with checkpoint
>
> Hi Fernando,
>
> Can you apply this patch and see if it fixes your problem?
>
> Let us know how it goes.
> -d
>
>
>
> On Oct 19, 2011, at 2:13 PM, Fernando Luz wrote:
>
>> Hi,
>>
>> I tried to use checkpoint-restart with mpich2, running this command:
>>
>> mpiexec -ckpointlib blcr -ckpoint-prefix ./teste.ckpoint -ckpoint-interval 30 -f hosts -n 26 Dyna Prea_teste001.p3d 2
>>
>> and I received the following errors:
>>
>> 0% [= ] 00:00:28 / 00:56:27
>> [proxy:0:0 at s23n20.gradebr.tpn] requesting checkpoint
>> [proxy:0:1 at s23n21.gradebr.tpn] requesting checkpoint
>> [proxy:0:2 at s23n22.gradebr.tpn] requesting checkpoint
>> [proxy:0:3 at s23n23.gradebr.tpn] requesting checkpoint
>> [proxy:0:0 at s23n20.gradebr.tpn] checkpoint completed
>> [proxy:0:1 at s23n21.gradebr.tpn] checkpoint completed
>> [proxy:0:2 at s23n22.gradebr.tpn] checkpoint completed
>> [proxy:0:3 at s23n23.gradebr.tpn] checkpoint completed
>> 0% [= ] 00:00:29 / 00:56:28
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1ebebfc0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff7ba7b620) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1f84f600, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff957566a0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1fc58d50, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7ffff54102a0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x7752ca0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fffeab72ca0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x12274ca0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fffb55e4ea0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1b6c4600, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff74e63520) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x15511ca0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff9fb57ca0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x815afc0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff87e31fa0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0xf1e7d80, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff19d30120) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1758f9a0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff06ac13a0) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0xaaf8ce0, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff7e488920) failed
>> MPIDI_CH3I_Progress(321)..:
>> MPIDI_nem_ckpt_finish(469): sem_wait() failed Interrupted system call
>> Fatal error in MPI_Recv: Other MPI error, error stack:
>> MPI_Recv(186).............: MPI_Recv(buf=0x1cc47990, count=7, MPI_DOUBLE, src=0, tag=2, MPI_COMM_WORLD, status=0x7fff00638520) failed
>>
>> Without checkpointing, the execution completes successfully. How should I proceed to fix this error?
>>
>> Regards
>>
>> Fernando Luz
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>