[mpich-discuss] Assertion failed
Darius Buntinas
buntinas at mcs.anl.gov
Thu Dec 4 11:16:16 CST 2008
Hi Xavier,
This assertion means a message is getting corrupted, which often means
there's a problem with MPICH2. Can you send us a small test program
that demonstrates this?
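
For context, here is a rough sketch of the kind of check that assertion
performs. It is not the MPICH2 source: the enum, the struct, and every
function name below are made up for illustration. Only the idea comes from
the assertion text, namely that the type field of a received packet header
must be smaller than an end-of-enum sentinel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet types; the real MPICH2 enum has different members. */
typedef enum {
    PKT_EAGER_SEND = 0,
    PKT_RNDV_REQ,
    PKT_RNDV_CTS,
    PKT_END            /* sentinel: one past the last valid type */
} pkt_type_t;

/* Hypothetical fixed-size header as it would be read off a socket. */
typedef struct {
    int32_t type;        /* expected to hold a pkt_type_t value */
    int32_t payload_len; /* bytes of payload following the header */
} pkt_header_t;

/* Returns 1 if the type field is within the known range, 0 otherwise. */
int pkt_type_is_valid(const pkt_header_t *hdr)
{
    return hdr->type >= 0 && hdr->type < PKT_END;
}

/* Copies raw bytes into a header, the way a progress engine might after
 * a read completes. Nothing here verifies that the bytes make sense. */
pkt_header_t decode_header(const unsigned char *buf)
{
    pkt_header_t hdr;
    memcpy(&hdr, buf, sizeof hdr);
    return hdr;
}

/* Same shape as the failing check: abort when the type is out of range. */
void check_header_or_abort(const pkt_header_t *hdr)
{
    assert(pkt_type_is_valid(hdr));
}
```

If the bytes handed to decode_header() are garbage (for example because
the stream is out of sync or a buffer was overwritten), the type field
will almost always fall outside the valid range, and a check like
check_header_or_abort() fires.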
Thanks,
-d
On 12/03/2008 11:59 PM, Xavier Olive wrote:
> Hello,
>
> I recompiled that version from source (instead of using the Ubuntu
> package), and I now get a perhaps more explicit error...
>
> Fatal error in MPI_Iprobe: Other MPI error, error stack:
> MPI_Iprobe(121)...........................:
> MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> flag=0x90986da4, status=0x906e2878) failed
> MPIDI_CH3i_Progress_test(95)..............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(637)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)[cli_0]: aborting
> job:
> Fatal error in MPI_Iprobe: Other MPI error, error stack:
> MPI_Iprobe(121)...........................:
> MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> flag=0x90986da4, status=0x906e2878) failed
> MPIDI_CH3i_Progress_test(95)..............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(637)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)
> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type <
> MPIDI_CH3_PKT_END_CH3
> internal ABORT - process 2[cli_2]: aborting job:
> internal ABORT - process 2
> rank 2 in job 3 localhost.localdomain_44739 caused collective abort
> of all ranks
> exit status of rank 2: return code 1
> zsh: exit 9 ~/mpich2-1.0.8/bin/mpiexec -np 9
>
>
> The line with "Assertion failed in file ch3_progress.c at line 431:
> conn->pkt.type < MPIDI_CH3_PKT_END_CH3" shows up at a different place
> in that output from one run to the next...
>
> Since I use the Java wrapper and no exception is raised there, all of
> this is very hard for me to interpret...
>
> Thanks for your concern,
> Xavier
>
>
> On Thu, Dec 4, 2008 at 07:27, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> Hard to say what the problem might be. Make sure you are using the latest
>> version of MPICH2 (1.0.8).
>>
>> Rajeev
>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Xavier Olive
>>> Sent: Wednesday, December 03, 2008 3:48 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: [mpich-discuss] Assertion failed
>>>
>>> Hello mpich-discuss users,
>>>
>>> I am using mpich2 with mpiJava for my application and am facing a
>>> serious problem I don't know how to deal with. During any execution,
>>> sometimes in the very beginning, sometimes after some thousands of
>>> messages sent and received, I get the following error, generated by
>>> the C code:
>>>
>>> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type <
>>> MPIDI_CH3_PKT_END_CH3
>>>
>>> I tried to investigate the code but I really don't get how I could
>>> violate that assertion. Would anyone have any idea about the cause of
>>> this problem?
>>>
>>> Thanks,
>>>
>>> --
>>> Xavier Olive
>>>
>>
>
>
>