[mpich-discuss] Assertion failed

Darius Buntinas buntinas at mcs.anl.gov
Thu Dec 4 11:16:16 CST 2008


Hi Xavier,

This assertion means a message is getting corrupted, which often means
there's a problem with MPICH2.  Can you send us a small test program
that demonstrates this?

Thanks,
-d


On 12/03/2008 11:59 PM, Xavier Olive wrote:
> Hello,
> 
>  I rebuilt that version myself (instead of using the Ubuntu
> package), and I now get a possibly more explicit error:
> 
> Fatal error in MPI_Iprobe: Other MPI error, error stack:
> MPI_Iprobe(121)...........................:
> MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> flag=0x90986da4, status=0x906e2878) failed
> MPIDI_CH3i_Progress_test(95)..............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(637)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)[cli_0]: aborting
> job:
> Fatal error in MPI_Iprobe: Other MPI error, error stack:
> MPI_Iprobe(121)...........................:
> MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD,
> flag=0x90986da4, status=0x906e2878) failed
> MPIDI_CH3i_Progress_test(95)..............: an error occurred while
> handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(420):
> MPIDU_Socki_handle_read(637)..............: connection failure
> (set=0,sock=2,errno=104:Connection reset by peer)
> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type <
> MPIDI_CH3_PKT_END_CH3
> internal ABORT - process 2[cli_2]: aborting job:
> internal ABORT - process 2
> rank 2 in job 3  localhost.localdomain_44739   caused collective abort
> of all ranks
>   exit status of rank 2: return code 1
> zsh: exit 9     ~/mpich2-1.0.8/bin/mpiexec -np 9
> 
> 
> The line with "Assertion failed in file ch3_progress.c at line 431:
> conn->pkt.type < MPIDI_CH3_PKT_END_CH3" shows up at a different,
> seemingly random position within that output from run to run...
> 
> Since I use the Java wrapper and no exception is raised there, all of
> this is hard for me to interpret...
> 
> Thanks for your concern,
> Xavier
> 
> 
> On Thu, Dec 4, 2008 at 07:27, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> Hard to say what the problem might be. Make sure you are using the latest
>> version of MPICH2 (1.0.8).
>>
>> Rajeev
>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov
>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Xavier Olive
>>> Sent: Wednesday, December 03, 2008 3:48 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: [mpich-discuss] Assertion failed
>>>
>>> Hello mpich-discuss users,
>>>
>>>  I am using mpich2 with mpiJava for my application and am facing a
>>> serious problem I don't know how to deal with. During any execution,
>>> sometimes in the very beginning, sometimes after some thousands of
>>> messages sent and received, I get the following error, generated by
>>> the C code:
>>>
>>> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type <
>>> MPIDI_CH3_PKT_END_CH3
>>>
>>> I tried to investigate the code but I really don't get how I could
>>> violate that assertion. Would anyone have any idea about the cause of
>>> this problem?
>>>
>>> Thanks,
>>>
>>> --
>>> Xavier Olive
>>>
>>
> 
> 
> 


