[mpich-discuss] Assertion failed

Xavier Olive xo.olive at gmail.com
Thu Dec 4 20:58:44 CST 2008


Thanks for your answer Darius,

Unfortunately, my small test programs all work perfectly. Only the
big program fails. I found some messages on MPICH1 forums
suggesting running cleanipcs. I didn't find it in MPICH2, so I
tried using it.

I have the feeling that the memory gets corrupted after a large number
of messages are sent/received. The main program, in Java, has been
extensively tested and works properly with sockets. The problems only
appeared after I programmed the MPI interface (nothing magic, just some
serialization (the data are not very big) + Send/Recv). And the memory
may still be corrupted for the next execution, which makes it fail at
the first Send it reaches (and also makes the small test programs fail).
As I am running my tests on my desktop computer right now, would running
out of memory be a plausible explanation?
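One way to test the corruption hypothesis at the application level is to checksum each serialized payload before it is handed to Send and verify it after Recv. The sketch below assumes the objects are Java Serializable and are sent as a byte[] with MPI.BYTE; the class and method names (CheckedPayload, pack, unpack) are hypothetical, and the MPI calls are left out so the sketch runs without an MPI installation. If unpack reports a mismatch, the bytes changed somewhere between the sender's buffer and the receiver's buffer; if every checksum matches while MPI still fails, the damage is more likely in MPI's own control data than in the application payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper: serializes an object into a byte[] suitable for sending
// as MPI.BYTE and appends a CRC32 so the receiver can detect corrupted payloads.
final class CheckedPayload {
    private CheckedPayload() {}

    // Layout of the returned array: [serialized body][8-byte CRC32 of body].
    static byte[] pack(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        byte[] body = bytes.toByteArray();
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return ByteBuffer.allocate(body.length + Long.BYTES)
                .put(body)
                .putLong(crc.getValue())
                .array();
    }

    // 'length' is the number of bytes actually received (e.g. taken from the
    // receive status), because the receive buffer may be larger than the message.
    static Object unpack(byte[] packet, int length)
            throws IOException, ClassNotFoundException {
        if (length < Long.BYTES) {
            throw new IOException("packet too short: " + length + " bytes");
        }
        int bodyLen = length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(packet, 0, bodyLen);
        long expected = ByteBuffer.wrap(packet, bodyLen, Long.BYTES).getLong();
        if (crc.getValue() != expected) {
            throw new IOException("checksum mismatch: payload corrupted in transit");
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(packet, 0, bodyLen))) {
            return in.readObject();
        }
    }
}
```

Checking the CRC before deserializing means a corrupted message is reported as a clear checksum error rather than as an obscure failure inside ObjectInputStream.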

Well, if this is the explanation, I still don't really know how to
solve my problem :(

Xavier





On Fri, Dec 5, 2008 at 02:16, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Hi Xavier,
>
> This assertion means a message is getting corrupted, which often means
> there's a problem with MPICH2.  Can you send us a small test program
> that demonstrates this?
>
> Thanks,
> -d
>
>
> On 12/03/2008 11:59 PM, Xavier Olive wrote:
>> Hello,
>>
>> I recompiled that version myself (instead of using the Ubuntu
>> package), and now I get a possibly more explicit error:
>>
>> Fatal error in MPI_Iprobe: Other MPI error, error stack:
>> MPI_Iprobe(121)...........................: MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD, flag=0x90986da4, status=0x906e2878) failed
>> MPIDI_CH3i_Progress_test(95)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
>> MPIDI_CH3I_Progress_handle_sock_event(420):
>> MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)[cli_0]: aborting job:
>> Fatal error in MPI_Iprobe: Other MPI error, error stack:
>> MPI_Iprobe(121)...........................: MPI_Iprobe(src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, MPI_COMM_WORLD, flag=0x90986da4, status=0x906e2878) failed
>> MPIDI_CH3i_Progress_test(95)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
>> MPIDI_CH3I_Progress_handle_sock_event(420):
>> MPIDU_Socki_handle_read(637)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)
>> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type < MPIDI_CH3_PKT_END_CH3
>> internal ABORT - process 2[cli_2]: aborting job:
>> internal ABORT - process 2
>> rank 2 in job 3  localhost.localdomain_44739   caused collective abort of all ranks
>>   exit status of rank 2: return code 1
>> zsh: exit 9     ~/mpich2-1.0.8/bin/mpiexec -np 9
>>
>>
>> The line "Assertion failed in file ch3_progress.c at line 431:
>> conn->pkt.type < MPIDI_CH3_PKT_END_CH3" appears at a random position
>> within that output...
>>
>> Since I use the Java wrapper and nothing is raised there, all of this
>> looks like Chinese to me...
>>
>> Thanks for your concern,
>> Xavier
>>
>>
>> On Thu, Dec 4, 2008 at 07:27, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>> Hard to say what the problem might be. Make sure you are using the latest
>>> version of MPICH2 (1.0.8).
>>>
>>> Rajeev
>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Xavier Olive
>>>> Sent: Wednesday, December 03, 2008 3:48 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [mpich-discuss] Assertion failed
>>>>
>>>> Hello mpich-discuss users,
>>>>
>>>>  I am using mpich2 with mpiJava for my application and am facing a
>>>> serious problem I don't know how to deal with. In every run,
>>>> sometimes at the very beginning, sometimes after several thousand
>>>> messages have been sent and received, I get the following error,
>>>> generated by the C code:
>>>>
>>>> Assertion failed in file ch3_progress.c at line 431: conn->pkt.type <
>>>> MPIDI_CH3_PKT_END_CH3
>>>>
>>>> I looked into the code, but I really don't see how I could be
>>>> violating that assertion. Does anyone have an idea about the cause
>>>> of this problem?
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> Xavier Olive
>>>>
>>>
>>
>>
>>
>



-- 
Xavier Olive
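The assertion quoted in this thread, conn->pkt.type < MPIDI_CH3_PKT_END_CH3, is presumably a sanity check on the type field of a packet header that has just been read from a socket; the END_CH3 name suggests a sentinel value that bounds the valid packet types. The Java sketch below mimics that kind of check with a hypothetical enum and a hypothetical header layout; it is not MPICH2's actual code. It illustrates why application code cannot violate such an assertion directly: the check fires when the header bytes are garbage, for example because the buffer was overwritten in memory or because the reader is no longer aligned with message boundaries in the byte stream, which is consistent with the explanation that a message is getting corrupted.

```java
import java.nio.ByteBuffer;

// Hypothetical packet types. END is a sentinel, not a real type: every valid
// type has an ordinal strictly below it (compare MPIDI_CH3_PKT_END_CH3).
enum PktType { DATA, REQUEST, REPLY, CLOSE, END }

final class HeaderCheck {
    private HeaderCheck() {}

    // Hypothetical header layout: a 4-byte big-endian type code at offset 0.
    // The code is validated against the sentinel before being used as an
    // index, which is the role the failing assertion plays.
    static PktType readType(ByteBuffer header) {
        int raw = header.getInt(0);
        if (raw < 0 || raw >= PktType.END.ordinal()) {
            throw new IllegalStateException(
                "invalid packet type " + raw + ": header corrupted or stream out of sync");
        }
        return PktType.values()[raw];
    }
}
```

A header whose first four bytes happen to be, say, part of a serialized Java object instead of a real type code would be rejected here, just as a stray or overwritten header would trip the assertion in ch3_progress.c.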


