[MPICH] mpich2 "Internal MPI error!"

sage weil sage at newdream.net
Wed Jun 15 14:54:29 CDT 2005


>> 1: aborting job:
>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>> 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
>> 1: (unknown)(): Internal MPI error!
>> 29: aborting job:
>> 29: Fatal error in MPI_Recv: Other MPI error, error stack:
>> 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
>
> The source rank in this MPI_Recv is -2, which suggests that memory is
> corrupted.  You may want to run your code under a memory checker and/or
> use a debugger to find out why the source rank is negative.

That -2 is just the wildcard source, not corruption:

#define MPI_ANY_SOURCE  (-2)

I'm putting together a smaller test program now.

I compiled with --enable-threads=multiple, and I'm doing some sends 
between threads on the same node.  Could that be part of the problem? 
I'm not sure how stable the thread support is...

sage


>
>> rank 29 in job 4  googoo-1_38138   caused collective abort of all ranks
>>    exit status of rank 29: return code 13
>> 29: MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
>> 29: MPIDI_CH3I_Progress_handle_sock_event(489):
>> 29: connection_recv_fail(1836):
>> rank 1 in job 4  googoo-1_38138   caused collective abort of all ranks
>>    exit status of rank 1: killed by signal 9
>> 29: MPIDU_Socki_handle_read(658): connection failure (set=0,sock=3,errno=104:Connection reset by peer)
>>
>> Is this what I should expect to see if there's some underlying
>> communications/network error?  Or is that 'Connection reset by peer'
>> on 29 probably caused by the Internal MPI error?
>>
>> sage
>>
>>
>>
>>
>> On Wed, 15 Jun 2005, Rajeev Thakur wrote:
>>
>>> Are you using the latest release, 1.0.2?
>>>
>>> Rajeev
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
>>>> Sent: Wednesday, June 15, 2005 11:53 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [MPICH] mpich2 "Internal MPI error!"
>>>>
>>>> Hi,
>>>>
>>>> I'm getting a fatal "Internal MPI error!" and am not really sure
>>>> where to start tracking it down.  I don't get any core files, and
>>>> the error message(s) aren't especially helpful:
>>>>
>>>> 1: aborting job:
>>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>>>> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4, MPI_CHAR, dest=16, tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
>>>> 1: (unknown)(): Internal MPI error!
>>>>
>>>> Are there any usual suspects I should check for, or is there some
>>>> sort of debug mode I can enable to get more information?
>>>>
>>>> The app isn't doing anything very tricky... it's almost entirely
>>>> Probes, Isends, Tests, and Recvs.  I'm not sure where to start...
>>>>
>>>> Any suggestions would be most appreciated!
>>>>
>>>> thanks-
>>>> sage
>>>>
>>>>
>>>
>>
>>
>
