[MPICH] mpich2 "Internal MPI error!"

sage weil sage at newdream.net
Wed Jun 15 15:30:32 CDT 2005


>> I compiled with --enable-threads=multiple, and I'm doing some sends
>> between threads on the same node.  Could that be part of the problem?
>> I'm not sure how stable the thread support is...
>
> That could well be the problem. The multithreaded part has not been tested
> extensively yet.

Hmm.  That brings up a different question, then.  All of my MPI traffic is 
already isolated to a single thread, with this one exception:  I need a 
way to make the MPI thread respond to incoming messages AND local events 
(outgoing messages).  Right now I do that by having the MPI thread do a 
Recv to wait for incoming messages.  Other local threads that need to wake 
it up send a dummy message to self using Isend.
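
For concreteness, this is roughly the shape of what I'm doing now (a
minimal sketch only -- it assumes MPI_THREAD_MULTIPLE, and the buffer
size, tag value, and function names are made up for illustration):

#include <mpi.h>

#define MSG_TAG  0        /* illustrative: all traffic on one tag */
#define MAX_MSG  1024     /* illustrative max message size */

/* Dedicated MPI thread: block until something happens.  A dummy
   message from our own rank just means "wake up and look at the
   local outgoing queue"; anything else is a real incoming message. */
void mpi_thread_wait(void)
{
    char buf[MAX_MSG];
    MPI_Status status;
    int my_rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Recv(buf, MAX_MSG, MPI_CHAR, MPI_ANY_SOURCE, MSG_TAG,
             MPI_COMM_WORLD, &status);

    if (status.MPI_SOURCE == my_rank) {
        /* woken by a local thread: drain the outgoing-message queue */
    } else {
        /* real message from another rank: dispatch it */
    }
}

/* Any other local thread: poke the MPI thread out of its Recv. */
void wake_mpi_thread(void)
{
    static const char dummy = 0;
    MPI_Request req;
    int my_rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Isend(&dummy, 1, MPI_CHAR, my_rank, MSG_TAG, MPI_COMM_WORLD, &req);
    MPI_Request_free(&req);   /* let MPI finish the send in the background */
}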

Is there a cleaner way to do this?  I need to be able to efficiently wait on 
incoming Recv's and other events (e.g. a queued outgoing message).

sage


>
> Rajeev
>
>
>> -----Original Message-----
>> From: sage weil [mailto:sage at newdream.net]
>> Sent: Wednesday, June 15, 2005 2:54 PM
>> To: Anthony Chan
>> Cc: Rajeev Thakur; mpich-discuss at mcs.anl.gov
>> Subject: RE: [MPICH] mpich2 "Internal MPI error!"
>>
>>>> 1: aborting job:
>>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>>>> 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR,
>>>> dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
>>>> 1: (unknown)(): Internal MPI error!
>>>> 29: aborting job:
>>>> 29: Fatal error in MPI_Recv: Other MPI error, error stack:
>>>> 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR,
>>>> src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
>>>
>>> source rank in this MPI_Recv is -2.  It seems the memory is
>>> corrupted.
>>> You may want to use a memory checker to check your code and/or use a
>>> debugger to find out why source rank is negative.
>>
>> That's
>>
>> #define MPI_ANY_SOURCE  (-2)
>>
>> I'm working on putting together a smaller test program now..
>>
>> I compiled with --enable-threads=multiple, and I'm doing some sends
>> between threads on the same node.  Could that be part of the problem?
>> I'm not sure how stable the thread support is...
>>
>> sage
>>
>>
>>>
>>>> rank 29 in job 4  googoo-1_38138   caused collective abort of all ranks
>>>>    exit status of rank 29: return code 13
>>>> 29: MPIDI_CH3_Progress_wait(209): an error occurred while handling
>>>> an event returned by MPIDU_Sock_Wait()
>>>> 29: MPIDI_CH3I_Progress_handle_sock_event(489):
>>>> 29: connection_recv_fail(1836):
>>>> rank 1 in job 4  googoo-1_38138   caused collective abort of all ranks
>>>>    exit status of rank 1: killed by signal 9
>>>> 29: MPIDU_Socki_handle_read(658): connection failure
>>>> (set=0,sock=3,errno=104:Connection reset by peer)
>>>>
>>>> Is this what I should expect to see if there's some underlying
>>>> communications/network error?  Or is that 'Connection reset by peer'
>>>> on 29 probably caused by the Internal MPI error?
>>>>
>>>> sage
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 15 Jun 2005, Rajeev Thakur wrote:
>>>>
>>>>> Are you using the latest release, 1.0.2?
>>>>>
>>>>> Rajeev
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
>>>>>> Sent: Wednesday, June 15, 2005 11:53 AM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: [MPICH] mpich2 "Internal MPI error!"
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm getting fatal "Internal MPI error!" and am not really
>>>>>> sure where to
>>>>>> start tracking it down.  I don't get any core files, and the error
>>>>>> message(s) aren't especially helpful:
>>>>>>
>>>>>> 1: aborting job:
>>>>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>>>>>> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4, MPI_CHAR,
>>>>>> dest=16, tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
>>>>>> 1: (unknown)(): Internal MPI error!
>>>>>>
>>>>>> Are there any usual suspects I should check for, or is there
>>>>>> some sort of
>>>>>> debug mode I can enable to get more information?
>>>>>>
>>>>>> The app isn't doing much of anything very tricky... it's doing
>>>>>> almost entirely Probe's, Isend's, Test's, and Recv's.  I'm not
>>>>>> sure where to
>>>>>> start...
>>>>>>
>>>>>> Any suggestions would be most appreciated!
>>>>>>
>>>>>> thanks-
>>>>>> sage
>>>>>>
>>>>>>



