[MPICH] mpich2 "Internal MPI error!"
sage weil
sage at newdream.net
Wed Jun 15 15:30:32 CDT 2005
>> I compiled with --enable-threads=multiple, and I'm doing some sends
>> between threads on the same node. Could that be part of the problem?
>> I'm not sure how stable the thread support is...
>
> That could well be the problem. The multithreaded part has not been tested
> extensively yet.
Hmm. That brings up a different question, then. All of my MPI traffic is
already isolated to a single thread, with this one exception: I need a
way to make the MPI thread respond to incoming messages AND local events
(outgoing messages). Right now I do that by having the MPI thread do a
Recv to wait for incoming messages. Other local threads that need to wake
it up send a dummy message to self using Isend.
Is there a cleaner way to do this? I need to be able to efficiently wait on
incoming Recvs and other events (e.g. a queued outgoing message).
sage
>
> Rajeev
>
>
>> -----Original Message-----
>> From: sage weil [mailto:sage at newdream.net]
>> Sent: Wednesday, June 15, 2005 2:54 PM
>> To: Anthony Chan
>> Cc: Rajeev Thakur; mpich-discuss at mcs.anl.gov
>> Subject: RE: [MPICH] mpich2 "Internal MPI error!"
>>
>>>> 1: aborting job:
>>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>>>> 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4,
>> MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
>>>> 1: (unknown)(): Internal MPI error!
>>>> 29: aborting job:
>>>> 29: Fatal error in MPI_Recv: Other MPI error, error stack:
>>>> 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24,
>> MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
>>>
>>> source rank in this MPI_Recv is -2. It seems the memory is
>> corrupted.
>>> You may want to use a memory checker to check your code and/or use a
>>> debugger to find out why source rank is negative.
>>
>> That's
>>
>> #define MPI_ANY_SOURCE (-2)
>>
>> I'm working on putting together a smaller test program now..
>>
>> I compiled with --enable-threads=multiple, and I'm doing some sends
>> between threads on the same node. Could that be part of the problem?
>> I'm not sure how stable the thread support is...
>>
>> sage
>>
>>
>>>
>>>> rank 29 in job 4 googoo-1_38138 caused collective abort
>> of all ranks
>>>> exit status of rank 29: return code 13
>>>> 29: MPIDI_CH3_Progress_wait(209): an error occurred while
>> handling an event returned by MPIDU_Sock_Wait()
>>>> 29: MPIDI_CH3I_Progress_handle_sock_event(489):
>>>> 29: connection_recv_fail(1836):
>>>> rank 1 in job 4 googoo-1_38138 caused collective abort
>> of all ranks
>>>> exit status of rank 1: killed by signal 9
>>>> 29: MPIDU_Socki_handle_read(658): connection failure
>>>> (set=0,sock=3,errno=104:Connection reset by peer)
>>>>
>>>> Is this what I should expect to see if there's some underlying
>>>> communications/network error? Or is that 'Connection
>> reset by peer'
>>>> on 29 probably caused by the Internal MPI error?
>>>>
>>>> sage
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 15 Jun 2005, Rajeev Thakur wrote:
>>>>
>>>>> Are you using the latest release, 1.0.2?
>>>>>
>>>>> Rajeev
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
>>>>>> Sent: Wednesday, June 15, 2005 11:53 AM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: [MPICH] mpich2 "Internal MPI error!"
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm getting fatal "Internal MPI error!" and am not really
>>>>>> sure where to
>>>>>> start tracking it down. I don't get any core files, and
>> the error
>>>>>> message(s) aren't especially helpful:
>>>>>>
>>>>>> 1: aborting job:
>>>>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>>>>>> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4,
>>>>>> MPI_CHAR, dest=16,
>>>>>> tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
>>>>>> 1: (unknown)(): Internal MPI error!
>>>>>>
>>>>>> Are there any usual suspects I should check for, or is there
>>>>>> some sort of
>>>>>> debug mode I can enable to get more information?
>>>>>>
>>>>>> The app isn't doing much of anything very tricky... it's
>> doing almost
>>>>>> entirely all Probe's, Isend's, Test's, and Recv's. I'm not
>>>>>> sure where to
>>>>>> start...
>>>>>>
>>>>>> Any suggestions would be most appreciated!
>>>>>>
>>>>>> thanks-
>>>>>> sage
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>