[MPICH] mpich2 "Internal MPI error!"
sage weil
sage at newdream.net
Wed Jun 15 14:20:04 CDT 2005
It seemed to last longer with 1.0.2, but it still eventually bailed with:

1: aborting job:
1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
1: (unknown)(): Internal MPI error!
29: aborting job:
29: Fatal error in MPI_Recv: Other MPI error, error stack:
29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
29: MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
29: MPIDI_CH3I_Progress_handle_sock_event(489):
29: connection_recv_fail(1836):
29: MPIDU_Socki_handle_read(658): connection failure (set=0,sock=3,errno=104:Connection reset by peer)
rank 29 in job 4 googoo-1_38138 caused collective abort of all ranks
exit status of rank 29: return code 13
rank 1 in job 4 googoo-1_38138 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

Is this what I should expect to see if there's some underlying
communications/network error? Or is the 'Connection reset by peer' on
rank 29 likely just fallout from the Internal MPI error on rank 1?
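
In case it's useful, one thing I can try on my end is switching
COMM_WORLD over to MPI_ERRORS_RETURN and printing the error string at
the first failing call, before everything cascades. A rough sketch
(the buffer sizes and the ring-style partner are made up, not what my
real app does):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char sendbuf[4] = "hey", recvbuf[4];
        char errstr[MPI_MAX_ERROR_STRING];
        int rc, len, rank, size;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* return error codes to the caller instead of aborting */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        rc = MPI_Isend(sendbuf, 4, MPI_CHAR, (rank + 1) % size, 0,
                       MPI_COMM_WORLD, &req);
        if (rc != MPI_SUCCESS) {
            MPI_Error_string(rc, errstr, &len);
            fprintf(stderr, "%d: MPI_Isend failed: %s\n", rank, errstr);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        rc = MPI_Recv(recvbuf, 4, MPI_CHAR, MPI_ANY_SOURCE, 0,
                      MPI_COMM_WORLD, &status);
        if (rc != MPI_SUCCESS) {
            MPI_Error_string(rc, errstr, &len);
            fprintf(stderr, "%d: MPI_Recv failed: %s\n", rank, errstr);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        MPI_Wait(&req, &status);
        MPI_Finalize();
        return 0;
    }

That wouldn't fix anything, but it should at least show which call
fails first and with what error string, instead of the job just
aborting.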
sage
On Wed, 15 Jun 2005, Rajeev Thakur wrote:
> Are you using the latest release, 1.0.2?
>
> Rajeev
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
>> Sent: Wednesday, June 15, 2005 11:53 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [MPICH] mpich2 "Internal MPI error!"
>>
>> Hi,
>>
>> I'm getting a fatal "Internal MPI error!" and am not really sure
>> where to start tracking it down. I don't get any core files, and
>> the error messages aren't especially helpful:
>>
>> 1: aborting job:
>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
>> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4, MPI_CHAR, dest=16, tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
>> 1: (unknown)(): Internal MPI error!
>>
>> Are there any usual suspects I should check for, or is there some
>> sort of debug mode I can enable to get more information?
>>
>> The app isn't doing anything very tricky... it's almost entirely
>> Probes, Isends, Tests, and Recvs. I'm not sure where to start...
>>
>> Any suggestions would be most appreciated!
>>
>> thanks-
>> sage
>>
>>
>
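
P.S. Since it may matter: the Probe/Isend/Test/Recv pattern I
mentioned is basically the shape below (heavily simplified; the real
destinations, tags, and message sizes vary):

    #include <mpi.h>

    /* every rank Isends one small message, then Probes/Recvs from
       whoever sent to it, polling its own send with Test */
    int main(int argc, char **argv)
    {
        char out[4] = "hey", in[64];
        int rank, size, flag = 0, count;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Isend(out, 4, MPI_CHAR, (rank + 1) % size, 0,
                  MPI_COMM_WORLD, &req);

        /* the source isn't known ahead of time, so probe first to
           learn the sender and size, then post a matching recv */
        MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_CHAR, &count);
        MPI_Recv(in, count, MPI_CHAR, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, &status);

        while (!flag)   /* poll until the send completes */
            MPI_Test(&req, &flag, &status);

        MPI_Finalize();
        return 0;
    }

(I believe the src=-2 in the rank 29 stack is just MPI_ANY_SOURCE,
which would be consistent with this probe-then-recv usage.)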