[MPICH] mpich2 "Internal MPI error!"

Anthony Chan chan at mcs.anl.gov
Wed Jun 15 14:48:38 CDT 2005



On Wed, 15 Jun 2005, sage weil wrote:

> It seemed to last longer with 1.0.2, but it still eventually bailed with
>
> 1: aborting job:
> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
> 1: (unknown)(): Internal MPI error!
> 29: aborting job:
> 29: Fatal error in MPI_Recv: Other MPI error, error stack:
> 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed

The source rank in this MPI_Recv is -2, which suggests the memory has been
corrupted.  You may want to run your code under a memory checker and/or
use a debugger to find out why the source rank is negative.

> rank 29 in job 4  googoo-1_38138   caused collective abort of all ranks
>    exit status of rank 29: return code 13
> 29: MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
> 29: MPIDI_CH3I_Progress_handle_sock_event(489):
> 29: connection_recv_fail(1836):
> rank 1 in job 4  googoo-1_38138   caused collective abort of all ranks
>    exit status of rank 1: killed by signal 9
> 29: MPIDU_Socki_handle_read(658): connection failure
> (set=0,sock=3,errno=104:Connection reset by peer)
>
> Is this what I should expect to see if there's some underlying
> communications/network error?  Or is that 'Connection reset by peer'
> on 29 probably caused by the Internal MPI error?
>
> sage
>
> On Wed, 15 Jun 2005, Rajeev Thakur wrote:
>
> > Are you using the latest release, 1.0.2?
> >
> > Rajeev
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
> >> Sent: Wednesday, June 15, 2005 11:53 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: [MPICH] mpich2 "Internal MPI error!"
> >>
> >> Hi,
> >>
> >> I'm getting a fatal "Internal MPI error!" and am not really sure
> >> where to start tracking it down.  I don't get any core files, and
> >> the error message(s) aren't especially helpful:
> >>
> >> 1: aborting job:
> >> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> >> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4,
> >> MPI_CHAR, dest=16,
> >> tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
> >> 1: (unknown)(): Internal MPI error!
> >>
> >> Are there any usual suspects I should check for, or is there
> >> some sort of
> >> debug mode I can enable to get more information?
> >>
> >> The app isn't doing anything very tricky... it's almost entirely
> >> Probe's, Isend's, Test's, and Recv's.  I'm not sure where to
> >> start...
> >>
> >> Any suggestions would be most appreciated!
> >>
> >> thanks-
> >> sage
> >>
> >>
> >
>
>



