[MPICH] mpich2 "Internal MPI error!"

Anthony Chan chan at mcs.anl.gov
Wed Jun 15 15:03:51 CDT 2005



On Wed, 15 Jun 2005, Anthony Chan wrote:

>
>
> On Wed, 15 Jun 2005, sage weil wrote:
>
> > It seemed to last longer with 1.0.2, but it still eventually bailed with
> >
> > 1: aborting job:
> > 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> > 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
> > 1: (unknown)(): Internal MPI error!
> > 29: aborting job:
> > 29: Fatal error in MPI_Recv: Other MPI error, error stack:
> > 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
>
> The source rank in this MPI_Recv is -2, which suggests memory corruption.
> You may want to run your code under a memory checker and/or use a
> debugger to find out why the source rank is negative.

Oops!  Did you use MPI_ANY_SOURCE in this MPI_Recv?  If so, src=-2 is
not the error: MPICH defines MPI_ANY_SOURCE as -2, so the wildcard is
simply what got printed here.
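To make the correction concrete, here is a minimal sketch (assuming MPICH, where mpi.h defines MPI_ANY_SOURCE as -2) of a receive posted with the source wildcard; the buffer names are invented for illustration, and it sends to self so it runs even on one process:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* In MPICH's mpi.h, MPI_ANY_SOURCE is the constant -2, so an error
     * stack that prints "src=-2" is just echoing the wildcard back,
     * not evidence of a corrupted rank.  (The exact value is
     * implementation-specific.) */
    printf("MPI_ANY_SOURCE = %d\n", MPI_ANY_SOURCE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Send to self so the example works on a single process. */
        char sendbuf[24] = "wildcard receive demo";
        char recvbuf[24];
        MPI_Request req;
        MPI_Status status;

        MPI_Isend(sendbuf, 24, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Recv(recvbuf, 24, MPI_CHAR, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, &status);
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        /* status.MPI_SOURCE holds the actual sender (0 here). */
        printf("received \"%s\" from rank %d\n", recvbuf, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run under mpiexec; after the receive completes, status.MPI_SOURCE reports the real sending rank even though the wildcard was posted.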

>
> > rank 29 in job 4  googoo-1_38138   caused collective abort of all ranks
> >    exit status of rank 29: return code 13
> > 29: MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
> > 29: MPIDI_CH3I_Progress_handle_sock_event(489):
> > 29: connection_recv_fail(1836):
> > rank 1 in job 4  googoo-1_38138   caused collective abort of all ranks
> >    exit status of rank 1: killed by signal 9
> > 29: MPIDU_Socki_handle_read(658): connection failure (set=0,sock=3,errno=104:Connection reset by peer)
> >
> > Is this what I should expect to see if there's some underlying
> > communications/network error?  Or is the 'Connection reset by peer'
> > on rank 29 probably caused by the Internal MPI error?
> >
> > sage
> >
> >
> >
> >
> > On Wed, 15 Jun 2005, Rajeev Thakur wrote:
> >
> > > Are you using the latest release, 1.0.2?
> > >
> > > Rajeev
> > >
> > >> -----Original Message-----
> > >> From: owner-mpich-discuss at mcs.anl.gov
> > >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
> > >> Sent: Wednesday, June 15, 2005 11:53 AM
> > >> To: mpich-discuss at mcs.anl.gov
> > >> Subject: [MPICH] mpich2 "Internal MPI error!"
> > >>
> > >> Hi,
> > >>
> > >> I'm getting a fatal "Internal MPI error!" and am not really sure
> > >> where to start tracking it down.  I don't get any core files, and
> > >> the error messages aren't especially helpful:
> > >>
> > >> 1: aborting job:
> > >> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> > >> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4, MPI_CHAR, dest=16, tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
> > >> 1: (unknown)(): Internal MPI error!
> > >>
> > >> Are there any usual suspects I should check for, or is there some
> > >> sort of debug mode I can enable to get more information?
> > >>
> > >> The app isn't doing anything very tricky... it's almost entirely
> > >> Probes, Isends, Tests, and Recvs.  I'm not sure where to start...
> > >>
> > >> Any suggestions would be most appreciated!
> > >>
> > >> thanks-
> > >> sage
> > >>
> > >>
> > >
> >
> >
>
>
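The pattern sage describes (Isend, then poll with Probe and Test before Recv) can be sketched roughly as below. This is a hypothetical illustration, not sage's actual code; the buffer and tag choices are invented. One classic bug in this pattern, which can surface as an internal error, is reusing or freeing the send buffer (or discarding the MPI_Request) before MPI_Test or MPI_Wait reports the Isend complete:

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical sketch of a Probe/Isend/Test/Recv polling loop.  Each
 * rank sends one message around a ring and receives one message.  The
 * send buffer and request must stay valid until MPI_Test reports the
 * Isend complete. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char sendbuf[4] = "hey";            /* must outlive the Isend */
    MPI_Request req = MPI_REQUEST_NULL;
    int dest = (rank + 1) % size;

    MPI_Isend(sendbuf, 4, MPI_CHAR, dest, 0, MPI_COMM_WORLD, &req);

    int done = 0, got = 0;
    while (!done || !got) {
        int flag;
        MPI_Status st;

        /* Poll for an incoming message from anyone. */
        MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, &st);
        if (flag && !got) {
            char recvbuf[4];
            MPI_Recv(recvbuf, 4, MPI_CHAR, st.MPI_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank %d got \"%s\" from rank %d\n",
                   rank, recvbuf, st.MPI_SOURCE);
            got = 1;
        }

        /* Only after MPI_Test sets done may sendbuf be reused or
         * freed; doing so earlier is erroneous under the MPI standard. */
        if (!done)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

Run with mpiexec across several processes; it also works as a single process, where the rank sends to itself.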



