[MPICH] mpich2 "Internal MPI error!"

Rajeev Thakur thakur at mcs.anl.gov
Wed Jun 15 15:23:59 CDT 2005


> I compiled with --enable-threads=multiple, and I'm doing some sends 
> between threads on the same node.  Could that be part of the problem? 
> I'm not sure how stable the thread support is...

That could well be the problem. The multithreaded part has not been tested
extensively yet.
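
One thing worth verifying: that you initialize with MPI_Init_thread
(not plain MPI_Init) and that the library actually grants
MPI_THREAD_MULTIPLE.  Something along these lines:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* request full thread support and check what was granted */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
            printf("warning: thread level %d < MPI_THREAD_MULTIPLE\n",
                   provided);

        MPI_Finalize();
        return 0;
    }

If "provided" comes back lower than MPI_THREAD_MULTIPLE, MPI calls
from multiple threads are not safe no matter how the library was
configured.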

Rajeev
 

> -----Original Message-----
> From: sage weil [mailto:sage at newdream.net] 
> Sent: Wednesday, June 15, 2005 2:54 PM
> To: Anthony Chan
> Cc: Rajeev Thakur; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] mpich2 "Internal MPI error!"
> 
> >> 1: aborting job:
> >> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> >> 1: MPI_Isend(142): MPI_Isend(buf=0x2bf3ffe0, count=4, MPI_CHAR, dest=12, tag=0, MPI_COMM_WORLD, request=0x2bf405a0) failed
> >> 1: (unknown)(): Internal MPI error!
> >> 29: aborting job:
> >> 29: Fatal error in MPI_Recv: Other MPI error, error stack:
> >> 29: MPI_Recv(179): MPI_Recv(buf=0xb7fea3c0, count=24, MPI_CHAR, src=-2, tag=0, MPI_COMM_WORLD, status=0xb7fea3e0) failed
> >
> > The source rank in this MPI_Recv is -2.  It seems the memory is
> > corrupted.  You may want to use a memory checker to check your code
> > and/or use a debugger to find out why the source rank is negative.
> 
> That's
> 
> #define MPI_ANY_SOURCE  (-2)
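> 
> i.e. the receive is just a wildcard receive, which is perfectly
> legal.  Something like this (hypothetical buffer, purely to
> illustrate the call from the error stack):
> 
>     MPI_Status status;
>     char buf[24];
> 
>     /* src=-2 in the error output is MPI_ANY_SOURCE, not corruption */
>     MPI_Recv(buf, 24, MPI_CHAR, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
>              &status);
>     /* status.MPI_SOURCE then holds the actual sender's rank */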
> 
> I'm working on putting together a smaller test program now...
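> 
> Roughly along these lines, I think (just a sketch, not the actual
> code -- two pthreads per rank each sending to their own rank, with
> the main thread receiving):
> 
>     #include <mpi.h>
>     #include <pthread.h>
> 
>     static int rank;
> 
>     static void *sender(void *arg)
>     {
>         char buf[4] = "abc";
>         MPI_Request req;
> 
>         /* concurrent Isends from two threads of the same process */
>         MPI_Isend(buf, 4, MPI_CHAR, rank, 0, MPI_COMM_WORLD, &req);
>         MPI_Wait(&req, MPI_STATUS_IGNORE);
>         return NULL;
>     }
> 
>     int main(int argc, char **argv)
>     {
>         int provided, i;
>         char buf[4];
>         pthread_t t[2];
> 
>         MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>         for (i = 0; i < 2; i++)
>             pthread_create(&t[i], NULL, sender, NULL);
> 
>         /* the main thread drains both messages */
>         for (i = 0; i < 2; i++)
>             MPI_Recv(buf, 4, MPI_CHAR, MPI_ANY_SOURCE, 0,
>                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> 
>         for (i = 0; i < 2; i++)
>             pthread_join(t[i], NULL);
> 
>         MPI_Finalize();
>         return 0;
>     }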
> 
> I compiled with --enable-threads=multiple, and I'm doing some sends 
> between threads on the same node.  Could that be part of the problem? 
> I'm not sure how stable the thread support is...
> 
> sage
> 
> 
> >
> >> rank 29 in job 4  googoo-1_38138   caused collective abort of all ranks
> >>    exit status of rank 29: return code 13
> >> 29: MPIDI_CH3_Progress_wait(209): an error occurred while handling an event returned by MPIDU_Sock_Wait()
> >> 29: MPIDI_CH3I_Progress_handle_sock_event(489):
> >> 29: connection_recv_fail(1836):
> >> rank 1 in job 4  googoo-1_38138   caused collective abort of all ranks
> >>    exit status of rank 1: killed by signal 9
> >> 29: MPIDU_Socki_handle_read(658): connection failure (set=0,sock=3,errno=104:Connection reset by peer)
> >>
> >> Is this what I should expect to see if there's some underlying
> >> communications/network error?  Or is that 'Connection reset by
> >> peer' on 29 probably caused by the Internal MPI error?
> >>
> >> sage
> >>
> >>
> >>
> >>
> >> On Wed, 15 Jun 2005, Rajeev Thakur wrote:
> >>
> >>> Are you using the latest release, 1.0.2?
> >>>
> >>> Rajeev
> >>>
> >>>> -----Original Message-----
> >>>> From: owner-mpich-discuss at mcs.anl.gov
> >>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of sage weil
> >>>> Sent: Wednesday, June 15, 2005 11:53 AM
> >>>> To: mpich-discuss at mcs.anl.gov
> >>>> Subject: [MPICH] mpich2 "Internal MPI error!"
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm getting a fatal "Internal MPI error!" and am not really
> >>>> sure where to start tracking it down.  I don't get any core
> >>>> files, and the error message(s) aren't especially helpful:
> >>>>
> >>>> 1: aborting job:
> >>>> 1: Fatal error in MPI_Isend: Internal MPI error!, error stack:
> >>>> 1: MPI_Isend(142): MPI_Isend(buf=0x19c83e98, count=4, MPI_CHAR, dest=16, tag=0, MPI_COMM_WORLD, request=0xa4088c0) failed
> >>>> 1: (unknown)(): Internal MPI error!
> >>>>
> >>>> Are there any usual suspects I should check for, or is there
> >>>> some sort of
> >>>> debug mode I can enable to get more information?
> >>>>
> >>>> The app isn't doing anything very tricky... it's almost
> >>>> entirely Probes, Isends, Tests, and Recvs.  I'm not sure where
> >>>> to start...
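> >>>>
> >>>> Per message it's roughly this (schematic, not the real code;
> >>>> "dest" is whatever rank the message goes to):
> >>>>
> >>>>     MPI_Status st;
> >>>>     MPI_Request req;
> >>>>     int dest = 0, flag, len;
> >>>>     char buf[1024];   /* assuming messages fit */
> >>>>
> >>>>     /* queue an outgoing message without blocking... */
> >>>>     MPI_Isend(buf, 4, MPI_CHAR, dest, 0, MPI_COMM_WORLD, &req);
> >>>>     /* ...and poll it for completion later */
> >>>>     MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
> >>>>
> >>>>     /* wait for an incoming message, size it, then receive it */
> >>>>     MPI_Probe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
> >>>>     MPI_Get_count(&st, MPI_CHAR, &len);
> >>>>     MPI_Recv(buf, len, MPI_CHAR, st.MPI_SOURCE, st.MPI_TAG,
> >>>>              MPI_COMM_WORLD, &st);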
> >>>>
> >>>> Any suggestions would be most appreciated!
> >>>>
> >>>> thanks-
> >>>> sage
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> 
> 



