[MPICH] Out of memory problem

Dmitri Chubarov dmitri.chubarov at gmail.com
Thu Oct 18 10:11:48 CDT 2007


Hello, Rajeev,

I have managed to reduce the problem to a program of about 40 lines.
Unfortunately, MPI_Barrier did not help "out of the box".

I suspect that the problem might be due to a memory leak in MPICH2.
When running the program, I observe that the resident set size grows
very quickly, which should not normally happen.

I would be very grateful if someone could run the following code with
MPICH2 1.0.5, to see whether the problem is indeed reproducible, and
with 1.0.6, to see whether it is fixed in the latest version. It takes
about 10 minutes with 3 processes on a 2.4 GHz dual-core Opteron system.

Thank you,
  Dima

--- code sample starts ---
! "outofmemory.f90"
program outofmemory
implicit none
include "mpif.h"

integer :: NMAX = 200001
integer :: NSTEP = 1

! buffers are large enough for every transfer below
real*8 psi0(1000000),psi(200000)
real*8 dens(100000),dens0(200000)

integer myrank,mysize
integer M
integer ierr
integer i

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD,mysize,ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD,myrank,ierr)

  do i = 0,NMAX,NSTEP
! compute a varying M (i/100 is integer division, so M changes
! every 100 iterations; M never exceeds 100000/mysize)
     M = int(abs(sin(i/100 + 1.0))*100000.0/mysize)

! Bcast M from rank 0
     call MPI_Bcast(M,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

     if ((myrank .eq. 0) .AND. (mod(i,100000) .eq. 0)) then
        write (*,*) myrank, M
     endif

! Do a scatter: each rank receives 2*M doubles from rank 0
     call MPI_Scatter(psi0,M*2,MPI_REAL8,psi,M*2,MPI_REAL8,0,MPI_COMM_WORLD,ierr)

! Do a gather: rank 0 collects M doubles from each rank
     call MPI_Gather(dens,M,MPI_REAL8,dens0,M,MPI_REAL8,0,MPI_COMM_WORLD,ierr)

! Have a barrier every 100 iterations
     if (mod(i,100) .eq. 0) then
        call MPI_Barrier(MPI_COMM_WORLD,ierr)
     endif

  end do
  call MPI_Finalize(ierr)

end program outofmemory

--- code sample ends ---
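For what it's worth, a quick size check suggests the sample itself never overruns a buffer, so the growth should not be coming from the test code. This is a sketch only. It writes p for mysize, counts everything in MPI_REAL8 elements, and uses |sin| <= 1. The truncation of the real expression to the integer M is written as a floor, which is valid because the value is non-negative:

```latex
\begin{aligned}
M &= \left\lfloor \frac{100000\,\lvert\sin(\lfloor i/100\rfloor + 1)\rvert}{p} \right\rfloor \le \frac{100000}{p} \\
\text{MPI\_Scatter, root send (psi0):}\quad 2Mp &\le 200000 \le 1000000 \\
\text{MPI\_Scatter, each receive (psi):}\quad 2M &\le 200000/p \le 200000 \\
\text{MPI\_Gather, each send (dens):}\quad M &\le 100000/p \le 100000 \\
\text{MPI\_Gather, root receive (dens0):}\quad Mp &\le 100000 \le 200000
\end{aligned}
```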

> >
> > Here is the problem.
> > We use MPICH2 version 1.0.5 with the Sun Studio compilers on AMD Opterons.
> >
> > There is a code that fails with the following message:
> >
> > Fatal error in MPI_Scatter: Other MPI error, error stack:
> > MPI_Scatter(760)..........: MPI_Scatter(sbuf=0xef0860, scount=2211,
> > MPI_DOUBLE_COMPLEX, rbuf=0x4828fb0, rcount=2211, MPI_DOUBLE_COMPLEX,
> > root=0, MPI_COMM_WORLD) failed
> > MPIR_Scatter(253).........:
> > MPIC_Send(36).............:
> > MPIDI_EagerContigSend(146): failure occurred while attempting to send
> > an eager message
> > MPIDI_CH3_iStartMsgv(132).: Out of memory
> >
> > I wonder what might have caused "Out of memory" here.
> >




More information about the mpich-discuss mailing list