[MPICH] Out of memory problem

Dmitri Chubarov dmitri.chubarov at gmail.com
Wed Oct 17 08:09:06 CDT 2007


Dear MPICH developers,

We are running a small development cluster here where new numerical
codes are being developed, tested, and used for research in physics and
engineering. I usually try to resolve the issues that arise myself,
working together with the user concerned. However, the number of users
is growing and the problems tend to get more involved. Currently there
is an issue that I find difficult to fix. I would appreciate any advice
on how to proceed.

Here is the problem.
We use MPICH2 version 1.0.5 with the Sun Studio compilers on AMD Opterons.

There is a code that fails with the following message:

Fatal error in MPI_Scatter: Other MPI error, error stack:
MPI_Scatter(760)..........: MPI_Scatter(sbuf=0xef0860, scount=2211,
MPI_DOUBLE_COMPLEX, rbuf=0x4828fb0, rcount=2211, MPI_DOUBLE_COMPLEX,
root=0, MPI_COMM_WORLD) failed
MPIR_Scatter(253).........:
MPIC_Send(36).............:
MPIDI_EagerContigSend(146): failure occurred while attempting to send
an eager message
MPIDI_CH3_iStartMsgv(132).: Out of memory

The code runs 9000000 iterations. The communication pattern is such
that each iteration performs
an MPI_BCAST from process 0,
an MPI_SCATTER from process 0 and
an MPI_GATHER to process 0.

We run the code with 14 ranks, and it is rank 0 that fails.
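
For a sense of scale, the message sizes involved can be estimated from
the numbers in the error stack. This is only a rough sketch: it assumes
that MPI_DOUBLE_COMPLEX occupies 16 bytes (two 8-byte reals), which I
have not verified for our compiler, and the function names are made up
for illustration.

```python
# Rough payload arithmetic for the failing MPI_Scatter call.
# Counts are taken from the error stack; the element size is an assumption.
ELEMENTS_PER_RANK = 2211   # scount / rcount reported in the error stack
BYTES_PER_ELEMENT = 16     # assumption: MPI_DOUBLE_COMPLEX = two 8-byte reals
RANKS = 14                 # number of ranks in the run


def scatter_chunk_bytes(count=ELEMENTS_PER_RANK, elem_size=BYTES_PER_ELEMENT):
    """Bytes delivered to one rank by a single MPI_Scatter."""
    return count * elem_size


def root_send_buffer_bytes(ranks=RANKS):
    """Bytes in the send buffer on rank 0 (one chunk per rank)."""
    return ranks * scatter_chunk_bytes()


if __name__ == "__main__":
    print(scatter_chunk_bytes())      # 35376
    print(root_send_buffer_bytes())   # 495264
```

Under these assumptions each rank receives about 35 KB per scatter and
the send buffer on rank 0 is just under 500 KB, so no single message is
large by itself.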

I wonder what might have caused "Out of memory" here.

Best regards,
  Dima

--
Dmitri Chubarov
Siberian Branch of the Russian Academy of Sciences
Institute of Computational Technologies
Lavrentjev str. 6
Novosibirsk, 630090
tel. +7 913 746 9733
