[MPICH] Out of memory problem
Rajeev Thakur
thakur at mcs.anl.gov
Wed Oct 17 08:46:03 CDT 2007
This may be a flow-control problem: rank 0 may be queuing eager messages faster than the other ranks receive them. Try adding an MPI_Barrier every 100 iterations or so.
Rajeev
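
To illustrate why a periodic barrier could help, here is a toy model in Python. It is only a sketch and is neither MPICH code nor part of the program in question. It assumes that on every iteration the root posts a fixed number of eager messages and the receivers consume slightly fewer, and that a barrier forces the root to wait until the outstanding messages have been consumed. The function name peak_backlog and its parameters (sends_per_iter, drained_per_iter, barrier_every) are hypothetical, and the rates used below are made up for illustration.

```python
def peak_backlog(iterations, sends_per_iter, drained_per_iter, barrier_every=None):
    """Toy model: return the largest number of messages queued at the root.

    Each iteration the root posts `sends_per_iter` eager messages while the
    receivers consume only `drained_per_iter` of them. If `barrier_every` is
    set, the root is assumed to block at the barrier until the receivers
    have caught up, so the backlog drops to zero at that point.
    """
    backlog = 0
    peak = 0
    for i in range(1, iterations + 1):
        backlog += sends_per_iter              # root posts its sends
        peak = max(peak, backlog)
        backlog = max(0, backlog - drained_per_iter)  # receivers consume some
        if barrier_every and i % barrier_every == 0:
            backlog = 0                        # assumed effect of the barrier
    return peak


if __name__ == "__main__":
    # 13 receivers (14 ranks minus the root); the drain rate of 12 is an
    # arbitrary assumption chosen so that the receivers fall slightly behind.
    print(peak_backlog(1000, 13, 12))                     # 1012, grows with iterations
    print(peak_backlog(1000, 13, 12, barrier_every=100))  # 112, bounded by the interval
```

In this model the backlog without a barrier grows linearly with the iteration count, so with 9000000 iterations even a small imbalance adds up. With a barrier every 100 iterations the backlog is limited by the interval and no longer depends on the total iteration count.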
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Dmitri Chubarov
> Sent: Wednesday, October 17, 2007 8:09 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Out of memory problem
>
> Dear MPICH developers,
>
> We are running a small development cluster here where new numerical
> codes are being developed, tested, and used for research in physics and
> engineering. I usually try to resolve the issues that arise by working
> through them together with the user concerned. However, the number of
> users is growing and the problems tend to get more involved. Currently
> there is an issue that I find difficult to fix. I would appreciate any
> advice on how to proceed.
>
> Here is the problem.
> We use MPICH2 version 1.0.5 with Sun Studio compilers on AMD Opterons.
>
> There is a code that fails with the following message:
>
> Fatal error in MPI_Scatter: Other MPI error, error stack:
> MPI_Scatter(760)..........: MPI_Scatter(sbuf=0xef0860, scount=2211,
> MPI_DOUBLE_COMPLEX, rbuf=0x4828fb0, rcount=2211, MPI_DOUBLE_COMPLEX,
> root=0, MPI_COMM_WORLD) failed
> MPIR_Scatter(253).........:
> MPIC_Send(36).............:
> MPIDI_EagerContigSend(146): failure occurred while attempting to send
> an eager message
> MPIDI_CH3_iStartMsgv(132).: Out of memory
>
> The code does 9000000 iterations. The communication structure is such
> that on each iteration there is
> an MPI_BCAST from process 0,
> an MPI_SCATTER from process 0 and
> an MPI_GATHER to process 0.
>
> We run the code with 14 ranks and rank 0 fails.
>
> I wonder what might have caused "Out of memory" here.
>
> Best regards,
> Dima
>
> --
> Dmitri Chubarov
> Siberian Branch of the Russian Academy of Sciences
> Institute of Computational Technologies
> Lavrentjev str. 6
> Novosibirsk, 630090
> tel. +7 913 746 9733