[mpich-discuss] Socket closed
Tim Kroeger
tim.kroeger at cevis.uni-bremen.de
Thu Nov 5 01:45:36 CST 2009
Dear Dave,
On Wed, 4 Nov 2009, Dave Goodell wrote:
> If you do suspect that you are out of memory and you are running on Linux,
> check your system log for messages that say "invoked oom-killer". If you
> find those messages then you are definitely running out of memory.
Thank you for that hint. I found some of these messages (specifically,
they say things like "oom-killer: gfp_mask=0x201d2, order=0"), but how
can I be sure that they were caused by my application? I am running on
a large cluster, and it would not surprise me if somebody else's
applications also ran out of memory from time to time. The system log
does not seem to give any information about the time at which the
oom-killer was invoked.
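One way to attribute an oom-killer event to a particular program is to look
for the kernel's follow-up line naming the process it killed. The sketch
below assumes that line contains "Killed process <pid> (<command>)"; the
exact wording differs between kernel versions, and the function names and
the sample command name "my_solver" are hypothetical. The kernel records at
most 15 characters of the command name, so a long executable name may
appear truncated.

```python
import re

# Assumed message shape: "... Killed process <pid> (<command>) ...".
# The wording varies by kernel version; adjust the pattern to match your log.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(log_lines):
    """Return a list of (pid, command) pairs for each OOM kill in log_lines."""
    victims = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


def was_killed(log_lines, command):
    """Return True if any OOM kill in log_lines names the given command."""
    return any(name == command for _, name in oom_victims(log_lines))
```

Feeding it the lines of the system log from the node where the job ran, e.g.
was_killed(lines, "my_solver"), would show whether that program was among
the victims. It cannot tell which of several runs of the same program was
affected unless the process ID is also known.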
Anyway, after getting the thread balance right, my application no
longer crashes after the 4th time step. Rather, it crashes after the
7th time step. On the one hand, this points to a memory shortage, since
otherwise optimizing the balance would not have had any effect. On
the other hand, 7 steps should not consume more memory than 4 steps,
since there is no adaptive grid refinement (yet). I guess there is
a memory leak somewhere in my application.
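A suspected leak of this kind can be checked by recording the resident
memory of the process after every time step and looking at the differences.
The following is a minimal sketch, assuming Linux, where /proc/self/status
contains a "VmRSS:" line with a value in kB; the function names are
hypothetical, and the same idea carries over to a C or C++ application by
reading the same file.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None if no VmRSS line is present.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    1024 kB"; take the number.
            return int(line.split()[1])
    return None


def current_rss_kb():
    """Return the resident set size of the calling process in kB (Linux only)."""
    with open("/proc/self/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def growth_per_step(samples_kb):
    """Return the change in memory between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples_kb, samples_kb[1:])]
```

Calling current_rss_kb() once per time step on each rank and passing the
collected values to growth_per_step would show whether memory keeps rising
by a similar amount every step, which is what a leak would look like when
the grid does not change.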
Thank you very much again. I think I now know how to proceed.
Best Regards,
Tim
--
Dr. Tim Kroeger
tim.kroeger at mevis.fraunhofer.de Phone +49-421-218-7710
tim.kroeger at cevis.uni-bremen.de Fax +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany