[mpich-discuss] Socket closed

Tim Kroeger tim.kroeger at cevis.uni-bremen.de
Thu Nov 5 01:45:36 CST 2009


Dear Dave,

On Wed, 4 Nov 2009, Dave Goodell wrote:

> If you do suspect that you are out of memory and you are running on Linux, 
> check your system log for messages that say "invoked oom-killer".  If you 
> find those messages then you are definitely running out of memory.

Thank you for that hint.  I found some of these messages (specifically,
they say things like "oom-killer: gfp_mask=0x201d2, order=0"), but how
can I be sure that they were caused by my application?  I am running
on a large cluster, and it would not surprise me if other users'
applications also ran out of memory from time to time.  The system
log does not seem to give any information about when the oom-killer
was invoked.
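One way to narrow this down is to look at which process the OOM killer actually chose, since the kernel usually logs a separate line naming the victim's pid and process name. The sketch below assumes such lines contain "kill process <pid> (<name>)" or "Killed process <pid> (<name>)"; the exact wording differs between kernel versions, so treat the pattern as an assumption to adjust against your own log. The function names oom_victims and victims_named are hypothetical.

```python
import re

# Assumption: each OOM victim is logged with a line containing
# "kill process <pid> (<name>)" or "Killed process <pid> (<name>)".
# The wording varies between kernel versions; adjust the pattern as needed.
KILL_RE = re.compile(r"kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE)


def oom_victims(lines):
    """Return (pid, name) pairs for processes chosen by the OOM killer.

    A single kill is often logged on two consecutive lines ("kill process"
    followed by "Killed process"), so an immediately repeated pid is
    reported only once.
    """
    victims = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            pid, name = int(match.group(1)), match.group(2)
            if not victims or victims[-1][0] != pid:
                victims.append((pid, name))
    return victims


def victims_named(lines, app_name):
    """Keep only victims whose logged name matches app_name."""
    # The kernel stores at most 15 characters of a process name,
    # so compare against the truncated form.
    short = app_name[:15]
    return [(pid, name) for pid, name in oom_victims(lines) if name == short]
```

For example, passing the lines of the system log file (its location depends on the distribution) together with your executable's name would list only the kills that hit your own processes. On a shared cluster log the hostname in each line also tells you which node was affected.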

Anyway, after getting the thread balance right, my application no
longer crashes after the 4th time step.  Instead, it crashes after the
7th time step.  On the one hand, this points to a memory shortage,
since otherwise optimizing the balance would not have had any effect.
On the other hand, 7 steps should not consume more memory than 4
steps, since there is no adaptive grid refinement (yet).  I guess
there is a memory leak somewhere in my application.
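Since the grid is fixed, resident memory should stay roughly flat after the first time step, so recording it once per step is a cheap way to test the leak hypothesis. The sketch below is Linux-only and assumes the VmRSS field of /proc/self/status (reported in kB) is a good enough proxy. The helper names and the 1024 kB tolerance are hypothetical choices, and "steady growth" is only a heuristic, not proof of a leak.

```python
def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     1234 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def current_rss_kb():
    """Linux only: resident memory of the calling process in kB."""
    with open("/proc/self/status") as f:
        return parse_vmrss_kb(f.read())


def looks_like_leak(rss_per_step_kb, tolerance_kb=1024):
    """Heuristic: True if RSS never shrinks after the first step and grows
    by more than tolerance_kb overall.

    The first sample is skipped because setting up the grid and solver
    legitimately allocates memory during the first step.
    """
    if len(rss_per_step_kb) < 3:
        return False
    steady = rss_per_step_kb[1:]
    never_shrinks = all(b >= a for a, b in zip(steady, steady[1:]))
    return never_shrinks and steady[-1] - steady[0] > tolerance_kb
```

In an MPI run, each rank could call current_rss_kb() at the end of every time step and print or gather the values. A rank whose numbers keep climbing is a good place to start looking for the leak.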

Thank you very much again; I think I now know how to proceed.

Best Regards,

Tim

-- 
Dr. Tim Kroeger
tim.kroeger at mevis.fraunhofer.de            Phone +49-421-218-7710
tim.kroeger at cevis.uni-bremen.de            Fax   +49-421-218-4236

Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
