[mpich-discuss] Socket closed
Dave Goodell
goodell at mcs.anl.gov
Wed Nov 4 11:32:06 CST 2009
If you suspect that you are running out of memory and you are on
Linux, check your system log for messages that say "invoked
oom-killer". If you find those messages, then you are definitely
running out of memory.
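
As a rough sketch of that check: the helper name check_oom and the log
file paths mentioned in the comments are assumptions, since the location
of the system log varies by distribution. It prints any matching lines
and returns success only if at least one was found.

```shell
#!/bin/sh
# check_oom: hypothetical helper, not part of MPICH or mpd.
# Searches each readable log file given as an argument for the kernel's
# "invoked oom-killer" message. Prints matching lines; returns 0 if any
# match was found and 1 otherwise.
check_oom() {
    found=1
    for log in "$@"; do
        # Skip logs that do not exist or are not readable by this user.
        [ -r "$log" ] || continue
        if grep -i "invoked oom-killer" "$log"; then
            found=0
        fi
    done
    return "$found"
}

# Example call (paths are assumptions; adjust for your system):
#   check_oom /var/log/messages /var/log/syslog /var/log/kern.log
```

On many systems the kernel ring buffer can be searched the same way
with: dmesg | grep -i "invoked oom-killer" (this may require root).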
-Dave
On Nov 4, 2009, at 10:42 AM, Tim Kroeger wrote:
> On Wed, 4 Nov 2009, Dave Goodell wrote:
>
>> This is the classic user interface problem in mpdboot. The short
>> answer is to change your mpdboot command to: "mpdboot --totalnum=4
>> --ncpus=6 -f nodefile".
>>
>> The slightly longer answer is that mpdboot doesn't respect the
>> number of CPUs set in the hostfile for the node on which mpdboot is
>> run; that node requires an explicit --ncpus=X option.
>
> Ah, thank you. That works now.
>
> I'll look into Darius' suggestion tomorrow. Anyway, it's quite
> likely that the missing "--ncpus" was actually the decisive
> mistake, because my application is potentially short on memory, and
> thus improper load balancing might cause one of the processes
> to crash.
>
> I'll let you guys know whether this was the problem.
>
> Thank you very much for now.
>
> Best Regards,
>
> Tim
>
> --
> Dr. Tim Kroeger
> tim.kroeger at mevis.fraunhofer.de Phone +49-421-218-7710
> tim.kroeger at cevis.uni-bremen.de Fax +49-421-218-4236
>
> Fraunhofer MEVIS, Institute for Medical Image Computing
> Universitaetsallee 29, 28359 Bremen, Germany
>