[mpich-discuss] Socket closed

Dave Goodell goodell at mcs.anl.gov
Wed Nov 4 11:32:06 CST 2009


If you do suspect that you are out of memory and you are running on
Linux, check your system log for messages that say
"invoked oom-killer".  If you find those messages then you are
definitely running out of memory.
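
As a rough illustration of that check, here is a minimal Python sketch
that scans log lines for the "invoked oom-killer" string.  The names
find_oom_lines and scan_log are made up for this example, and the log
path you pass in is an assumption: the location of the system log
varies by distribution (/var/log/messages and /var/log/syslog are
common), and reading it may require root.

```python
import re

# The kernel logs a line containing this phrase when the OOM killer runs.
OOM_PATTERN = re.compile(r"invoked oom-killer", re.IGNORECASE)


def find_oom_lines(log_lines):
    """Return the lines (without trailing newline) that mention the OOM killer."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]


def scan_log(path):
    """Scan one log file; 'path' is whatever your distribution uses (assumption)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return find_oom_lines(log)
```

Calling scan_log("/var/log/messages") (or whichever file your system
uses) returns the matching lines; an empty list means no such message
was found in that particular file.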

-Dave

On Nov 4, 2009, at 10:42 AM, Tim Kroeger wrote:

> On Wed, 4 Nov 2009, Dave Goodell wrote:
>
>> This is the classic user interface problem in mpdboot.  The short
>> answer is to change your mpdboot command to:
>>
>>     mpdboot --totalnum=4 --ncpus=6 -f nodefile
>>
>> The slightly longer answer is that mpdboot doesn't respect the
>> number of cpus set in the hostfile for the node on which mpdboot is
>> run; for that node it requires a --ncpus=X option.
>
> Ah, thank you.  That works now.
>
> I'll look into Darius' suggestion tomorrow.  In any case, the missing
> "--ncpus" was quite likely the decisive mistake: my application is
> potentially short on memory, so improper load balancing might cause
> one of the processes to crash.
>
> I'll let you guys know whether this was the problem.
>
> Thank you very much for now.
>
> Best Regards,
>
> Tim
>
> -- 
> Dr. Tim Kroeger
> tim.kroeger at mevis.fraunhofer.de            Phone +49-421-218-7710
> tim.kroeger at cevis.uni-bremen.de            Fax   +49-421-218-4236
>
> Fraunhofer MEVIS, Institute for Medical Image Computing
> Universitaetsallee 29, 28359 Bremen, Germany
>


