[mpich-discuss] Socket closed

Dave Goodell goodell at mcs.anl.gov
Wed Nov 4 10:35:06 CST 2009


This is the classic user-interface problem in mpdboot.  The short
answer is to change your mpdboot command to:
"mpdboot --totalnum=4 --ncpus=6 -f nodefile".

The slightly longer answer is that mpdboot doesn't respect the number
of CPUs set in the hostfile for the node on which mpdboot is run; for
that node it requires the --ncpus=X option instead.
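If you generate the command from a script, a minimal sketch of that rule is below: it reads a nodefile in the "host:ncpus" format shown in the quoted message and moves the local node's count into --ncpus. The function name mpdboot_command and its parameters are hypothetical, not part of MPD. It assumes each host appears once, that the local host is listed in the file (as in the setup below), and that a bare "host" line means 1 CPU. It only builds the command string and does not run mpdboot.

```python
import shlex


def mpdboot_command(nodefile_text, local_host, nodefile_path="nodefile"):
    """Hypothetical helper: build an mpdboot command line that passes the
    local node's CPU count via --ncpus, because mpdboot ignores the
    hostfile's count for the node it is started on."""
    entries = []
    for raw in nodefile_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, _, ncpus = line.partition(":")
        # Assumption: a line without ":N" means one CPU.
        entries.append((host.strip(), int(ncpus) if ncpus else 1))

    counts = dict(entries)
    if local_host not in counts:
        # Assumption: the local host is listed in the file, so --totalnum
        # equals the number of entries.
        raise ValueError(f"{local_host} is not listed in the nodefile")

    argv = [
        "mpdboot",
        f"--totalnum={len(entries)}",
        f"--ncpus={counts[local_host]}",  # the hostfile value is not honored locally
        "-f",
        nodefile_path,
    ]
    return shlex.join(argv)  # requires Python 3.8+


if __name__ == "__main__":
    nodefile = "node092:6\nnode094:6\nnode095:6\nnode096:6\n"
    # Assumes mpdboot is started on node092.
    print(mpdboot_command(nodefile, "node092"))
```

With the nodefile from the quoted message and mpdboot started on node092, this prints the same command as the short answer above.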

-Dave

On Nov 4, 2009, at 10:32 AM, Tim Kroeger wrote:

> On Wed, 4 Nov 2009, Tim Kroeger wrote:
>
>> Anyway, to examine whether your idea that one of the processes ran
>> out of memory is correct, I'll meanwhile run the application with
>> fewer processes per node (that is, more nodes with the same total
>> number of processes).
>
> I now face another problem: I do
>
> mpdboot --totalnum=4 -f nodefile
>
> where nodefile looks like this:
>
> node092:6
> node094:6
> node095:6
> node096:6
>
> and then I do
>
> mpirun -n 24 ./my-application
>
> What happens is that I get 3 processes on node092 and 7 processes  
> each on the other nodes.  What happened there?
>
> Best Regards,
>
> Tim
>
> -- 
> Dr. Tim Kroeger
> tim.kroeger at mevis.fraunhofer.de            Phone +49-421-218-7710
> tim.kroeger at cevis.uni-bremen.de            Fax   +49-421-218-4236
>
> Fraunhofer MEVIS, Institute for Medical Image Computing
> Universitaetsallee 29, 28359 Bremen, Germany
>