[mpich-discuss] Socket closed
Dave Goodell
goodell at mcs.anl.gov
Wed Nov 4 10:35:06 CST 2009
This is the classic user interface problem in mpdboot. The short
answer is to change your mpdboot command to:
"mpdboot --totalnum=4 --ncpus=6 -f nodefile".
The slightly longer answer is that mpdboot doesn't respect the number
of cpus set in the hostfile for the node on which mpdboot is run. For
that node it requires an explicit --ncpus=X option.
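Because the local node's ":6" entry is ignored unless it is repeated on the command line, one way to keep the two in sync is to derive --ncpus from the nodefile itself. The sketch below is a hypothetical helper (mpdboot_cmd is not part of MPICH). It assumes the "host:ncpus" format from the nodefile above, and that the node running mpdboot is listed in that file, as node092 is. It only prints the command, so you can inspect it before running it.

```shell
# Hypothetical helper, not part of MPICH: print an mpdboot command whose
# --ncpus value comes from the local host's "host:ncpus" entry in the nodefile.
# mpdboot itself ignores that entry for the node it is started on.
# Assumes the local host is listed in the nodefile (one host per line).
mpdboot_cmd() {
    nodefile=$1
    # Host to look up; defaults to this machine's short hostname.
    host=${2:-$(hostname -s)}
    # One mpd per non-empty line of the nodefile.
    total=$(grep -c . "$nodefile")
    # cpu count for this host; a line without ":N" counts as 1.
    ncpus=$(awk -F: -v h="$host" '$1 == h { print ($2 == "" ? 1 : $2); exit }' "$nodefile")
    # Fall back to 1 if the host is not in the file at all.
    echo "mpdboot --totalnum=$total --ncpus=${ncpus:-1} -f $nodefile"
}
```

Run on node092 with the nodefile above, "mpdboot_cmd nodefile" should print "mpdboot --totalnum=4 --ncpus=6 -f nodefile". After booting, "mpiexec -n 24 hostname | sort | uniq -c" is a quick way to check how many processes landed on each node.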
-Dave
On Nov 4, 2009, at 10:32 AM, Tim Kroeger wrote:
> On Wed, 4 Nov 2009, Tim Kroeger wrote:
>
>> Anyway, to examine whether your idea that one of the processes ran
>> out of memory is correct, I'll meanwhile run the application with
>> fewer processes per node (that is, more nodes with the same total
>> number of processes).
>
> I now face another problem: I do
>
> mpdboot --totalnum=4 -f nodefile
>
> where nodefile looks like this:
>
> node092:6
> node094:6
> node095:6
> node096:6
>
> and then I do
>
> mpirun -n 24 ./my-application
>
> What happens is that I get 3 processes on node092 and 7 processes
> each on the other nodes. What happened there?
>
> Best Regards,
>
> Tim
>
> --
> Dr. Tim Kroeger
> tim.kroeger at mevis.fraunhofer.de Phone +49-421-218-7710
> tim.kroeger at cevis.uni-bremen.de Fax +49-421-218-4236
>
> Fraunhofer MEVIS, Institute for Medical Image Computing
> Universitaetsallee 29, 28359 Bremen, Germany
>