[mpich-discuss] Specifying hosts
Dave Goodell
goodell at mcs.anl.gov
Tue May 5 16:21:27 CDT 2009
Hi Scott,
I suspect that this is due to a long-standing, extremely user-
unfriendly gotcha in mpdboot's usage. The core counts in the
machinefile are used for all hosts except the current host, so
you also need to specify --ncpus=8 on your mpdboot command line.
Some of your nodes were probably oversubscribed, while the node
where you ran mpdboot was undersubscribed.
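Concretely, keeping the mpdboot invocation from the original report and only adding the missing option (the placeholder and file name are from that report, not fixed values), the corrected command would look something like:

```shell
# Tell mpdboot the local host also has 8 cores; the machinefile's
# node1:8 entries cover only the *other* hosts.
mpdboot -n <num_hosts> -f machines --ncpus=8 -m `which mpd`
```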
You can usually debug these sorts of problems with a little shell
pipeline like:
% mpiexec -n 4 hostname | sort | uniq -c | sort -n
4 anlextwls098-007.wl.anl-external
On a cluster larger than just my laptop you would get a list of
(process_count,hostname) tuples. For very large systems where you
expect every host to have exactly the same number of processes you can
go a bit further:
% mpiexec -n 4 hostname | sort | uniq -c | sort -n | awk '{print $1}' | uniq
4
If you see more than one line of output, or if the number on that
one line is anything other than the number of processes you want
per node, then you have a problem.
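To make the check concrete without a cluster at hand, here is a small self-contained simulation (hostnames, rank counts, and the temp file path are all made up): it fakes the output of an `mpiexec -n 16 hostname` run with 8 ranks on each of two nodes, then feeds it through the same pipeline.

```shell
# Fake what `mpiexec -n 16 hostname` would print with 8 ranks on
# each of two nodes (hostnames here are illustrative).
{
  printf 'node1\n%.0s' 1 2 3 4 5 6 7 8
  printf 'node2\n%.0s' 1 2 3 4 5 6 7 8
} > /tmp/mpd_hosts.txt

# One line of output means every node got the same number of ranks;
# the value on that line is the per-node rank count.
sort /tmp/mpd_hosts.txt | uniq -c | sort -n | awk '{print $1}' | uniq
# prints: 8
```

If the nodes were unevenly loaded, `uniq -c` would produce differing counts and the final `uniq` would emit more than one line.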
Sorry for the very surprising behavior. I believe that this gotcha is
not present in our new process manager, Hydra. If this doesn't solve
your problem, let us know and we can dig in a bit deeper.
-Dave
On May 5, 2009, at 3:45 PM, Scott Atchley wrote:
> Hi all,
>
> I have run into a behavior that I did not expect. I have been using
> mpdboot to launch mpds on hosts with eight cores. I specify a
> machinefile with lines such as:
>
> node1:8
> node2:8
>
> I then call mpdboot with:
>
> $ mpdboot -n <num_hosts> -f machines -m `which mpd`
>
> This works and mpdtrace shows all the hosts.
>
> I then launch a job with:
>
> $ mpiexec -n <num_cores> ...
>
> expecting 8 cores per machine _and_ that the ranks are allocated
> sequentially by host. That is ranks 0-7 on the first host, 8-15 on
> the second host, etc.
>
> This does not seem to be the case with 1.0.7 or 1.0.8p1. When I run
> a large CFD code (Overflow), I see MPICH2 take twice as long as Open-
> MPI. I finally tracked it down to ranks not being contiguous. If I
> modify my mpiexec command with:
>
> $ mpiexec -machinefile machines -n <num_cores> ...
>
> where machines is the file I passed to mpdboot, it then runs as fast
> as Open-MPI.
>
> What logic does mpiexec use to assign ranks to hosts? It seems to be
> redundant to pass the machinefile to both mpdboot and mpiexec. In
> Intel MPI, their mpiexec has a -perhost <n> flag that helps
> accomplish this.
>
> Thanks,
>
> Scott