[mpich-discuss] Specifying hosts

Dave Goodell goodell at mcs.anl.gov
Tue May 5 16:21:27 CDT 2009


Hi Scott,

I suspect that this is due to a long-standing, extremely
user-unfriendly gotcha in mpdboot's usage.  The core counts in the
machinefile are used for all hosts except for the current host.  So
you also need to specify --ncpus=8 on your mpdboot command line.
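
For example, reusing the mpdboot invocation from your message below,
with the --ncpus value matching the core count of the host you launch
from:

% mpdboot -n <num_hosts> -f machines --ncpus=8 -m `which mpd`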

Some of your nodes were probably getting oversubscribed while the node
where you ran mpdboot was undersubscribed.

You can usually debug these sorts of problems with a little shell  
pipeline like:

% mpiexec -n 4 hostname | sort | uniq -c | sort -n
       4 anlextwls098-007.wl.anl-external

On a cluster larger than just my laptop you would get a list of  
(process_count,hostname) tuples.  For very large systems where you  
expect every host to have exactly the same number of processes you can  
go a bit further:

% mpiexec -n 4 hostname | sort | uniq -c | sort -n | awk '{print $1}' | uniq
4

If you see more than one line of output, or if the number on that
single line is anything other than the number of processes you want
per node, then you have a problem.
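
For example, a run across a couple of 8-core hosts gone wrong might
look something like this (hypothetical host names):

% mpiexec -n 16 hostname | sort | uniq -c | sort -n
        4 node2
       12 node1

Here node1 is oversubscribed with 12 processes while node2 received
only 4.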

Sorry for the very surprising behavior.  I believe that this gotcha is  
not present in our new process manager, Hydra.  If this doesn't solve  
your problem, let us know and we can dig in a bit deeper.
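
If you want to experiment with Hydra in the meantime (assuming your
build installs its launcher as mpiexec.hydra), the equivalent launch
would look something like:

% mpiexec.hydra -f machines -n <num_cores> ...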

-Dave

On May 5, 2009, at 3:45 PM, Scott Atchley wrote:

> Hi all,
>
> I have run into a behavior that I did not expect. I have been using  
> mpdboot to launch mpds on hosts with eight cores. I specify a  
> machinefile with lines such as:
>
> node1:8
> node2:8
>
> I then call mpdboot with:
>
> $ mpdboot -n <num_hosts> -f machines -m `which mpd`
>
> This works and mpdtrace shows all the hosts.
>
> I then launch a job with:
>
> $ mpiexec -n <num_cores> ...
>
> expecting 8 cores per machine _and_ that the ranks are allocated  
> sequentially by host. That is ranks 0-7 on the first host, 8-15 on  
> the second host, etc.
>
> This does not seem to be the case with 1.0.7 or 1.0.8p1. When I run
> a large CFD code (Overflow), I see MPICH2 take twice as long as
> Open-MPI. I finally tracked it down to ranks not being contiguous. If I
> modify my mpiexec command with:
>
> $ mpiexec -machinefile machines -n <num_cores> ...
>
> where machines is the file I passed to mpdboot, it then runs as fast  
> as Open-MPI.
>
> What logic does mpiexec use to assign ranks to hosts? It seems to be  
> redundant to pass the machinefile to both mpdboot and mpiexec. In  
> Intel MPI, their mpiexec has a -perhost <n> flag that helps  
> accomplish this.
>
> Thanks,
>
> Scott


