[mpich-discuss] Scalability of 'Intel Core 2 duo' cluster

Tiago Silva tsilva at coas.oregonstate.edu
Fri Mar 28 10:57:34 CDT 2008


Probably a daft question, but why hasn't anyone suggested compiling mpich
with --comm-shared? Depending on the code, couldn't this drastically reduce
network traffic between processes on the same node and give better scalability?
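A quick way to see how much intra-node traffic matters is to time a
ping-pong pair once with both ranks on the same node and once with the
ranks on different nodes. A minimal sketch (message size, repetition
count, and output format are my own illustrative choices):

/* ping.c - run with 2 MPI processes, first on one node, then on two,
 * and compare the reported throughput. */
#include <mpi.h>
#include <stdio.h>

#define MSG  (1 << 20)   /* 1 MB messages (assumed size) */
#define REPS 100

static char buf[MSG];

int main(int argc, char **argv)
{
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("ping-pong throughput ~ %.1f MB/s\n",
               2.0 * MSG * REPS / (t1 - t0) / 1e6);

    MPI_Finalize();
    return 0;
}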

Tiago

Tony Ladd wrote:
> There are many possible reasons for poor scaling with gigabit
> ethernet. The two most significant issues that I have found are
> 1) Poorly performing switches
> 2) Inadequate algorithms for collectives
> These issues are discussed in reference to our Beowulf cluster at
> http://ladd.che.ufl.edu/research/beoclus/beoclus.htm. The bottom line is
> that we have applications such as VASP and GROMACS that scale quite
> comparably on our GigE cluster and on an Infiniband HPC system up to
> about 100 processors. With TCP the Infiniband system wins out, but with
> GAMMA the GigE cluster can outperform the HPC system.
>
> Typical edge switches (even high-end ones costing ~$5K+) are
> oversubscribed on the backplane. This can lead to packet loss with a
> very large drop in performance. I found factors of 100 difference in
> throughput depending on the layout of the nodes on the switch. Details
> are on our website - it's a lot of stuff, but it's not a simple story.
>
> The second big issue is collective performance. It is easy to check by
> trying applications that use only point-to-point messages. MPICH's
> collectives are generally the best - it has the most advanced
> algorithms - but the Alltoall and similar routines suck. I have
> mentioned this to Rajeev and it is apparently being looked into. The
> problem is that MPICH posts all the receives at once and then sends
> messages essentially at random. This leads to oversubscription of the
> NICs and packet loss. For reasons I don't understand, it is much more
> problematic on multicore nodes than on single-core nodes. I am pretty
> sure a properly scheduled alltoall would solve this problem. I tested a
> naive version under MPICH1 (same algorithm) and it performed much
> better. I can see that there is an even better algorithm based on
> tournament scheduling, but I have not had time to code a demo yet.
> Maybe over the summer.
>
> Tony
>
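Regarding the scheduled alltoall Tony describes above: for anyone who
wants to experiment, here is a rough sketch of a pairwise-scheduled
alltoall in which each rank exchanges with exactly one partner per step
instead of posting all receives up front. The function name and the
flat byte layout are my own illustrative assumptions, not what MPICH
does internally:

/* Illustrative pairwise alltoall sketch (not MPICH's code).
 * sendbuf/recvbuf hold one fixed-size block per rank. */
#include <mpi.h>
#include <string.h>

void pairwise_alltoall(const char *sendbuf, char *recvbuf,
                       int blockbytes, MPI_Comm comm)
{
    int rank, size, step;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* The block we keep for ourselves never touches the network. */
    memcpy(recvbuf + (size_t)rank * blockbytes,
           sendbuf + (size_t)rank * blockbytes, blockbytes);

    for (step = 1; step < size; step++) {
        int sendto   = (rank + step) % size;
        int recvfrom = (rank - step + size) % size;
        /* One send and one receive in flight per rank per step,
         * so the NIC is never oversubscribed by the collective. */
        MPI_Sendrecv((char *)sendbuf + (size_t)sendto * blockbytes,
                     blockbytes, MPI_BYTE, sendto, 0,
                     recvbuf + (size_t)recvfrom * blockbytes,
                     blockbytes, MPI_BYTE, recvfrom, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

A tournament-style schedule of the kind Tony mentions would pair the
partners more carefully, but even this simple ring schedule avoids
posting every receive at once.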


-- 
Tiago A. M. Silva
Postdoc Associate Researcher
College Of Oceanic and Atmospheric Sciences
Oregon State University
Burt 2-Room 426A
104 COAS Administration Building
Corvallis OR 97331-5503
USA
Phone: +1 541 737 5283
Fax:   +1 541 737 2064
