[mpich-discuss] Scalability of 'Intel Core 2 duo' cluster

Tony Ladd tladd at che.ufl.edu
Fri Mar 28 09:21:14 CDT 2008


There are many possible reasons for poor scaling with gigabit ethernet. 
The two most significant issues that I have found are
1) Poorly performing switches
2) Inadequate algorithms for collectives
These issues are discussed in reference to our Beowulf cluster at 
http://ladd.che.ufl.edu/research/beoclus/beoclus.htm. The bottom line is 
that we have applications such as VASP and GROMACS that scale quite 
comparably on our GigE cluster and on an Infiniband HPC system, up to 
about 100 processors. With TCP the Infiniband wins out, but with GAMMA 
the GigE cluster can outperform the HPC system.

Typical edge switches (even high-end ones costing ~$5K+) are 
oversubscribed on the backplane. This can lead to packet loss with a 
very large drop in performance. I found factors of 100 difference in 
throughput depending on the layout of the nodes on the switch. Details 
are on our website; it's a lot of material, but it's not a simple story.
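
An easy way to see the effect yourself is to run concurrent 
point-to-point bandwidth tests and compare different node placements on 
the switch. Here is a minimal sketch of such a test (my own harness, 
not the benchmark from the website; the 1 MB message size and the 
iteration count are arbitrary choices):

/* pairwise_bw.c: concurrent ping-pong bandwidth test.
 * Ranks are paired (0<->1, 2<->3, ...) and all pairs exchange
 * messages at the same time, so an oversubscribed backplane shows
 * up as reduced per-pair throughput. Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1 << 20;   /* 1 MB messages (arbitrary) */
    const int iters  = 100;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *buf = malloc(nbytes);
    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;

    MPI_Barrier(MPI_COMM_WORLD);      /* start all pairs together */
    if (partner < size) {             /* odd total: last rank idles */
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank % 2 == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, partner, 0,
                         MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, nbytes, MPI_CHAR, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_CHAR, partner, 0,
                         MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        double mbps = 2.0 * nbytes * iters / (t1 - t0) / 1e6;
        printf("rank %d <-> %d: %.1f MB/s\n", rank, partner, mbps);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

Run it with different host orderings in the machine file; if per-pair 
throughput collapses when the pairs span switch modules, the backplane 
is the bottleneck.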

The second big issue is collective performance. It is easy to check by 
trying applications with only point-to-point messages. The best 
collectives are in general in MPICH; it has the most advanced 
algorithms. But the Alltoall and similar routines suck. I have 
mentioned this to Rajeev and it is apparently being looked into. The 
problem is that MPICH posts all the receives at once and then sends 
messages essentially randomly. This leads to oversubscription of the 
NICs and packet loss. For reasons I don't understand it is much more 
problematic on multicore nodes than single-core ones. I am pretty sure 
a properly scheduled alltoall (sketched below) would solve this 
problem. I tested a naive version under MPICH1 (same algorithm) and it 
performed much better. I can see that there is a yet better algorithm 
based on tournament scheduling, but I have not had time to code a demo 
yet. Maybe over the summer.
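
To illustrate what I mean by a scheduled alltoall, here is a minimal 
sketch of a pairwise-exchange version with an XOR schedule (my own 
illustration, not the MPICH code, and it assumes a power-of-two number 
of ranks). At step k, rank r exchanges a block only with rank r^k, so 
each NIC carries exactly one send and one receive at a time instead of 
all of them at once:

/* scheduled_a2a.c: alltoall via pairwise exchange with an XOR
 * schedule. At step k, rank r trades blocks with partner r^k; for
 * power-of-two sizes every step is a perfect matching, so no NIC
 * is ever oversubscribed. A hypothetical sketch, not MPICH's
 * implementation. */
#include <mpi.h>
#include <string.h>

int scheduled_alltoall(char *sendbuf, char *recvbuf,
                       int blockbytes, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);   /* assumed a power of two */

    /* The local block is just copied. */
    memcpy(recvbuf + (size_t)rank * blockbytes,
           sendbuf + (size_t)rank * blockbytes, blockbytes);

    /* One partner per step: at step k, exchange with rank ^ k. */
    for (int k = 1; k < size; k++) {
        int partner = rank ^ k;
        MPI_Sendrecv(sendbuf + (size_t)partner * blockbytes,
                     blockbytes, MPI_CHAR, partner, 0,
                     recvbuf + (size_t)partner * blockbytes,
                     blockbytes, MPI_CHAR, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
    return MPI_SUCCESS;
}

For power-of-two sizes the XOR schedule gives a perfect matching at 
every step, which is the same structure a tournament (round-robin) 
schedule provides; a round-robin version would extend this to 
arbitrary rank counts.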

Tony

-- 
Tony Ladd

Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005
USA

Email: tladd-"(AT)"-che.ufl.edu
Web    http://ladd.che.ufl.edu

Tel:   (352)-392-6509
FAX:   (352)-392-9514



