[mpich-discuss] Scalability of 'Intel Core 2 duo' cluster
Tony Ladd
tladd at che.ufl.edu
Fri Mar 28 09:21:14 CDT 2008
There are many possible reasons for poor scaling with gigabit ethernet.
The two most significant issues that I have found are
1) Poorly performing switches
2) Inadequate algorithms for collectives
These issues are discussed in reference to our Beowulf cluster at
http://ladd.che.ufl.edu/research/beoclus/beoclus.htm. Bottom line is we
have applications such as VASP and GROMACS that scale quite comparably
on our GigE cluster to an Infiniband HPC system up to about 100
processors. With TCP the Infiniband wins out but with GAMMA the GigE
cluster can outperform the HPC system.
Typical edge switches (even high-end ones costing ~$5K+) are
oversubscribed on the backplane. This can lead to packet loss with a
very large drop in performance. I found factors of 100 difference in
throughput depending on the layout of the nodes on the switch. Details
are on our website; it's a lot of stuff, but it's not a simple story.
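To make "oversubscribed on the backplane" concrete, here is a minimal arithmetic sketch. The port count and fabric capacity are hypothetical numbers chosen for illustration, not measurements of any particular switch, and `oversubscription_ratio` is a name invented for this example.

```python
def oversubscription_ratio(ports, port_gbps, fabric_gbps):
    """Worst-case offered load divided by switching capacity.

    A value above 1.0 means the ports can collectively offer more
    traffic than the backplane can carry, so packets may be dropped.
    """
    return ports * port_gbps / fabric_gbps


# Hypothetical switch: 48 GigE ports sharing a 24 Gbit/s fabric.
ratio = oversubscription_ratio(ports=48, port_gbps=1.0, fabric_gbps=24.0)
print(f"oversubscription {ratio:.1f}:1")
```

With these assumed figures the ratio is 2:1, so traffic patterns that load every port at once (as an alltoall does) can exceed what the switch can forward.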
The second big issue is collective performance. Easy to check by trying
applications with only point to point messages. The best collectives are
in MPICH in general; it has the most advanced algorithms. But the
Alltoall and similar routines suck. I have mentioned this to Rajeev and
it is apparently being looked into. The problem is MPICH posts all the
receives at once and then sends messages essentially randomly. This
leads to oversubscription of the NICs and packet loss. For reasons I
don't understand it is much more problematic on multicore nodes than
single core. I am pretty sure a properly scheduled alltoall would solve
this problem. I tested a naive version under MPICH1 (same algorithm) and
it performed much better. I can see that there is a yet better algorithm
based on tournament scheduling, but I have not had time to code a demo
yet. Maybe over the summer.
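As an illustration of what a scheduled alltoall could look like, the sketch below builds two communication schedules in plain Python, without MPI. It is not MPICH's implementation and not the code tested above. The function names are made up for this example, and the "circle method" used for the tournament schedule is one standard round-robin construction, assumed here as a stand-in for the tournament scheduling mentioned. In a real MPI code each round would presumably be one paired exchange (for example an `MPI_Sendrecv`) per rank.

```python
def shift_schedule(nprocs):
    """Shifted exchange: rounds[s][rank] = (send_to, recv_from).

    In round s (1 <= s < nprocs) rank r sends to (r + s) % nprocs and
    receives from (r - s) % nprocs, so every rank sends exactly one
    message and receives exactly one message per round.
    """
    rounds = []
    for step in range(1, nprocs):
        rounds.append([((rank + step) % nprocs, (rank - step) % nprocs)
                       for rank in range(nprocs)])
    return rounds


def tournament_schedule(nprocs):
    """Round-robin pairing: rounds[r][rank] = partner, or None if idle.

    Uses the circle method: one slot stays fixed and the others rotate.
    Each pair of ranks meets exactly once, and in every round a rank
    talks to at most one partner (sending and receiving with the same
    peer). For odd nprocs a dummy slot is added, so one rank idles
    per round.
    """
    n = nprocs + (nprocs % 2)          # pad to an even number of slots
    players = list(range(n))
    rounds = []
    for _ in range(n - 1):
        partner = [None] * nprocs
        for i in range(n // 2):
            a, b = players[i], players[n - 1 - i]
            if a < nprocs and b < nprocs:   # skip pairs with the dummy
                partner[a], partner[b] = b, a
        rounds.append(partner)
        # keep players[0] fixed, rotate the remaining slots by one
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


if __name__ == "__main__":
    for r, pairs in enumerate(tournament_schedule(4)):
        print("round", r, pairs)
```

Either schedule limits each NIC to one incoming and one outgoing message per round, which is the property that posting all receives at once and sending in arbitrary order does not guarantee.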
Tony
--
Tony Ladd
Chemical Engineering Department
University of Florida
Gainesville, Florida 32611-6005
USA
Email: tladd-"(AT)"-che.ufl.edu
Web http://ladd.che.ufl.edu
Tel: (352)-392-6509
FAX: (352)-392-9514