[mpich-discuss] Scalability of Intel quad core (Harpertown) cluster
chong tan
chong_guan_tan at yahoo.com
Fri Mar 28 14:08:34 CDT 2008
I have been watching this and the other scalability thread. Here are my
observations on dual/quad-core boxes, after spending the last few years
playing with them:
- Your software is the main factor in scalability.
- Scalability should be measured per physical CPU, not per core.
- Keeping all cores busy is likely to bring down the performance per core.
- The latest quad-core CPUs are notoriously bad in throughput if you use all or most of the cores.
-- shared-cache thrashing ???
- Hyperthreading can reduce throughput.
- Memory bandwidth is likely the limiting factor on multi-core boxes, even on Sun's Niagara.
- You need to tune your algorithm to the hardware you have. You can't rely solely on MPICH to deliver the scalability.
- If you really want performance, use the simplest MPICH routines. I have reduced my MPICH calls to fixed point-to-point communication, except at entry, where I have my only Barrier call. I don't even use Isend/Irecv (they perform extremely badly for me, for reasons I can't confirm). A minimal sketch of the kind of pattern I mean is below.
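
Something along these lines (untested; the 1-D ring of ranks, the 100
iterations, and the buffer size N are placeholders I picked for the example):

  /* Fixed point-to-point exchange: one Barrier at startup, then only
   * blocking Sendrecv calls on a 1-D ring of ranks. */
  #include <mpi.h>
  #include <stdlib.h>

  #define N 4096                /* halo size in doubles (placeholder) */

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      double *sendbuf = malloc(N * sizeof(double));
      double *recvbuf = malloc(N * sizeof(double));

      MPI_Barrier(MPI_COMM_WORLD);         /* the only collective call */

      int right = (rank + 1) % size;
      int left  = (rank - 1 + size) % size;

      for (int step = 0; step < 100; step++) {
          /* blocking exchange with both neighbours; no Isend/Irecv */
          MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, right, 0,
                       recvbuf, N, MPI_DOUBLE, left,  0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          /* ... local computation using recvbuf goes here ... */
      }

      free(sendbuf);
      free(recvbuf);
      MPI_Finalize();
      return 0;
  }
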
tan
Pavan Balaji <balaji at mcs.anl.gov> wrote:
Hee,
Can you send us this code? I'm interested in seeing what is causing the
communication time to go up so much.
-- Pavan
On 03/28/2008 12:21 PM, Hee Il Kim wrote:
> Thanks all,
>
> The Cactus code has good scalability; especially with the latest
> version of Carpet it shows good scalability on over 5000 CPU cores(?).
> I tested both the BSSN+PUGH and Whisky+Carpet benchmarks. Not being an
> expert, I'm relying on the timing info shown by Cactus rather than the
> profiling tools introduced by Pavan. The profiling info says that most
> of the communication time is spent enforcing boundary conditions. The
> total wall-clock time (including communication time) increases from
> ~700 sec (1 CPU) to ~1500 sec (64 CPUs), whereas the computation time
> only increases from ~600 to ~800 sec. Here the problem sizes were
> taken to be proportional to the number of CPUs. So now I'm looking for
> ways to reduce the communication time.
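> 
> Put in terms of parallel efficiency, those numbers give roughly
> 
>   weak-scaling efficiency:  T(1)/T(64) ~ 700/1500 ~ 0.47
>   compute-only efficiency:             ~ 600/800  = 0.75
> 
> so close to half of the 64-core wall-clock time is going into
> communication and the boundary-condition exchange.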
>
> I'm using Harpertown 5420s (2.5 GHz). What disappoints me more is that
> this newest Xeon cluster is not that much better than my old Pentium D
> 930 cluster (3.0 GHz), which has 4 nodes (8 cores). I tested various
> combinations of (node# x cpu#) and the results depend somewhat on the
> combination.
>
> A hybrid run using the "-openmp" option of the Intel compilers made
> things worse and broke the load balancing. Also, the optimization
> options (even -O2) made runs slower, but did not break the load
> balancing.
>
> I checked the bandwidth behavior mentioned by Elvedin. Could I change
> or set the message size and frequency at runtime, or at some other
> stage?
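> 
> A simple ping-pong probe along the lines below would at least show me
> where the bandwidth curve flattens out for a given pair of nodes
> (rough sketch; the doubling message sizes, 100 repetitions, and 4 MB
> upper bound are arbitrary choices):
> 
> /* Ping-pong between ranks 0 and 1, doubling the message size each
>  * step and reporting the achieved bandwidth.  Needs >= 2 ranks. */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char **argv)
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     if (size < 2) { MPI_Finalize(); return 1; }
> 
>     char *buf = malloc(1 << 22);              /* up to 4 MB messages */
> 
>     for (int bytes = 1; bytes <= (1 << 22); bytes *= 2) {
>         const int reps = 100;
>         MPI_Barrier(MPI_COMM_WORLD);
>         double t0 = MPI_Wtime();
>         for (int i = 0; i < reps; i++) {
>             if (rank == 0) {
>                 MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
>                 MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
>                          MPI_STATUS_IGNORE);
>             } else if (rank == 1) {
>                 MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
>                          MPI_STATUS_IGNORE);
>                 MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
>             }
>         }
>         double dt = MPI_Wtime() - t0;
>         if (rank == 0)
>             printf("%8d bytes  %8.2f MB/s\n",
>                    bytes, 2.0 * bytes * reps / dt / 1e6);
>     }
> 
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }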
>
> I have no idea how to improve the scalability, or how serious the
> problem is. Anyway, it's a bit unsatisfactory at the moment, and I
> hope I can find a better way from here. I appreciate all your kind
> comments and suggestions.
>
> Regards
>
> Kim, Hee Il
>
>
>
> 2008/3/28, Brian Dobbins:
>
> Hi,
>
> I don't use the Cactus code myself, but from what little I /do/
> know of it, this might not be unexpected. For starters, what do you
> mean by 'bad scalability'? I believe most (all?) benchmark cases
> demonstrate what is called 'weak scaling' - that is, the problem
> size increases along with the number of processors. So, running on
> 1 processor gives you a wall-clock time of /n/ seconds, and running
> on 2 processors will probably give you a wall-clock time of /n/ +
> <some small number>. That small number is the communication time of
> your code. Thus, running on 80 cores /will/ be slower than running on 1,
> but it'll let you run a much larger system.
>
> (To clarify, unless you're specifically configuring a constant
> problem size, you won't reduce your time to solution by increasing
> your processors.)
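> 
> Schematically, under weak scaling the wall-clock time behaves like
> 
>   T(N) ~ T_comp(fixed local size) + T_comm(N),
> 
> so the figure of merit is how close T(1)/T(N) stays to 1, not
> whether T(N) drops as you add processors.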
>
> The next thing to consider is which benchmark you're using, as
> some of them are more scalable than others. You're likely to get
> different results when looking at the 'Whisky_Carpet' benchmark vs.
> the 'BSSN_PUGH' one. You might wish to take a look at the benchmark
> database at the Cactus website, and there are some PDF files with
> more information, too, including a master's thesis on benchmark
> performance.
>
> Finally, a few other slightly more technical things to consider are:
>
> (1) What kind of Harpertowns are you using? Looking at the 5450 vs.
> the 5472 (both 3.0 GHz chips), the latter has more memory bandwidth,
> and may scale better since the code does appear to make use of it.
> Using the CPU2006fp_rate CactusADM benchmarks as a first
> approximation to parallel performance, the SPEC website shows that a
> 5450 gets a score of 101 and 73.1 when going from 1 -> 8 cores (and
> larger is better here - this is throughput, not wall-clock time),
> and the 5472 goes from 112 -> 84.8. Why does this matter? Well,
> you'll probably get different results running an 8-core job when
> running that as 8x1, 4x2, or 1x8 (cores x nodes). This will impact
> your benchmark results somewhat.
>
> (2) I think the code supports running with MPI and OpenMP... I
> don't know if there will be any difference in performance if you
> choose to run 1 MPI process per node with 8 OpenMP threads vs.
> simply using 8 MPI processes, but it might be worth looking into
> (a bare-bones sketch of that setup is included below, after point 3).
>
> (3) Again, I have no first-hand knowledge of the code's performance
> under different interconnects, but it /does/ seem likely to make a
> difference... chances are if you asked on the Cactus forums, there
> might be someone with first-hand experience with this who could give
> you some more specific information.
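> 
> A bare-bones version of the hybrid setup from point (2) might look
> something like this (just a sketch: the 8-thread count is an example,
> the MPI library has to be built with thread support, and
> MPI_THREAD_FUNNELED assumes only the main thread makes MPI calls):
> 
> /* One MPI process per node, several OpenMP threads inside it. */
> #include <mpi.h>
> #include <omp.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv)
> {
>     int provided, rank;
> 
>     /* ask for FUNNELED: only the main thread will call MPI */
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     #pragma omp parallel num_threads(8)
>     {
>         /* per-thread compute work goes here */
>         if (rank == 0 && omp_get_thread_num() == 0)
>             printf("threads per MPI rank: %d\n", omp_get_num_threads());
>     }
> 
>     /* communication is done by the main thread only */
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_Finalize();
>     return 0;
> }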
>
> Hope that helps, and if any of it isn't clear, I'll be happy to
> try to clarify. Good luck!
>
> Cheers,
> - Brian
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji