general question on speed using quad core Xeons
Randall Mackie
rlmackie862 at gmail.com
Wed Apr 16 09:13:26 CDT 2008
Thanks Barry - very informative, and gave me a chuckle :-)
Randy
Barry Smith wrote:
>
> Randy,
>
> Please see
> http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers
>
> Essentially what has happened is that chip hardware designers
> (Intel, IBM, AMD) hit a wall
> on how high they can make their clock speed. They then needed some other
> way to try to
> increase the "performance" of their chips; since they could continue to
> make smaller circuits
> they came up with putting multiple cores on a single chip, which let them
> "double" or "quadruple" the
> claimed performance very easily. Unfortunately the whole multicore
> "solution" is really
> half-assed since it is difficult to effectively use all the cores,
> especially since the memory
> bandwidth did not improve as fast.
>
> Now when a company comes out with a half-assed product, do they say,
> "this is a half-assed product"?
> Did Microsoft say Vista was "half-assed"? No, they emphasize the positive
> parts of their product and
> hide the limitations. This has been true since Grog made his first
> stone wheel in front of his cave.
> So Intel misled everyone on how great multi-cores are.
>
> When you buy earlier dual or quad products you are NOT getting a
> parallel system (even
> though it has 2 cores) because the memory is NOT parallel.
>
> Things are getting a bit better, Intel now has systems with higher
> memory bandwidth.
> The thing you have to look for is MEMORY BANDWIDTH PER CORE; the higher
> that is, the
> better the performance you get.
>
> Note this doesn't have anything to do with PETSc; any sparse solver has
> exactly the same
> issues.
>
> Barry
>
>
>
> On Apr 15, 2008, at 7:19 PM, Randall Mackie wrote:
>> I'm running my PETSc code on a cluster of quad-core Xeons connected
>> by Infiniband. I hadn't much worried about the performance, because
>> everything seemed to be working quite well, but today I was actually
>> comparing performance (wall clock time) for the same problem, but on
>> different combinations of CPUS.
>>
>> I find that my PETSc code is quite scalable until I start to use
>> multiple cores/cpu.
>>
>> For example, the run time doesn't improve by going from 1 core/cpu
>> to 4 cores/cpu, and I find this very strange, especially since
>> looking at top or Ganglia, all 4 cpus on each node are running at
>> 100% almost all of the time. I would have thought that if the cpus
>> were going all out, I would be getting much more scalable results.
>>
>> We are using mvapich-0.9.9 with Infiniband, so I don't know if
>> this is a cluster/Xeon issue or something else.
>>
>> Anybody with experience on this?
>>
>> Thanks, Randy M.
>>
>