general question on speed using quad core Xeons
Randall Mackie
rlmackie862 at gmail.com
Wed Apr 16 09:13:26 CDT 2008
Thanks Barry - very informative, and gave me a chuckle :-)
Randy
Barry Smith wrote:
>
> Randy,
>
> Please see
> http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers
>
> Essentially what has happened is that chip hardware designers
> (Intel, IBM, AMD) hit a wall
> on how high they can make their clock speed. They then needed some other
> way to try to
> increase the "performance" of their chips; since they could continue to
> make smaller circuits
> they came up with putting multiple cores on a single chip, which let them
> "double" or "quadruple" the
> claimed performance very easily. Unfortunately the whole multicore
> "solution" is really
> half-assed since it is difficult to effectively use all the cores,
> especially since the memory
> bandwidth did not improve as fast.
>
> Now when a company comes out with a half-assed product, do they say,
> "this is a half-assed product"?
> Did Microsoft say Vista was "half-assed"? No, they emphasize the positive
> parts of their product and
> hide the limitations. This has been true since Grog made his first
> stone wheel in front of his cave.
> So Intel misled everyone on how great multi-cores are.
>
> When you buy earlier dual or quad products you are NOT getting a
> parallel system (even
> though it has 2 cores) because the memory is NOT parallel.
>
> Things are getting a bit better, Intel now has systems with higher
> memory bandwidth.
> The thing you have to look for is MEMORY BANDWIDTH PER CORE; the higher
> that is, the
> better the performance you get.
>
> Note this doesn't have anything to do with PETSc; any sparse solver has
> exactly the same
> issues.
>
> Barry
>
>
>
> On Apr 15, 2008, at 7:19 PM, Randall Mackie wrote:
>> I'm running my PETSc code on a cluster of quad-core Xeons connected
>> by Infiniband. I hadn't much worried about the performance, because
>> everything seemed to be working quite well, but today I was actually
>> comparing performance (wall clock time) for the same problem, but on
>> different combinations of CPUS.
>>
>> I find that my PETSc code is quite scalable until I start to use
>> multiple cores/cpu.
>>
>> For example, the run time doesn't improve by going from 1 core/cpu
>> to 4 cores/cpu, and I find this very strange, especially since
>> looking at top or Ganglia, all 4 cpus on each node are running at
>> 100% almost all of the time. I would have thought that if the cpus
>> were going all out, I would be getting much more scalable results.
>>
>> We are using mvapich-0.9.9 with Infiniband, so I don't know if
>> this is a cluster/Xeon issue or something else.
>>
>> Anybody with experience on this?
>>
>> Thanks, Randy M.
>>
>