general question on speed using quad core Xeons

Wed Apr 16 07:14:37 CDT 2008

    Randy,

     Please see http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers

     Essentially what has happened is that chip hardware designers
(Intel, IBM, AMD) hit a wall on how high they could push their clock
speeds. They then needed some other way to increase the "performance"
of their chips. Since they could continue to make smaller circuits,
they came up with putting multiple cores on a single chip; that way
they can "double" or "quad" the claimed performance very easily.
Unfortunately, the whole multicore "solution" is really half-assed,
since it is difficult to use all the cores effectively, especially
because memory bandwidth has not improved nearly as fast.
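
A back-of-the-envelope sketch of why memory bandwidth, not clock speed,
caps sparse-solver speed. This is an illustration, not anything taken
from PETSc: it assumes a CSR sparse matrix-vector product with 8-byte
double values, 4-byte column indices and two flops per nonzero, and the
6 GB/s bandwidth figure is purely hypothetical.

```python
# Back-of-the-envelope bound for a CSR sparse matrix-vector product whose
# speed is limited by memory bandwidth rather than by the clock rate.
# Assumptions (illustrative): 8-byte double values, 4-byte column indices,
# one multiply and one add per nonzero. Row pointers and the x and y
# vectors are ignored, so the true bound is somewhat lower.

BYTES_PER_NONZERO = 8 + 4  # matrix value + column index
FLOPS_PER_NONZERO = 2      # one multiply, one add


def spmv_flop_rate_bound(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on flop/s when every nonzero must be streamed from memory."""
    intensity = FLOPS_PER_NONZERO / BYTES_PER_NONZERO  # flops per byte moved
    return bandwidth_bytes_per_s * intensity


if __name__ == "__main__":
    hypothetical_bw = 6e9  # hypothetical 6 GB/s sustained bandwidth, not a measurement
    bound = spmv_flop_rate_bound(hypothetical_bw)
    print(f"at most {bound / 1e9:.2f} Gflop/s")
```

At roughly one flop per six bytes moved, the memory system, not the
arithmetic units, sets the ceiling, whatever the clock rate.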

Now when a company comes out with a half-assed product, do they say,
"this is a half-assed product"? Did Microsoft say Vista was
"half-assed"? No, they emphasize the positive parts of their product
and hide the limitations. This has been true since Grog made his first
stone wheel in front of his cave. So Intel misled everyone on how great
multicores are.

When you buy the earlier dual- or quad-core products you are NOT
getting a parallel system (even though it has 2 or 4 cores), because
the memory is NOT parallel: all the cores share the same memory
bandwidth.

Things are getting a bit better; Intel now has systems with higher
memory bandwidth. The thing you have to look for is MEMORY BANDWIDTH
PER CORE: the higher that is, the better performance you get.
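
A toy model of why adding cores stops helping once the shared memory
bandwidth is saturated. Both inputs are hypothetical: the node's total
sustained bandwidth, and the bandwidth one core would draw running a
bandwidth-bound kernel alone. Delivered throughput is taken as
min(p * per-core demand, node bandwidth), a simplification that ignores
caches and NUMA effects.

```python
# Toy model: p cores share one memory system with a fixed sustained bandwidth.
# Each core, running alone, would consume `single_core_demand` bytes/s.
# Delivered bandwidth (and hence the speed of a bandwidth-bound kernel)
# cannot exceed the node's total, however many cores are busy.


def bandwidth_per_core(node_bandwidth: float, cores: int) -> float:
    """Share of the node's bandwidth each core gets when all are busy."""
    return node_bandwidth / cores


def bandwidth_limited_speedup(cores: int, node_bandwidth: float,
                              single_core_demand: float) -> float:
    """Speedup of a bandwidth-bound kernel on `cores` cores versus one core."""
    one_core_rate = min(single_core_demand, node_bandwidth)
    p_core_rate = min(cores * single_core_demand, node_bandwidth)
    return p_core_rate / one_core_rate


if __name__ == "__main__":
    node_bw = 6e9  # hypothetical node bandwidth, bytes/s
    demand = 4e9   # hypothetical demand of one core running alone, bytes/s
    for p in (1, 2, 4):
        print(p, "cores:",
              f"{bandwidth_per_core(node_bw, p) / 1e9:.2f} GB/s per core,",
              f"speedup {bandwidth_limited_speedup(p, node_bw, demand):.2f}")
```

With these made-up numbers the speedup stops at 1.5 and going from 2 to
4 cores buys nothing, even though top would show every core at 100%:
a core stalled waiting on memory still counts as busy.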

Note this doesn't have anything to do with PETSc; any sparse solver
has the exact same issues.
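
Why any sparse solver hits the same wall: its core kernel, a sparse
matrix-vector product, does only two flops for each matrix entry it
loads and never reuses that entry. Below is a plain sketch of a CSR
(compressed sparse row) product with a rough traffic count; CSR is one
common format, and the byte sizes are assumptions rather than PETSc's
exact data layout.

```python
# Plain CSR sparse matrix-vector product y = A x, plus a rough count of the
# flops it performs and the bytes it must move. Byte sizes assume 8-byte
# doubles and 4-byte integer indices, and that x is read only once
# (perfect caching), which is optimistic.
from typing import List, Tuple


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]  # one multiply + one add per nonzero
        y[i] = s
    return y


def spmv_traffic(nnz: int, n: int) -> Tuple[int, int]:
    """Return (flops, bytes) for one product with an n x n matrix."""
    flops = 2 * nnz
    bytes_moved = (nnz * (8 + 4)   # values + column indices
                   + (n + 1) * 4   # row pointers
                   + n * 8         # read x once
                   + n * 8)        # write y
    return flops, bytes_moved


if __name__ == "__main__":
    # 2 x 2 example: A = [[2, 1], [0, 3]], x = [1, 1]
    row_ptr, col_idx, values = [0, 2, 3], [0, 1, 1], [2.0, 1.0, 3.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0]))  # [3.0, 3.0]
    flops, nbytes = spmv_traffic(nnz=3, n=2)
    print(flops, "flops,", nbytes, "bytes")  # 6 flops, 80 bytes
```

Every matrix entry is touched exactly once per product, so the ratio of
flops to bytes stays well under one no matter which solver library does
the work.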

    Barry

On Apr 15, 2008, at 7:19 PM, Randall Mackie wrote:
> I'm running my PETSc code on a cluster of quad-core Xeons connected
> by InfiniBand. I hadn't much worried about the performance, because
> everything seemed to be working quite well, but today I was actually
> comparing performance (wall clock time) for the same problem on
> different combinations of CPUs.
>
> I find that my PETSc code is quite scalable until I start to use
> multiple cores/cpu.
>
> For example, the run time doesn't improve by going from 1 core/cpu
> to 4 cores/cpu, and I find this to be very strange, especially since,
> looking at top or Ganglia, all 4 cpus on each node are running at
> 100% almost all of the time. I would have thought that if the cpus
> were going all out, I would still be getting much more scalable
> results.
>
> We are using mvapich-0.9.9 with InfiniBand, so I don't know if
> this is a cluster/Xeon issue or something else.
>
> Anybody with experience on this?
>
> Thanks, Randy M.
>