general question on speed using quad core Xeons

Matthew Knepley knepley at gmail.com
Tue Apr 15 19:34:08 CDT 2008


On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> I'm running my PETSc code on a cluster of quad core Xeon's connected
>  by Infiniband. I hadn't much worried about the performance, because
>  everything seemed to be working quite well, but today I was actually
>  comparing performance (wall clock time) for the same problem, but on
>  different combinations of CPUS.
>
>  I find that my PETSc code is quite scalable until I start to use
>  multiple cores/cpu.
>
>  For example, the run time doesn't improve by going from 1 core/cpu
>  to 4 cores/cpu, and I find this to be very strange, especially since
>  looking at top or Ganglia, all 4 cpus on each node are running at 100%
> almost
>  all of the time. I would have thought if the cpus were going all out,
>  that I would still be getting much more scalable results.

Those are really coarse measures. There is absolutely no way that all cores
are doing useful work 100% of the time. It's easy to show by hand: take the
peak flop rate and compute the memory bandwidth needed to sustain that
computation (assuming everything is perfect, as in axpy). You will find that
the chip's bandwidth is far below this. A nice analysis is in

  http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf
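
The back-of-the-envelope argument above can be sketched as follows. This is
an illustrative calculation, not a measurement; the per-core peak flop rate
is a made-up round number for a quad-core node of that era:

```python
# Bandwidth needed to keep daxpy (y = a*x + y) running at peak flop rate.
# daxpy does 2 flops per element (multiply + add) and moves 3 doubles
# (load x, load y, store y) = 24 bytes per element.

def axpy_bandwidth_needed(peak_gflops):
    """Return the memory bandwidth (GB/s) needed to sustain daxpy
    at the given peak flop rate (GF/s)."""
    bytes_per_flop = 24.0 / 2.0  # 12 bytes of memory traffic per flop
    return peak_gflops * bytes_per_flop

# Hypothetical quad-core node: 4 cores at an assumed 10 GF/s each.
needed = axpy_bandwidth_needed(4 * 10)
print(needed)  # 480 GB/s required, versus ~10 GB/s a typical node delivered
```

With one core the memory system is already the bottleneck, so adding the
other three cores contends for the same bandwidth instead of adding speed.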

>  We are using mvapich-0.9.9 with infiniband. So, I don't know if
>  this is a cluster/Xeon issue, or something else.

This is actually mathematics! How satisfying. The only way to improve
this is to change the data structure (e.g. use blocks) or change the
algorithm (e.g. use spectral elements and unassembled structures).
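
To see why blocking helps, here is a rough count of matrix traffic per flop
for a sparse matvec with scalar versus blocked row storage (AIJ vs. BAIJ in
PETSc terms). It assumes 8-byte values and 4-byte column indices and ignores
vector traffic for simplicity:

```python
# Rough bytes-of-matrix-traffic per flop for a blocked-CSR matvec.
# With b*b blocks, one column index is shared by block_size**2 values,
# so index traffic is amortized as the block size grows.

def matvec_bytes_per_flop(block_size):
    """Matrix memory traffic (bytes) per flop for blocked CSR matvec.
    Each nonzero contributes 2 flops (multiply + add)."""
    bytes_per_nz = 8.0 + 4.0 / block_size**2  # value + amortized index
    return bytes_per_nz / 2.0

print(matvec_bytes_per_flop(1))  # scalar storage: 6.0 bytes/flop
print(matvec_bytes_per_flop(4))  # 4x4 blocks:    4.125 bytes/flop
```

Blocking cannot remove the value traffic itself, but cutting the index
overhead (and enabling register reuse of the vector) raises the achievable
fraction of peak on a bandwidth-limited machine.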

  Matt

>  Anybody with experience on this?
>
>  Thanks, Randy M.



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener




More information about the petsc-users mailing list