general question on speed using quad core Xeons
Matthew Knepley
knepley at gmail.com
Tue Apr 15 21:34:33 CDT 2008
On Tue, Apr 15, 2008 at 9:03 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> Okay, but if I'm stuck with a big 3D finite difference code, written in
> PETSc using Distributed Arrays, with 3 dof per node, then you're saying
> there is really nothing I can do, except using blocking, to improve things
> on quad core cpus?

Yes, just about.

> The paper talks about blocking using the BAIJ format, so is this the same
> thing as creating MPIBAIJ matrices in PETSc?

Yes.

> And is creating MPIBAIJ matrices in PETSc going to make a substantial
> difference in the speed?

That is the hope. You can just give MATMPIBAIJ as the argument to
DAGetMatrix().
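For a DA with 3 dof per node, the calls might look like the sketch below.
It is untested and error checking is omitted; the 64^3 grid, box stencil,
and stencil width 1 are placeholder assumptions, not your actual setup:

#include "petscda.h"

int main(int argc, char **argv)
{
  DA  da;
  Mat A;

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
  /* Placeholder grid: 64^3 box-stencil DA, dof = 3, stencil width 1 */
  DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_BOX,
             64, 64, 64, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
             3, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da);
  /* Asking for MATMPIBAIJ makes DAGetMatrix return a blocked matrix,
     preallocated for the stencil, with block size equal to the dof (3) */
  DAGetMatrix(da, MATMPIBAIJ, &A);
  /* ... fill with MatSetValuesBlocked() and solve as before ... */
  MatDestroy(A);
  DADestroy(da);
  PetscFinalize();
  return 0;
}

The blocking gains show up in assembly (MatSetValuesBlocked()) and in the
blocked kernels used by MatMult() and the factorizations during the solve.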
> I'm sorry if I'm being dense; I'm just trying to understand whether there
> is some simple way I can easily utilize those extra cores on each cpu.
> Since I'm not a computer scientist, some of these concepts are difficult.
I really believe extra cores are currently a con for scientific computing. There
are real mathematical barriers to their effective use.
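To make that barrier concrete, here is a back-of-the-envelope check for
axpy (y = y + a*x), which does 2 flops per point while moving 3 doubles
(24 bytes). The peak flop rate and bandwidth below are assumed ballpark
figures for a quad core Xeon socket, not measured numbers:

#include <stdio.h>

int main(void)
{
  double peak_gflops    = 40.0;        /* assumed: 4 cores x ~10 Gflop/s */
  double chip_bw_gb     = 10.0;        /* assumed: ~10 GB/s per socket   */
  double bytes_per_flop = 24.0 / 2.0;  /* axpy: 24 bytes per 2 flops     */
  double needed_bw      = peak_gflops * bytes_per_flop;

  printf("bandwidth needed to run axpy at peak: %.0f GB/s\n", needed_bw);
  printf("bandwidth the socket delivers:        %.0f GB/s\n", chip_bw_gb);
  printf("achievable fraction of peak:          %.1f%%\n",
         100.0 * chip_bw_gb / needed_bw);
  return 0;
}

Whatever exact numbers you plug in, the answer comes out the same way: a
memory-bound kernel like sparse matrix-vector product can sustain only a
few percent of peak, no matter how many cores are busy.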
Matt
> Thanks, Randy
> Matthew Knepley wrote:
> > On Tue, Apr 15, 2008 at 7:41 PM, Randall Mackie
> > <rlmackie862 at gmail.com> wrote:
> > > Then what's the point of having 4 and 8 cores per cpu for parallel
> > > computations? I mean, I think I've done all I can to make my code as
> > > efficient as possible.
> >
> > I really advise reading the paper. It explicitly treats the case of
> > blocking, and uses a simple model to demonstrate all the points I made.
> >
> > With a single, scalar sparse matrix, there is definitely no point at all
> > in having multiple cores. However, extra cores will speed up things like
> > finite element integration. So, for instance, making that integration
> > dominate your cost (as spectral element codes do) will show nice speedup.
> > Ulrich Ruede has a great talk about this on his website.
> >
> > Matt
> >
> > > I'm not quite sure I understand your comment about using blocks or
> > > unassembled structures.
> > >
> > > Randy
> > >
> > > Matthew Knepley wrote:
> > > > On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie
> > > > <rlmackie862 at gmail.com> wrote:
> > > > > I'm running my PETSc code on a cluster of quad core Xeons connected
> > > > > by Infiniband. I hadn't much worried about the performance, because
> > > > > everything seemed to be working quite well, but today I was actually
> > > > > comparing performance (wall clock time) for the same problem, but on
> > > > > different combinations of CPUs.
> > > > >
> > > > > I find that my PETSc code is quite scalable until I start to use
> > > > > multiple cores/cpu.
> > > > >
> > > > > For example, the run time doesn't improve by going from 1 core/cpu
> > > > > to 4 cores/cpu, and I find this very strange, especially since,
> > > > > looking at top or Ganglia, all 4 cpus on each node are running at
> > > > > 100% almost all of the time. I would have thought that if the cpus
> > > > > were going all out, I would still be getting much more scalable
> > > > > results.
> > > > >
> > > > Those are really coarse measures. There is absolutely no way that all
> > > > cores are going 100%. It's easy to show by hand: take the peak flop
> > > > rate, and it gives you the bandwidth needed to sustain that
> > > > computation (if everything is perfect, like axpy). You will find that
> > > > the chip bandwidth is far below this. A nice analysis is in
> > > >
> > > > http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf
> > > >
> > > > > We are using mvapich-0.9.9 with infiniband. So, I don't know if
> > > > > this is a cluster/Xeon issue, or something else.
> > > > >
> > > > This is actually mathematics! How satisfying. The only way to improve
> > > > this is to change the data structure (e.g. use blocks) or change the
> > > > algorithm (e.g. use spectral elements and unassembled structures).
> > > >
> > > > Matt
> > > >
> > > > > Anybody with experience on this?
> > > > >
> > > > > Thanks, Randy M.
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener