Performance Issues on ccNuma-System

Matthew Knepley knepley at
Mon Oct 13 08:26:58 CDT 2008

On Mon, Oct 13, 2008 at 7:12 AM, Christoph Statz
<christoph.statz at> wrote:
> Dear PETSc users,
> I'm trying to work with PETSc on a ccNUMA system, where I am confronted with
> severe performance problems.
> Is anyone using PETSc on, e.g., an SGI Altix system?
> Which are the best kernels to use on cache-coherent systems?
> The Fortran kernels produce many cache misses (in functions like fsolve and
> fmatmul), slowing a 3 GFLOP/s machine down to about 200 MFLOP/s.
> Does anyone have any advice for increasing speed on a ccNUMA system?

1) With any performance question, please send the output of -log_summary.

2) I think it is unlikely that cache misses are responsible for this
    performance. It is much more likely that bandwidth limitations are
    responsible. Please see the paper by Kaushik and Gropp which models
    sparse matvec performance (on Dinesh's website).

3) You would see better performance using a block method. Sparse matvec without
    blocks will never see good percentages of peak (ditto for backsolve).
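
A rough way to see why points 2 and 3 go together is a back-of-the-envelope
bound. This is only a sketch, not the model from the Kaushik and Gropp paper:
it assumes 8-byte values, 4-byte column indices, one read of each stored
nonzero, and it ignores vector and row-pointer traffic, so it is optimistic.
The function name and the bandwidth figure are made up for illustration.

```python
def spmv_flops_bound(bandwidth_bytes_per_s, block_size=1,
                     value_bytes=8, index_bytes=4):
    """Upper bound on sparse matvec flop rate when memory traffic dominates.

    Assumes each stored nonzero is read once (its value plus its share of a
    column index) and ignores vector and row-pointer traffic.
    """
    # In a blocked format one column index is shared by block_size**2 values.
    bytes_per_nonzero = value_bytes + index_bytes / block_size**2
    flops_per_nonzero = 2  # one multiply and one add per stored nonzero
    return flops_per_nonzero * bandwidth_bytes_per_s / bytes_per_nonzero


if __name__ == "__main__":
    bandwidth = 2.0e9  # hypothetical sustained memory bandwidth in bytes/s
    for bs in (1, 2, 4):
        rate = spmv_flops_bound(bandwidth, bs)
        print(f"block size {bs}: at most {rate / 1e6:.0f} MFLOP/s")
```

Under these assumptions the bound depends only on memory bandwidth, not on
the peak flop rate, and it rises with block size because less index data is
moved per flop. Put in the sustained bandwidth you actually measure on your
machine to get a number worth comparing against -log_summary.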


> Sincerely,
> Christoph Statz
> --
> Christoph Statz
> Institut für Nachrichtentechnik
> Technische Universität Dresden
> 01062 Dresden
> Email:  christoph.statz at
> Phone: +49 351 463 32287
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
