scaling in 4-core machine: unassembled structured

Jed Brown jed at
Fri Nov 20 04:24:29 CST 2009

jarunan at wrote:
>> Getting better multicore performance requires changing the algorithms to
>> better reuse L1 cache.  This means moving away from assembled matrices
>> where possible and of course finding good preconditioners.
> I do not know how to move away from an assembled matrix. Since I have to
> reset the matrix values in each iteration, I am obliged to call
> MatAssemblyBegin() and MatAssemblyEnd(). Is there another way to
> create the matrix and set its values?

A matrix is just a linear operation.  What I mean by not assembling is
that you no longer define that operation in terms of matrix entries.  A
DFT is a famous example of a linear operation that should not be
represented in terms of matrix entries; instead, it should be applied
with an FFT.  How to do this depends heavily on the discretization and
the physics.  Good preconditioners almost always require assembled
matrices somewhere, but it's often possible to assemble something
cheaper than the real Jacobian.
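
To make the DFT example concrete, here is a minimal sketch in plain
Python (not PETSc).  It applies the same linear operation twice: once
through assembled matrix entries and once through a recursive radix-2
FFT that never stores a matrix.  The function names are made up for
illustration, and the FFT assumes the length is a power of two.

```python
import cmath

def dft_matrix(n):
    """Assemble the dense n x n DFT matrix: O(n^2) stored entries."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def apply_assembled(F, x):
    """Apply the operator through its entries: O(n^2) flops."""
    return [sum(Fjk * xk for Fjk, xk in zip(row, x)) for row in F]

def fft(x):
    """Radix-2 Cooley-Tukey FFT: the same linear operation in
    O(n log n) flops, with no matrix stored.
    Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

n = 16
x = [complex(i % 5, 0) for i in range(n)]
y_assembled = apply_assembled(dft_matrix(n), x)
y_free = fft(x)

# Largest difference between the two results (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(y_assembled, y_free))
```

Both paths compute the same vector; only the unassembled one avoids
loading n^2 matrix entries from memory.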

>> High-order and fast multipole methods are good for this.
> For example, please?

Spectral element methods implement certain operations by exploiting a
tensor product structure.  For a 3D element of polynomial order p, this
turns O(p^6) memory and O(p^6) flops into O(p^3) memory and O(p^4) flops
(with larger constants).  Matt has been doing some work with FMM.  The
key is to choose algorithms that do more work on the CPU for each value
loaded from memory.
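
The tensor product idea can be sketched in a few lines of plain Python
(again not PETSc, and not a real spectral element operator).  The sketch
assumes the element operator has the form A (x) B (x) C with small dense
1D matrices of size n = p+1; the random matrices and all function names
are placeholders for illustration.  The assembled version stores n^6
entries, while the factored version applies one 1D matrix per
direction in three passes of O(n^4) flops each.

```python
import random

def kron3(A, B, C):
    """Assemble A (x) B (x) C explicitly: n^3 x n^3 = n^6 entries."""
    n = len(A)
    idx = lambda i, j, k: (i * n + j) * n + k
    r = range(n)
    K = [[0.0] * n**3 for _ in range(n**3)]
    for i in r:
        for j in r:
            for k in r:
                for a in r:
                    for b in r:
                        for c in r:
                            K[idx(i, j, k)][idx(a, b, c)] = (
                                A[i][a] * B[j][b] * C[k][c])
    return K

def apply_assembled(K, u):
    """Apply through assembled entries: O(n^6) flops."""
    return [sum(Kij * uj for Kij, uj in zip(row, u)) for row in K]

def apply_tensor(A, B, C, u):
    """Apply A (x) B (x) C one direction at a time.
    Only the 1D matrices and O(n^3) work vectors are touched."""
    n = len(A)
    idx = lambda i, j, k: (i * n + j) * n + k
    r = range(n)
    t = [0.0] * n**3  # contract C along the last index
    for i in r:
        for j in r:
            for k in r:
                t[idx(i, j, k)] = sum(C[k][c] * u[idx(i, j, c)] for c in r)
    v = [0.0] * n**3  # contract B along the middle index
    for i in r:
        for j in r:
            for k in r:
                v[idx(i, j, k)] = sum(B[j][b] * t[idx(i, b, k)] for b in r)
    w = [0.0] * n**3  # contract A along the first index
    for i in r:
        for j in r:
            for k in r:
                w[idx(i, j, k)] = sum(A[i][a] * v[idx(a, j, k)] for a in r)
    return w

random.seed(0)
n = 4  # n = p + 1 points per direction, i.e. p = 3
rand_mat = lambda: [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
A, B, C = rand_mat(), rand_mat(), rand_mat()
u = [random.uniform(-1, 1) for _ in range(n**3)]

K = kron3(A, B, C)
y_assembled = apply_assembled(K, u)
y_tensor = apply_tensor(A, B, C, u)
max_diff = max(abs(a - b) for a, b in zip(y_assembled, y_tensor))
```

The factored version does more arithmetic per value loaded, which is
the property that matters when memory bandwidth is the bottleneck.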

I have some slides on the subject,

you could also take a look at slides from this mini-course (we did
high-order methods on the last day)

I can send you more technical references if you would like.

Finally, if you are in Zürich, we can talk about it sometime (I'm at



More information about the petsc-users mailing list