[petsc-users] GAMG scaling
Mark Adams
mfadams at lbl.gov
Mon Dec 24 22:57:58 CST 2018
On Mon, Dec 24, 2018 at 4:56 PM Jed Brown <jed at jedbrown.org> wrote:
> Mark Adams via petsc-users <petsc-users at mcs.anl.gov> writes:
>
> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
> > attached, NB, this is code that I wrote in grad school). It is memory
> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I
> am
> > getting from my bone modeling colleagues, that use this old code on
> > Stampede now, the times look reasonable compared to GAMG. This is
> optimized
> > for elasticity, where I unroll loops (so it is really six nested loops).
>
> Is the A above meant to include some ghosted rows?
>
You could but I was thinking of having i in the outer loop. In C(I,J) =
P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows of
A (and the left term P).
>
> > In thinking about this now, I think you want to make a local copy of P
> with
> > rows (j) for every column in A that you have locally, then transpose this
> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> > special tree data structure but a matrix is simpler.)
>
> Why transpose for P(j,J)?
>
(premature) optimization. I was thinking 'j' being in the inner loop and
doing sparse inner product, but now that I think about it there are other
options.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20181224/22ce9720/attachment.html>
More information about the petsc-users
mailing list