<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Dec 24, 2018 at 4:56 PM Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Mark Adams via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> writes:<br>

<br>

> Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in<br>

> attached, NB, this is code that I wrote in grad school). It is memory<br>

> efficient and simple, just four nested loops i,j,I,J: C(I,J) =<br>

> P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I am<br>

> getting from my bone modeling colleagues, that use this old code on<br>

> Stampede now, the times look reasonable compared to GAMG. This is optimized<br>

> for elasticity, where I unroll loops (so it is really six nested loops).<br>

<br>

Is the A above meant to include some ghosted rows?<br></blockquote><div><br></div><div>You could but I was thinking of having i in the outer loop. In C(I,J) = P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows of A (and the left term P). </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> In thinking about this now, I think you want to make a local copy of P with<br>

> rows (j) for every column in A that you have locally, then transpose this<br>

> local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a<br>

> special tree data structure but a matrix is simpler.)<br>

<br>

Why transpose for P(j,J)?<br></blockquote><div><br></div><div>(premature) optimization. I was thinking 'j' being in the inner loop and doing sparse inner product, but now that I think about it there are other options.</div></div></div>