Related: Mark's parallel Gauss-Seidel splits the off-diagonal update into two or more pieces that can operate independently.<br><br><div class="gmail_quote">On Fri, Dec 23, 2011 at 01:01, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Can we enumerate the matrix operations that are known to be non-scalable currently?<div><br></div><div>1. MatPermute() does ISAllGather() on the row indices and requires the user to ISAllGather() the column indices before calling (I consider this to be an interface bug similar to the MatGetSubMatrix() one we fixed before petsc-3.1). The problem is that we have to determine the new row and column for each locally visible row and column. This would be equivalent to having a scalable ISInvertPermutation(), which is not difficult to implement, and a way to retrieve data from off-process (new column indices for the "B" part). The latter can be done with a "VecScatter for integers", or trivially with PetscBG (name subject to change).</div>
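<div><br></div><div>A minimal serial sketch of the scalable inverse permutation mentioned in item 1, written in plain Python rather than against the PETSc API. It assumes each rank owns a contiguous slice of the permutation; the function name and the list-of-lists stand-in for ranks are hypothetical, and the write into another rank's slice stands in for the off-process communication:</div>

```python
from bisect import bisect_right


def invert_permutation_distributed(perm_by_rank):
    """Invert a permutation stored as one contiguous slice per (simulated) rank.

    perm_by_rank[r][i] is the new global index of the entry that rank r
    holds at local position i. The result has the same layout and maps
    each new global index back to the old global index.
    """
    sizes = [len(local) for local in perm_by_rank]

    # Ownership ranges: rank r owns global indices starts[r] .. starts[r + 1] - 1.
    starts = [0]
    for size in sizes:
        starts.append(starts[-1] + size)

    inverse_by_rank = [[None] * size for size in sizes]

    for rank, local in enumerate(perm_by_rank):
        for i, target in enumerate(local):
            # Find the rank that owns global index `target`.
            owner = bisect_right(starts, target) - 1
            # In a real parallel code this assignment is a message to `owner`;
            # each rank only ever sends as many entries as it owns.
            inverse_by_rank[owner][target - starts[owner]] = starts[rank] + i

    return inverse_by_rank
```

<div>No rank ever holds the whole permutation here, which is the point of avoiding the all-gather: the work and storage per rank stay proportional to the locally owned slice.</div>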
<div><br></div><div>2. MatGetSubMatrix(), also called by MatPermute(). For this, it is sufficient to determine where each currently owned entry should go (or be skipped). We could do this by making a parallel int-vector for the entire column space, setting it all to -1, then pushing the "new" column index (location in the IS) over the IS. This gives a parallel vector that is -1 for all column indices we don't want and the non-negative new column index for those that we do want. Now we retrieve from the parallel vector everything we need for the local column space.</div>
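<div><br></div><div>The marking scheme in item 2 can be sketched serially as follows. This is plain Python under stated assumptions, not PETSc code: the function name is hypothetical, a single list stands in for the parallel int-vector, and the new column index is assumed to be the position in the concatenation of the per-rank IS entries:</div>

```python
def new_column_indices(ncols_global, is_entries_by_rank, needed_cols_by_rank):
    """Simulate the -1 marking scheme for a submatrix column map.

    is_entries_by_rank[r] lists the old global columns selected by rank r's
    part of the IS. needed_cols_by_rank[r] lists the old global columns that
    appear in rank r's local rows. Returns, per rank, a dict mapping each
    needed old column to its new index, or -1 if the column is dropped.
    """
    # The "parallel int-vector" over the whole column space, set to -1.
    marker = [-1] * ncols_global

    # Push phase: each rank writes the new index (location in the IS)
    # at the old column that its IS entry selects.
    offset = 0
    for entries in is_entries_by_rank:
        for local_pos, old_col in enumerate(entries):
            marker[old_col] = offset + local_pos
        offset += len(entries)

    # Pull phase: each rank retrieves only the entries for the columns
    # it can see locally.
    return [{col: marker[col] for col in cols} for cols in needed_cols_by_rank]
```

<div>For example, with six global columns, IS entries [4, 1] on the first rank and [3] on the second, old column 4 becomes new column 0, column 1 becomes 1, column 3 becomes 2, and every other column reads back as -1.</div>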
<div><br></div><div>3. MatIncreaseOverlap_MPISBAIJ(): I haven't looked at this one carefully.</div><div><br></div><div>I'm writing MatConvert_Nest_AIJ(), which needs a similar feature. Mark wants parallel MIS and MOOSE wants parallel coloring, both of which involve moving integer data over the scatter context.</div>
<div><br></div><div>What other implementations are currently non-scalable? Do we need other primitives, or would two-way integer operations over the MPIXAIJ scatter and via an IS be sufficient?</div>
</blockquote></div><br>