<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 30, 2020 at 8:12 PM Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><br><div><br><blockquote type="cite"><div>On Dec 30, 2020, at 6:45 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:</div><br><div><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Dec 30, 2020 at 7:12 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
>>> If you are using direct solvers on each block on each GPU (several matrices on each GPU), you could pull apart, for example, MatSolve_SeqAIJCUSPARSE()
>>> and launch each of the matrix solves on a separate stream.
>>
>> Yes, that is what I want. The first step is to figure out the best way to get the blocks from Plex/Forest and get an exact solver working on the CPU with ASM.
>
> I don't think you want ASM, or at most you want it inside PCFIELDSPLIT. It is FieldSplit's job to pull out fields, not ASM's job (ASM pulls out geometrically connected regions).

I was thinking about getting the IS for each field and creating an ASM subdomain for each, but I guess FieldSplit can do that.

How would I do that? (My best guess is sketched below, after Barry's MatSolveBegin/MatSolveEnd suggestion.)

>>> You could use a MatSolveBegin/MatSolveEnd style or, as Jed may prefer, a Wait() model. Maybe a couple of hours of coding to produce a prototype MatSolveBegin/MatSolveEnd from MatSolve_SeqAIJCUSPARSE.
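Just to check that I follow, here is the shape I think Barry means. This is a sketch only: MatSolveBegin/MatSolveEnd do not exist in PETSc today, the SolveCtx type and function names are made up, and I have elided everything MatSolve_SeqAIJCUSPARSE actually does with the factored data.

    /* Hypothetical split-phase API; names invented for this sketch. */
    #include <petscmat.h>
    #include <cuda_runtime.h>

    typedef struct {
      cudaStream_t stream; /* per-block stream the solve kernels are enqueued on */
      /* ... factored-matrix data, as inside MatSolve_SeqAIJCUSPARSE ... */
    } SolveCtx;

    /* Begin: enqueue one block's triangular solves on its own stream and
       return immediately, so the solves for different blocks can overlap. */
    static PetscErrorCode MatSolveBegin_Block(Mat A, Vec b, Vec x, SolveCtx *ctx)
    {
      PetscFunctionBeginUser;
      /* launch lower/upper triangular-solve kernels on ctx->stream here;
         no synchronization in this phase */
      PetscFunctionReturn(0);
    }

    /* End: wait for this block's stream only. */
    static PetscErrorCode MatSolveEnd_Block(Mat A, Vec b, Vec x, SolveCtx *ctx)
    {
      cudaError_t cerr;

      PetscFunctionBeginUser;
      cerr = cudaStreamSynchronize(ctx->stream);CHKERRCUDA(cerr);
      PetscFunctionReturn(0);
    }

    /* Driver: launch every block's solve, then wait for all of them. */
    static PetscErrorCode SolveAllBlocks(PetscInt n, Mat A[], Vec b[], Vec x[], SolveCtx ctx[])
    {
      PetscErrorCode ierr;
      PetscInt       i;

      PetscFunctionBeginUser;
      for (i = 0; i < n; i++) {ierr = MatSolveBegin_Block(A[i], b[i], x[i], &ctx[i]);CHKERRQ(ierr);}
      for (i = 0; i < n; i++) {ierr = MatSolveEnd_Block(A[i], b[i], x[i], &ctx[i]);CHKERRQ(ierr);}
      PetscFunctionReturn(0);
    }

And for my "how would I do that" question about FieldSplit above, this is roughly what I have in mind. Again a sketch: the function name is mine, and I am assuming I can hand FieldSplit one IS per field (otherwise the DM attached to the KSP would have to define the splits).

    #include <petscksp.h>

    /* Sketch only. Additive FieldSplit is block Jacobi over fields; since
       the fields are not coupled in the matrix, LU on each split gives an
       exact solve. */
    static PetscErrorCode SetupFieldSolves(KSP ksp, PetscInt nfields, IS fieldIS[])
    {
      PC             pc;
      PetscInt       f;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_ADDITIVE);CHKERRQ(ierr);
      for (f = 0; f < nfields; f++) {
        ierr = PCFieldSplitSetIS(pc, NULL, fieldIS[f]);CHKERRQ(ierr); /* splits get numbered names */
      }
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Each split could then be made exact from the command line, e.g. -fieldsplit_0_ksp_type preonly -fieldsplit_0_pc_type lu, one pair per split.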
>>> Note pulling apart a non-coupled single MatAIJ that contains all the matrices would be hugely expensive. Better to build each matrix already separate or use MatNest with only diagonal matrices.
>>
>> The problem is that it runs in TS, which uses DM, so I can't reorder the matrix without breaking TS. I mimic what DM does now.
>
> DM decides the ordering, not TS.

Yes.

> You could slip in a MatSetLocalToGlobalMapping() that uninterlaces the variables to get your DM to build an uninterlaced matrix.

That sounds fragile, but Matt would be the one to ask. I realized that I also need the DM during the solve phase, for FE integrals and for diagnostics, so I can't throw it away. But replacing DM[Forest]'s matrix ordering with a field-major ordering, assembling into a MatNest directly, and then having LU work directly on each block sounds promising. (A sketch of what I mean is below.)

> For the vector it is easier, but again you will need to uninterlace it. Back in the classic Cray vector machine days interlacing was bad, with Intel CPUs it became good; now both approaches should be supported in software.
>
> All the DMs should support both interlaced and noninterlaced algebraic objects.

That would be nice, but let's see. Plex's "interlaced" ordering is not that simple. It is interlaced up to Q2 elements, but at Q3 it mixes interlaced and non-interlaced orderings. :o

Apparently FE topology people like to put data on topological entities: Q1 puts the dofs on the vertices, Q2 adds them on the edges and cell centers (in 2D), but Q3 has no more topological entities to use, so it puts two "vertices" on each edge and four at the cell center, and those dofs are not interlaced. I have spent a fair amount of time in the last few months reverse engineering DM :)

>> I run once on the CPU to get the metadata for GPU assembly from DMForest. Maybe I should just get all the metadata that I need, throw the DM away after the setup solve, and run TS without a DM...
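To make the MatNest idea concrete for myself, something like the following is what I would try. A sketch with placeholders: the helper name and arguments are mine, and I am assuming one square sequential AIJ block per field, with the off-diagonal blocks left NULL (empty).

    #include <petscmat.h>

    /* Sketch: block-diagonal MatNest, one matrix per (uncoupled) field and
       no off-diagonal blocks, so LU can factor each block independently. */
    static PetscErrorCode BuildBlockDiagonal(MPI_Comm comm, PetscInt nf, const PetscInt nrows[], IS fieldIS[], Mat *A)
    {
      Mat           *blocks;
      PetscInt       f;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscCalloc1(nf * nf, &blocks);CHKERRQ(ierr); /* NULL entries = empty off-diagonal blocks */
      for (f = 0; f < nf; f++) {
        ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nrows[f], nrows[f], 0, NULL, &blocks[f * nf + f]);CHKERRQ(ierr);
        /* ... preallocate and assemble field f's block here ... */
      }
      /* fieldIS[] carries each field's global rows, i.e. the field-major
         (uninterlaced) ordering */
      ierr = MatCreateNest(comm, nf, fieldIS, nf, fieldIS, blocks, A);CHKERRQ(ierr);
      for (f = 0; f < nf; f++) { /* the nest keeps its own references */
        ierr = MatDestroy(&blocks[f * nf + f]);CHKERRQ(ierr);
      }
      ierr = PetscFree(blocks);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

And for Barry's MatSetLocalToGlobalMapping() suggestion, I picture the uninterlacing map as a plain permutation. Again a sketch, assuming nf fields interlaced point-wise over n points:

    #include <petscis.h>

    /* Sketch: interlaced local dof (p * nf + f) maps to field-major global
       dof (f * n + p), so each field's dofs become one contiguous block. */
    static PetscErrorCode CreateUninterlacedL2G(PetscInt n, PetscInt nf, ISLocalToGlobalMapping *l2g)
    {
      PetscInt      *idx, p, f;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscMalloc1(n * nf, &idx);CHKERRQ(ierr);
      for (p = 0; p < n; p++) {
        for (f = 0; f < nf; f++) idx[p * nf + f] = f * n + p;
      }
      ierr = ISLocalToGlobalMappingCreate(PETSC_COMM_SELF, 1, n * nf, idx, PETSC_OWN_POINTER, l2g);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The result would be passed to MatSetLocalToGlobalMapping() (and the vector equivalent) so the DM assembles directly into the uninterlaced layout.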
>>> Barry
>>>
>>>> On Dec 30, 2020, at 5:46 PM, Jed Brown <jed@jedbrown.org> wrote:
>>>>
>>>> Mark Adams <mfadams@lbl.gov> writes:
>>>>
>>>>> I see that ASM has a DM and can get subdomains from it. I have a DMForest,
>>>>> and I would like an ASM that has a subdomain for each field. How might I go
>>>>> about doing this? (The fields are not coupled in the matrix, so this would
>>>>> give a block-diagonal matrix, and the solve would thus be exact with LU
>>>>> sub-solvers.)
>>>>
>>>> The fields are already not coupled, or you want to filter the matrix and give back a single matrix with the coupling removed?
>>>>
>>>> You can use FieldSplit to get the math of field-based block Jacobi (or ASM, but overlap with fields tends to be expensive). Neither FieldSplit nor ASM can run the (additive) solves concurrently (and most libraries would need something to drive the threads).
>>>>
>>>>> I am then going to want to get these separate solves to run in parallel
>>>>> on a GPU (I'm talking with Sherry about getting SuperLU working on these
>>>>> small problems). In looking at PCApply_ASM it looks like this will take
>>>>> some thought. KSPSolve would need to be non-blocking, etc., or a new apply
>>>>> op might be needed.
>>>>>
>>>>> Thanks,
>>>>> Mark