[petsc-dev] ASM for each field solve on GPUs

Mark Adams mfadams at lbl.gov
Wed Dec 30 19:36:28 CST 2020


On Wed, Dec 30, 2020 at 8:12 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>
> On Dec 30, 2020, at 6:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
>
> On Wed, Dec 30, 2020 at 7:12 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   If you are using direct solvers on each block on each GPU (several
>> matrices on each GPU), you could pull apart, for example,
>> MatSolve_SeqAIJCUSPARSE() and launch each of the matrix solves on a
>> separate stream.
>
>
> Yes, that is what I want. The first step is to figure out the best way to
> get the blocks from Plex/Forest and get an exact solver working on the CPU
> with ASM.
>
>
>   I don't think you want ASM, or at most you want it inside PCFIELDSPLIT.
> It is FieldSplit's job to pull out fields, not ASM's job (ASM pulls out
> geometrically connected regions).
>

I was thinking about getting an IS for each field and creating an ASM with
those, but I guess FieldSplit can do that.

How would I do that?
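
Something like this, maybe? (A minimal sketch, untested; "ksp", "nfields",
and "fieldIS" are placeholders for whatever I pull out of Plex/Forest.)

  PC       pc;
  PetscInt f;

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_ADDITIVE);CHKERRQ(ierr);
  for (f = 0; f < nfields; ++f) {
    /* one split per field, defined by its IS */
    ierr = PCFieldSplitSetIS(pc, NULL, fieldIS[f]);CHKERRQ(ierr);
  }

and then -fieldsplit_<f>_ksp_type preonly -fieldsplit_<f>_pc_type lu on the
command line to get a direct solve on each block, which should be exact
since the blocks don't couple.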


>
>
>
>> You could use a MatSolveBegin/MatSolveEnd style or, as Jed may prefer, a
>> Wait() model. Maybe a couple of hours of coding to produce a prototype
>> MatSolveBegin/MatSolveEnd from MatSolve_SeqAIJCUSPARSE.
>>
>>   Note that pulling apart a single non-coupled MatAIJ that contains all
>> the matrices would be hugely expensive. Better to build each matrix
>> separate from the start, or use MatNest with only diagonal matrices.
>>
>
> The problem is that it runs in TS, which uses the DM, so I can't reorder
> the matrix without breaking TS. I mimic what DM does now.
>
>
>   DM decides the ordering, not TS.
>

Yes,


> You could slip in a local-to-global mapping (MatSetLocalToGlobalMapping())
> that uninterlaces the variables to get your DM to build an uninterlaced
> matrix.
>

That sounds fragile, but Matt would be the one to ask. I realized that I
also need the DM during the solve phase, for FE integrals and diagnostics,
so I can't throw it away. But replacing DM[Forest]'s matrix ordering with a
field-major ordering, assembling directly into a MatNest, and then having
LU work directly on each block sounds promising.
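
Roughly what I'm picturing (a sketch, untested; "Afield" stands for the
per-field matrices assembled separately in field-major order, "nfields"
for the field count):

  Mat      J, *blocks;
  PetscInt f;

  ierr = PetscCalloc1(nfields * nfields, &blocks);CHKERRQ(ierr);
  for (f = 0; f < nfields; ++f) {
    blocks[f * nfields + f] = Afield[f]; /* off-diagonal blocks stay NULL */
  }
  ierr = MatCreateNest(PETSC_COMM_WORLD, nfields, NULL, nfields, NULL,
                       blocks, &J);CHKERRQ(ierr);
  ierr = PetscFree(blocks);CHKERRQ(ierr);

Then LU can factor each diagonal block on its own, and down the road each
block's MatSolve could go on its own stream as you suggest above.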


> For the vector it is easier, but again you will need to uninterlace it.
> Back in the classic Cray vector machine days interlacing was bad; with
> Intel CPUs it became good; now both approaches should be supported in
> software.
>
>   All the DMs should support both interlaced and noninterlaced algebraic
> objects.
>

That would be nice, but let's see. Plex's "interlaced" ordering is not that
simple. It is interlaced up to Q2 elements; at Q3 it mixes interlaced and
non-interlaced. :o

Apparently FE topology people like to put data on topological entities:
Q1 puts the dofs on the vertices, Q2 adds them on the edges and cell
centers (in 2D), but Q3 has no more topological entities to use, so it puts
two "vertices" on each edge and four in the cell interior, and those dofs
are not interlaced. I have spent a fair amount of time in the last few
months reverse-engineering DM :)
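
For the record, the way I have been poking at the layout is through the
local PetscSection (a sketch; "dm" here is the Plex, e.g. after DMConvert
from the Forest, and this just dumps the dofs and offset at every mesh
point):

  PetscSection section;
  PetscInt     p, pStart, pEnd, ndof, off;

  ierr = DMGetLocalSection(dm, &section);CHKERRQ(ierr);
  ierr = DMPlexGetChart(dm, &pStart, &pEnd);CHKERRQ(ierr);
  for (p = pStart; p < pEnd; ++p) { /* points = vertices, edges, faces, cells */
    ierr = PetscSectionGetDof(section, p, &ndof);CHKERRQ(ierr);
    ierr = PetscSectionGetOffset(section, p, &off);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "point %D: %D dofs at offset %D\n",
                       p, ndof, off);CHKERRQ(ierr);
  }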


>
>
> I run once on the CPU to get the metadata for GPU assembly from DMForest.
> Maybe I should just get all the metadata that I need, throw the DM away
> after the setup solve, and run TS without a DM...
>
>
>>
>>   Barry
>>
>>
>> > On Dec 30, 2020, at 5:46 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >
>> > Mark Adams <mfadams at lbl.gov> writes:
>> >
>> >> I see that ASM has a DM and can get subdomains from it. I have a
>> >> DMForest and I would like an ASM that has a subdomain for each field.
>> >> How might I go about doing this? (The fields are not coupled in the
>> >> matrix, so this would give a block diagonal matrix, and thus be exact
>> >> with LU sub-solvers.)
>> >
>> > Are the fields already not coupled, or do you want to filter the matrix
>> > and give back a single matrix with the coupling removed?
>> >
>> > You can use FieldSplit to get the math of field-based block Jacobi (or
>> > ASM, but overlap with fields tends to be expensive). Neither FieldSplit
>> > nor ASM can run the (additive) solves concurrently (and most libraries
>> > would need something to drive the threads).
>> >
>> >> I am then going to want these separate solves to run in parallel on a
>> >> GPU (I'm talking with Sherry about getting SuperLU working on these
>> >> small problems). Looking at PCApply_ASM, it seems this will take some
>> >> thought. KSPSolve would need to be non-blocking, etc., or a new apply
>> >> op might be needed.
>> >>
>> >> Thanks,
>> >> Mark
>>
>>
>