[petsc-dev] ASM for each field solve on GPUs

Mark Adams mfadams at lbl.gov
Wed Dec 30 18:45:34 CST 2020

On Wed, Dec 30, 2020 at 7:12 PM Barry Smith <bsmith at petsc.dev> wrote:

>   If you are using direct solvers on each block on each GPU (several
> matrices on each GPU) you could pull apart, for example,
> and launch each of the matrix solves on a separate stream.

Yes, that is what I want. The first step is to figure out the best way to
get the blocks from Plex/Forest and get an exact solver working on the CPU
with ASM.
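
A minimal sketch of why one subdomain per field is exact when the fields are
uncoupled: the matrix is block diagonal, so solving each block directly and
concatenating the results solves the full system. This is plain Python, not
PETSc. The names solve_dense, solve_block_diagonal and assemble are
hypothetical helpers, and the thread pool is only a stand-in for launching
each block solve on its own GPU stream.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy
    for k in range(n):
        # Pick the largest pivot in column k and swap it into place.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def solve_block_diagonal(blocks, rhs_parts):
    """Solve every block independently; the solves share no data."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(solve_dense, blocks, rhs_parts))


def assemble(blocks):
    """Build the full block diagonal matrix (used only for checking)."""
    n = sum(len(B) for B in blocks)
    A = [[0.0] * n for _ in range(n)]
    off = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, v in enumerate(row):
                A[off + i][off + j] = v
        off += len(B)
    return A


# Two uncoupled "fields" of different sizes (made-up values).
blocks = [
    [[4.0, 1.0], [1.0, 3.0]],
    [[2.0, 0.0, 1.0], [0.0, 5.0, 0.0], [1.0, 0.0, 3.0]],
]
rhs_parts = [[1.0, 2.0], [3.0, 5.0, 4.0]]

x_parts = solve_block_diagonal(blocks, rhs_parts)
x = [v for part in x_parts for v in part]

# Check the concatenated solution against the assembled system.
A = assemble(blocks)
b = [v for part in rhs_parts for v in part]
residual = max(
    abs(sum(A[i][j] * x[j] for j in range(len(x))) - b[i])
    for i in range(len(b))
)
print("max residual:", residual)
```

Keeping the blocks as separate small matrices from the start, rather than
extracting them from one assembled matrix, is what makes the independent
solves cheap to launch.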

> You could use a MatSolveBegin/MatSolveEnd style or, as Jed may prefer, a
> Wait() model. Maybe a couple of hours of coding to produce a prototype
> MatSolveBegin/MatSolveEnd from MatSolve_SeqAIJCUSPARSE.
>   Note pulling apart a non-coupled single MatAIJ that contains all the
> matrices would be hugely expensive. Better to build each matrix already
> separate or use MatNest with only diagonal matrices.

The problem is that it runs in TS that uses DM, so I can't reorder the
matrix without breaking TS. I mimic what DM does now.

I run once on the CPU to get the metadata for GPU assembly from DMForest.
Maybe I should just get all the metadata that I need and throw the DM away
after the setup solve and run TS without a DM...

>   Barry
> > On Dec 30, 2020, at 5:46 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Mark Adams <mfadams at lbl.gov> writes:
> >
> >> I see that ASM has a DM and can get subdomains from it. I have a DMForest
> >> and I would like an ASM that has a subdomain for each field. How might I go
> >> about doing this? (The fields are not coupled in the matrix, so this would
> >> give a block diagonal matrix, and thus be exact with LU sub solvers.)
> >
> > The fields are already not coupled or you want to filter the matrix and
> > give back a single matrix with coupling removed?
> >
> > You can use FieldSplit to get the math of field-based block Jacobi (or
> > ASM, but overlap with fields tends to be expensive). Neither FieldSplit nor
> > ASM can run the (additive) solves concurrently (and most libraries would
> > need something to drive the threads).
> >
> >> I am then going to want to get these separate solves to be run in parallel
> >> on a GPU (I'm talking with Sherry about getting SuperLU working on these
> >> small problems). In looking at PCApply_ASM it looks like this will take
> >> some thought. KSPSolve would need to be non-blocking, etc., or a new apply
> >> op might be needed.
> >>
> >> Thanks,
> >> Mark
