[petsc-dev] model for parallel ASM

Barry Smith bsmith at petsc.dev
Sat Jan 9 16:14:01 CST 2021


  If it is non-overlapping, do you mean block Jacobi with multiple blocks per MPI rank? (One block per rank is trivial and should work now.)
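
  Something like the following would set that up (just a sketch from me, not tested; the KSP is assumed to already have its operators attached):

    #include <petscksp.h>

    /* Sketch: select block Jacobi and optionally ask for several blocks per rank. */
    static PetscErrorCode ConfigureBlockJacobi(KSP ksp, PetscInt localblocks)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
      if (localblocks > 1) {
        /* multiple blocks per MPI rank; NULL lengths means equal-sized blocks */
        ierr = PCBJacobiSetLocalBlocks(pc, localblocks, NULL);CHKERRQ(ierr);
      } /* with nothing set the default is one block per rank */
      PetscFunctionReturn(0);
    }

  or, equivalently, at runtime: -pc_type bjacobi -pc_bjacobi_local_blocks <n>.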

  If you mean block Jacobi with multiple blocks per MPI rank, you should start with PCApply_BJacobi_Multiblock(). It monkeys with pointers into the vector and then calls KSPSolve() for each block, so you just need a non-blocking KSPSolve(). What KSP and what PC do you want to use per block? If you want to use LU, then I think you proceed largely as I said a few days ago: all the routines in MatSolve_SeqAIJCUSPARSE can be made non-blocking as discussed, with each one using its own stream (supplied with a hack now, or with the approaches from Junchao and Jacob in the future; a hack is all that is needed for this trivial case).
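
  To make that concrete, the apply has roughly this shape (a sketch of the idea only, not the actual PCApply_BJacobi_Multiblock() source; the per-block data is simplified into plain arrays):

    #include <petscksp.h>

    /* Sketch: nblocks sub-KSPs with work vectors subx[i]/suby[i] of length blen[i]
       that alias consecutive slices of the local parts of x and y. */
    static PetscErrorCode ApplyMultiblockSketch(PetscInt nblocks, KSP *subksp, Vec *subx, Vec *suby,
                                                const PetscInt *blen, Vec x, Vec y)
    {
      const PetscScalar *xarr;
      PetscScalar       *yarr;
      PetscInt           i, off = 0;
      PetscErrorCode     ierr;

      PetscFunctionBegin;
      ierr = VecGetArrayRead(x, &xarr);CHKERRQ(ierr);
      ierr = VecGetArray(y, &yarr);CHKERRQ(ierr);
      for (i = 0; i < nblocks; i++) {
        /* the pointer monkeying: point the block vectors into x and y */
        ierr = VecPlaceArray(subx[i], xarr + off);CHKERRQ(ierr);
        ierr = VecPlaceArray(suby[i], yarr + off);CHKERRQ(ierr);
        /* this is the call that would become non-blocking, one stream per block */
        ierr = KSPSolve(subksp[i], subx[i], suby[i]);CHKERRQ(ierr);
        /* with a truly non-blocking KSPSolve() these resets (and the restores below)
           would move to a second loop after a solve-wait */
        ierr = VecResetArray(subx[i]);CHKERRQ(ierr);
        ierr = VecResetArray(suby[i]);CHKERRQ(ierr);
        off += blen[i];
      }
      ierr = VecRestoreArrayRead(x, &xarr);CHKERRQ(ierr);
      ierr = VecRestoreArray(y, &yarr);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }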


  Barry


> On Jan 9, 2021, at 3:01 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> I would like to put a non-overlapping ASM solve on the GPU. It's not clear that we have a model for this. 
> 
> PCApply_ASM currently pipelines the scatter with the subdomain solves. I think we would want to change this and do 1) a scatter-begin loop, 2) a scatter-end and non-blocking-solve loop, 3) a solve-wait and scatter-begin loop, and 4) a scatter-end loop.
> 
> I'm not sure how to go about doing this.
>  * Should we make a new PCApply_ASM_PARALLEL or dump this pipelining algorithm and rewrite PCApply_ASM?
>  * Add a solver-wait method to KSP?
> 
> Thoughts?
> 
> Mark
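
  For concreteness, the four-phase apply described above would look roughly like this. KSPSolveBegin()/KSPSolveEnd() are made-up placeholders for the non-blocking solve and the solve-wait being discussed (they do not exist in PETSc today), and the per-subdomain scatters and vectors are stand-ins for whatever PC_ASM actually keeps:

    #include <petscksp.h>

    /* Sketch of the four-phase apply.  VecScatterBegin()/VecScatterEnd() are real
       PETSc calls; KSPSolveBegin()/KSPSolveEnd() are HYPOTHETICAL placeholders for a
       non-blocking solve and a solve-wait. */
    static PetscErrorCode ApplyASMFourPhaseSketch(PetscInt n, VecScatter *restriction, VecScatter *prolongation,
                                                  KSP *subksp, Vec *subrhs, Vec *subsol, Vec x, Vec y)
    {
      PetscInt       i;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      for (i = 0; i < n; i++) { /* 1) start all restrictions */
        ierr = VecScatterBegin(restriction[i], x, subrhs[i], INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      }
      for (i = 0; i < n; i++) { /* 2) finish each restriction, launch its solve */
        ierr = VecScatterEnd(restriction[i], x, subrhs[i], INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
        ierr = KSPSolveBegin(subksp[i], subrhs[i], subsol[i]);CHKERRQ(ierr); /* hypothetical non-blocking solve */
      }
      ierr = VecSet(y, 0.0);CHKERRQ(ierr);
      for (i = 0; i < n; i++) { /* 3) wait on each solve, start its prolongation */
        ierr = KSPSolveEnd(subksp[i], subrhs[i], subsol[i]);CHKERRQ(ierr); /* hypothetical solve-wait */
        ierr = VecScatterBegin(prolongation[i], subsol[i], y, ADD_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      }
      for (i = 0; i < n; i++) { /* 4) finish all prolongations */
        ierr = VecScatterEnd(prolongation[i], subsol[i], y, ADD_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }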


