[petsc-dev] ASM for each field solve on GPUs

Wed Dec 30 19:30:37 CST 2020

Barry Smith <bsmith at petsc.dev> writes:

> If you are using direct solvers on each block on each GPU (several matrices on each GPU), you could pull apart, for example, MatSolve_SeqAIJCUSPARSE()
> and launch each of the matrix solves on a separate stream. You could use a MatSolveBegin/MatSolveEnd style or, as Jed may prefer, a Wait() model. Maybe a couple of hours of coding to produce a prototype MatSolveBegin/MatSolveEnd from MatSolve_SeqAIJCUSPARSE.

I doubt cusparse_solve is a single kernel launch (and there are two of them already). You'd almost certainly need a thread to keep driving it, or an async/await model. Begin/End pairs for compute (even "offloaded" compute) are no small change.
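
For readers less familiar with sparse direct solves, here is a minimal serial sketch of why "there are two of them": applying an LU factorization means a lower-triangular solve followed by an upper-triangular solve, and the second cannot start until the first has finished. This is plain C, not the cuSPARSE code path; the function names and the CSR layout (unit-diagonal L with only strictly lower entries stored, U with the diagonal stored first in each row) are assumptions made for illustration.

```c
#include <assert.h>

/* Forward substitution: solve L y = b, where L is unit lower triangular.
 * Only the strictly lower entries of L are stored, in CSR form. */
void lower_solve_unit(int n, const int *rowptr, const int *colidx,
                      const double *val, const double *b, double *y)
{
  for (int i = 0; i < n; i++) {
    double s = b[i];
    for (int k = rowptr[i]; k < rowptr[i + 1]; k++) s -= val[k] * y[colidx[k]];
    y[i] = s; /* row i needs every earlier y[j] it references */
  }
}

/* Backward substitution: solve U x = y, where U is upper triangular in CSR
 * form with the diagonal entry stored first in each row. */
void upper_solve(int n, const int *rowptr, const int *colidx,
                 const double *val, const double *y, double *x)
{
  for (int i = n - 1; i >= 0; i--) {
    double s = y[i];
    for (int k = rowptr[i] + 1; k < rowptr[i + 1]; k++) s -= val[k] * x[colidx[k]];
    x[i] = s / val[rowptr[i]];
  }
}

/* Two dependent stages: the upper solve consumes the lower solve's output. */
void lu_solve(int n,
              const int *lrowptr, const int *lcolidx, const double *lval,
              const int *urowptr, const int *ucolidx, const double *uval,
              const double *b, double *work, double *x)
{
  assert(n >= 0);
  lower_solve_unit(n, lrowptr, lcolidx, lval, b, work);
  upper_solve(n, urowptr, ucolidx, uval, work, x);
}
```

Each stage also has row-to-row dependencies of its own, which is presumably why neither is expected to map onto a single kernel launch, and why something has to keep issuing work until the whole solve is done.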

> Note that pulling apart a non-coupled single MatAIJ that contains all the matrices would be hugely expensive. Better to build each matrix already separate or use MatNest with only diagonal matrices.

Nonsense: the ND (nested dissection) ordering will notice that they're decoupled, and you get more meat per kernel launch.
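
A small sketch of why the decoupling is cheap to detect: blocks that share no entries are separate connected components of the matrix's adjacency graph, and a breadth-first search labels them in time linear in the number of nonzeros. This is not nested dissection itself, only an illustration of the structure a graph-based ordering works with. `label_components` is a hypothetical helper, not a PETSc function, and it assumes a structurally symmetric pattern in CSR form.

```c
#include <assert.h>
#include <stdlib.h>

/* Label the connected components of the adjacency graph of an n x n sparse
 * matrix in CSR form (rowptr has n+1 entries, colidx holds column indices).
 * Assumes a structurally symmetric pattern. On return comp[i] is the
 * component index of row i. Returns the number of components, or -1 if the
 * work array cannot be allocated. */
int label_components(int n, const int *rowptr, const int *colidx, int *comp)
{
  assert(n >= 0 && rowptr && comp);
  int *queue = malloc((size_t)(n > 0 ? n : 1) * sizeof *queue);
  if (!queue) return -1;

  for (int i = 0; i < n; i++) comp[i] = -1; /* -1 means not yet visited */

  int ncomp = 0;
  for (int s = 0; s < n; s++) {
    if (comp[s] != -1) continue;
    /* Breadth-first search from the first unvisited row. */
    int head = 0, tail = 0;
    comp[s] = ncomp;
    queue[tail++] = s;
    while (head < tail) {
      int v = queue[head++];
      for (int k = rowptr[v]; k < rowptr[v + 1]; k++) {
        int w = colidx[k];
        if (comp[w] == -1) {
          comp[w] = ncomp;
          queue[tail++] = w;
        }
      }
    }
    ncomp++;
  }

  free(queue);
  return ncomp;
}
```

For a 6 x 6 matrix in which rows {0, 2, 4} couple only to each other and rows {1, 3, 5} couple only to each other, the routine reports two components even though the blocks are interleaved in the row numbering.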