[petsc-dev] model for parallel ASM

Mark Adams mfadams at lbl.gov
Sun Jan 10 13:30:52 CST 2021


On Sat, Jan 9, 2021 at 11:14 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   Ok, fieldsplit, pretty much the same as what I said for block Jacobi.
>
> static PetscErrorCode PCApply_FieldSplit(PC pc,Vec x,Vec y)
> {
>   PC_FieldSplit      *jac = (PC_FieldSplit*)pc->data;
>   PetscErrorCode     ierr;
>   PC_FieldSplitLink  ilink = jac->head;
>   PetscInt           cnt,bs;
>
>   PetscFunctionBegin;
>   if (jac->type == PC_COMPOSITE_ADDITIVE) {
>     if (jac->defaultsplit) {
>       ierr = VecGetBlockSize(x,&bs);CHKERRQ(ierr);
>       if (jac->bs > 0 && bs != jac->bs) SETERRQ2(PetscObjectComm((PetscObject)pc),PETSC_ERR_ARG_WRONGSTATE,"Blocksize of x vector %D does not match fieldsplit blocksize %D",bs,jac->bs);
>       ierr = VecGetBlockSize(y,&bs);CHKERRQ(ierr);
>       if (jac->bs > 0 && bs != jac->bs) SETERRQ2(PetscObjectComm((PetscObject)pc),PETSC_ERR_ARG_WRONGSTATE,"Blocksize of y vector %D does not match fieldsplit blocksize %D",bs,jac->bs);
>       ierr = VecStrideGatherAll(x,jac->x,INSERT_VALUES);CHKERRQ(ierr);
>       while (ilink) {
>         ierr = PetscLogEventBegin(ilink->event,ilink->ksp,ilink->x,ilink->y,NULL);CHKERRQ(ierr);
>         ierr = KSPSolve(ilink->ksp,ilink->x,ilink->y);CHKERRQ(ierr);
>
> You need these KSPSolve() calls to not block, which, if you are using SuperLU_DIST, means each call to
>
>   PetscStackCall("SuperLU_DIST:PStatInit",PStatInit(&stat));  /* Initialize the statistics variables. */
> #if defined(PETSC_USE_COMPLEX)
>   PetscStackCall("SuperLU_DIST:pzgssvx",pzgssvx(&lu->options,&lu->A_sup,&lu->ScalePermstruct,(doublecomplex*)bptr,m,1,&lu->grid,&lu->LUstruct,&lu->SOLVEstruct,berr,&stat,&info));
> #else
>   PetscStackCall("SuperLU_DIST:pdgssvx",pdgssvx(&lu->options,&lu->A_sup,&lu->ScalePermstruct,bptr,m,1,&lu->grid,&lu->LUstruct,&lu->SOLVEstruct,berr,&stat,&info));
> #endif
>
> must be on a different stream and cannot block. That essentially means the PStatInit and pdgssvx have to run completely on the GPU, or have some complex CPU babysitting that allows them to be interlaced on the GPU while doing their CPU work.
>

It is not clear to me why the library code would need to be that complex, but
I don't understand streams well. SuperLU currently uses Thrust driven by CPU
code, but a pure GPU version is being developed; I can wait for that.
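
For what it's worth, my (possibly naive) understanding of "separate streams
that never block" is just the standard CUDA pattern below. This is only an
illustrative sketch: solve_block() is a made-up placeholder for whatever
per-block GPU work the solver does, not PETSc or SuperLU code.

#include <stdlib.h>
#include <cuda_runtime.h>

/* Placeholder: stands in for whatever per-block GPU work the solver does
   (kernels, cuSPARSE calls, ...), all enqueued on stream s without any
   host-side synchronization. */
extern void solve_block(int i, cudaStream_t s);

void apply_blocks(int nblocks)
{
  cudaStream_t *streams = (cudaStream_t*)malloc(nblocks*sizeof(cudaStream_t));
  int           i;

  for (i = 0; i < nblocks; i++) cudaStreamCreate(&streams[i]);

  /* Launch all the block solves. They can only overlap on the GPU if
     solve_block() never blocks the host; a single cudaDeviceSynchronize()
     (or a blocking CPU/MPI call) inside it serializes the whole loop. */
  for (i = 0; i < nblocks; i++) solve_block(i, streams[i]);

  /* Wait once per stream, only at the end */
  for (i = 0; i < nblocks; i++) cudaStreamSynchronize(streams[i]);

  for (i = 0; i < nblocks; i++) cudaStreamDestroy(streams[i]);
  free(streams);
}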


> I doubt either thing is true, and you are best off using the cuSPARSE solvers, which do run completely on the GPU in any stream you give them.
>

cuSPARSE does not have an LU factorization.


>
> The factorization is even more of a nightmare. I think Sherry uses the GPU just to do blocks for her sequential factorization, so I doubt very much it can handle multiple simultaneous factorizations.
>

Oh, you were talking about the solves. (I have only been thinking about
factorizations, and I now see that has caused some confusion.)

I am still not clear about the parallel dispatch. If an MPI-serial library
method needs a lot of complex logic to work, perhaps we should look at other
models. The XGC code uses OpenMP a lot, and they dispatch asynchronous
"solves" in an OpenMP loop; see the sketch below. I'm not sure whether being
in an OpenMP thread buys you anything for a GPU solver, but this is what they
have done for CPU solvers. At one point they used PETSc for an LU solve and
we (Barry) made some fixes to make it thread safe enough for that operation,
but I don't know what GPU "solvers" they now run in this model.
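
To be clear about the dispatch pattern, this is my guess at the shape of what
XGC does, not their actual code, and it assumes the per-block (MPI-serial)
solver is thread safe, which is the hard part:

#include <petscksp.h>

/* Hypothetical illustration of OpenMP-dispatched independent solves.
   Assumes nblocks independent, already set-up KSPs on PETSC_COMM_SELF. */
PetscErrorCode SolveBlocksOMP(PetscInt nblocks, KSP ksp[], Vec b[], Vec x[])
{
  PetscInt i;

  PetscFunctionBegin;
  #pragma omp parallel for
  for (i = 0; i < nblocks; i++) {
    /* no CHKERRQ here: cannot return from inside an OpenMP region */
    KSPSolve(ksp[i], b[i], x[i]);  /* each thread drives its own serial solve */
  }
  PetscFunctionReturn(0);
}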

Regardless, my best guess at the basic model is something like:

For all blocks i:
  create a stream i.s and launch a non-blocking VecScatterBegin with i.s
For all blocks i:
  VecScatterEnd(i.s, ...)
  Solve(i.s, ...)   // so library solvers and our built-in solvers would need
to take a stream, or I guess that would get passed in through Jacob's vector
class and the library interface code would need to take the CUDA stream out
and give it to SuperLU, say
For all blocks i:
  Wait(i.s)
  VecScatterBegin(i.s, ...)   // scatter the block solutions back
For all blocks i:
  VecScatterEnd(i.s, ...)   // non-overlapping blocks, so no problem with
adding values
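
To make that concrete, here is roughly what I have in mind in C. To be clear,
the *Async() routines below are made up: today's VecScatterBegin/End and
KSPSolve do not take a stream, so this assumes some hypothetical stream-aware
API (or the stream riding along in Jacob's vector class).

#include <petscksp.h>
#include <cuda_runtime.h>

/* Hypothetical sketch only: the *Async() routines are placeholders for a
   stream-aware API that does not exist in PETSc today. */
PetscErrorCode ApplyBlocksStreamed(PetscInt n, VecScatter scat[], KSP ksp[],
                                   Vec xglobal, Vec xloc[], Vec yloc[], Vec yglobal)
{
  cudaStream_t   *s;
  PetscInt       i;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscMalloc1(n,&s);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {                       /* gather each block's piece of x */
    cudaStreamCreate(&s[i]);
    ierr = VecScatterBeginAsync(scat[i],xglobal,xloc[i],INSERT_VALUES,SCATTER_FORWARD,s[i]);CHKERRQ(ierr);
  }
  for (i = 0; i < n; i++) {                       /* solve each block on its own stream */
    ierr = VecScatterEndAsync(scat[i],xglobal,xloc[i],INSERT_VALUES,SCATTER_FORWARD,s[i]);CHKERRQ(ierr);
    ierr = KSPSolveAsync(ksp[i],xloc[i],yloc[i],s[i]);CHKERRQ(ierr);
  }
  for (i = 0; i < n; i++) {                       /* push the block solutions back */
    cudaStreamSynchronize(s[i]);
    ierr = VecScatterBeginAsync(scat[i],yloc[i],yglobal,ADD_VALUES,SCATTER_REVERSE,s[i]);CHKERRQ(ierr);
  }
  for (i = 0; i < n; i++) {                       /* non-overlapping blocks: adds do not conflict */
    ierr = VecScatterEndAsync(scat[i],yloc[i],yglobal,ADD_VALUES,SCATTER_REVERSE,s[i]);CHKERRQ(ierr);
    cudaStreamDestroy(s[i]);
  }
  ierr = PetscFree(s);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}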

