[petsc-dev] Subsurface application and Algebraic Multigrid on GPUs

Thu Sep 20 16:06:30 CDT 2018

   Brian,

      I have finished making the (relatively few) changes needed to get PETSc's GAMG to run on a combination of the CPU and GPU. Any AMG kernel that has a CUDA backend runs automatically on the GPU, while kernels without a CUDA backend run on the CPU. In particular, the "solve" portion (Chebyshev/Jacobi smoothing, coarse grid restriction and interpolation) will run on the GPU, as will part of the AMG "setup".
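      For readers unfamiliar with the smoother named above, here is a minimal sketch of Jacobi-preconditioned Chebyshev smoothing in plain Python. It is not PETSc code: the 1D Poisson matrix, the function names, and the hand-picked eigenvalue bounds are all assumptions made for illustration. It is meant only to show that the smoother reduces to sparse matrix-vector products and vector updates, which are the kinds of kernels that have CUDA backends.

```python
# Illustrative only: Jacobi-preconditioned Chebyshev smoothing on the 1D
# Poisson matrix A = tridiag(-1, 2, -1). All names here are hypothetical;
# this is not how PETSc implements KSPCHEBYSHEV.

def matvec(x):
    """Return A x for A = tridiag(-1, 2, -1) (stands in for the SpMV kernel)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def jacobi(r):
    """Return D^{-1} r; the diagonal of A is 2 in every row."""
    return [ri / 2.0 for ri in r]


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(vi * vi for vi in v) ** 0.5


def chebyshev_smooth(b, x, lmin, lmax, iterations):
    """Apply `iterations` Chebyshev steps targeting eigenvalues of
    D^{-1} A in [lmin, lmax]. Returns the new iterate and its residual."""
    theta = 0.5 * (lmax + lmin)   # center of the target interval
    delta = 0.5 * (lmax - lmin)   # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - yi for bi, yi in zip(b, matvec(x))]
    d = [zi / theta for zi in jacobi(r)]
    for _ in range(iterations):
        x = [xi + di for xi, di in zip(x, d)]          # vector update
        r = [ri - yi for ri, yi in zip(r, matvec(d))]  # SpMV + vector update
        rho_new = 1.0 / (2.0 * sigma - rho)
        z = jacobi(r)                                  # pointwise scaling
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * zi
             for di, zi in zip(d, z)]
        rho = rho_new
    return x, r


if __name__ == "__main__":
    n = 32
    b = [1.0] * n
    x0 = [0.0] * n
    # The eigenvalues of D^{-1} A lie in (0, 2); the bounds below are assumed.
    x, r = chebyshev_smooth(b, x0, lmin=0.2, lmax=2.2, iterations=4)
    print("initial residual norm:", norm(b))
    print("smoothed residual norm:", norm(r))
```

      Every step inside the loop is either a matrix-vector product or an elementwise vector operation, so no step of the smoother itself needs to return to the CPU once the matrix and vectors live on the GPU.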

      This is in the branch barry/mpiaijcusparse-better-subclass-mpiaij, which will hopefully be in the master branch tomorrow if it passes the full test suite tonight. I see Mark is already attempting to build PETSc on Summit and can hopefully quickly determine whether the branch works. (Mark, since Summit is presumably a batch system, you will need to run the last two test cases listed in src/snes/examples/tutorials/ex19.c by setting up the appropriate batch file and including the appropriate PETSc command line options.)

     We look forward to hearing how it functions and, in particular, would love to receive -log_view performance output on Summit comparing the use of the GPU with simply running on the CPU for your application. This would also tell us what additional kernels, if any, should be ported to a CUDA backend.
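     As a rough sketch of how such a CPU versus GPU comparison could be set up, the small Python helper below builds the two command lines. The executable name `./app` and the helper itself are hypothetical. The options assume the application creates its matrices and vectors through MatSetFromOptions() and VecSetFromOptions(); a DM-based code such as ex19 would use -dm_mat_type and -dm_vec_type instead. The batch launcher is deliberately left out since it depends on the site.

```python
# Hypothetical helper: builds CPU and GPU command lines that differ only in
# the matrix/vector types, so the two -log_view outputs can be compared.
import shlex

# Options shared by both runs: GAMG preconditioner plus performance logging.
COMMON = ["-pc_type", "gamg", "-log_view"]

# Options that select the CUDA-backed matrix and vector implementations.
# Assumes the application honors -mat_type and -vec_type.
GPU_TYPES = ["-mat_type", "aijcusparse", "-vec_type", "cuda"]


def run_command(executable, use_gpu):
    """Return a shell-quoted command line for one run."""
    args = [executable] + COMMON
    if use_gpu:
        args += GPU_TYPES
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(run_command("./app", use_gpu=False))
    print(run_command("./app", use_gpu=True))
```

     Keeping every other solver option identical between the two runs makes the per-event timings in the two -log_view summaries directly comparable.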


    Barry


> On Sep 19, 2018, at 4:43 PM, Mills, Richard Tran <rtmills at anl.gov> wrote:
> 
> Hi Brian,
> 
> Your message to petsc-dev has prompted some ongoing discussion among the core PETSc developers, and we'll hopefully be able to give you an outline of a coherent plan to help you meet your ECP milestones soon.
> 
> Adding GPU support within PETSc's GAMG preconditioner has been on our list of goals for some time, but we didn't manage to get it into the recent 3.10 release. We can bump up its priority, and, as Jed has said, we should be able to provide AMG setup on the CPU and the solves on the GPU in relatively short order, and we can see how much this helps in the near term. Doing the setup on the GPU is much more involved, but it is something that we are interested in doing.
> 
> Just wanted to let you know that your query has not gone unnoticed. Expect a more detailed reply from us soon.
> 
> Best regards,
> Richard
> 
> On Wed, Sep 19, 2018 at 11:43 AM Jed Brown <jed at jedbrown.org> wrote:
> Brian, how frequently do you need to update the matrix (thus rebuild the
> preconditioner)?
> 
> If it is infrequent, we could (in the near term) provide AMG setup on
> CPU with solves on GPU.
> 
> What is your typical problem size per node to be run on Summit?  What is
> your MPI/OpenMP(?) decomposition?
> 
> Are these heterogeneous Poisson solves or are the equations to be solved
> implicitly more complicated?  Do you have experimental information about
> relative convergence rates/grid complexity/strong scalability for your
> operator solved using classical AMG (e.g., Hypre) versus smoothed (or
> plain) aggregation (ML, GAMG default)?
> 
> Brian Van Straalen <bvstraalen at lbl.gov> writes:
> 
> > So Baky and I have been at the Brookhaven GPU Hackathon now for three days,
> > talking to everyone.  We have also been emailing with people who will
> > respond to us from the hypre team and the PETSc team, as well as reading
> > every blog post and mail archive and message board and from what we can
> > tell, a distributed AMG preconditioner will not be available for us on a
> > Summit platform for the foreseeable future.
> >
> > There is a hypre build for CUDA, but it has a problem with its use of
> > CUSP, and nobody seems to be working on it.
> >
> > PETSc has some .cu CUDA files for the SpMV and Vector operations, but the
> > preconditioners are limited to point Jacobi and similar simple operations
> > and a version of ILU.  Neither works for our stiff projection in the
> > embedded boundary algorithms.   We built it and ran it and PETSc takes
> > several hundred iterations to get the residual down by a factor of 6.  We
> > need to get down to more like 10e-11 for this solve.
> >
> > The AMG being worked on by the NVIDIA team is not targeted for multi-node
> > solving, and I haven't heard back from them in months.
> >
> >   We are left with two options as I see it to meet our ECP Milestones:
> >
> > 1. Build yet another interface, this time to see if there is a distributed
> > GPU AMG preconditioner in Trilinos
> >
> >  2. Implement our own special-purpose EB-GMG solver written in Chombo.
> >
> > I would love to be wrong about all this.
> >
> > Brian
> >
> > -- 
> > Brian Van Straalen         Lawrence Berkeley Lab
> > BVStraalen at lbl.gov        Computational Research
> > (510) 486-4976            Division (crd.lbl.gov)