[petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors
Matthew Knepley
knepley at gmail.com
Fri Mar 18 12:15:26 CDT 2022
On Fri, Mar 18, 2022 at 11:28 AM Sajid Ali Syed <sasyed at fnal.gov> wrote:
> Hi Matt/Mark,
>
> I'm working on a Poisson solver for a distributed PIC code, where the
> particles are distributed over MPI ranks rather than the grid. Prior to the
> solve, all particles are deposited onto a (DMDA) grid.
>
> The current prototype I have is that each rank holds a full size DMDA
> vector and particles on that rank are deposited into it. Then, the data
> from all the local vectors is combined into multiple distributed DMDA
> vectors via VecScatters and this is followed by solving the Poisson
> equation. The need to have multiple subcomms, each solving the same
> equation is due to the fact that the grid size too small to use all the MPI
> ranks (beyond the strong scaling limit). The solution is then scattered
> back to each MPI rank via VecScatters.
>
> This first local-to-(multi)global transfer required multiple VecScatters,
> as there is no one-to-multiple scatter capability in SF. This works and
> already gives a large speedup over the allreduce baseline currently in
> use (which transfers more data than is necessary).
>
> I was wondering if within each subcommunicator I could directly write to
> the DMDA vector via VecSetValues and PETSc would take care of stashing them
> on the GPU until I call VecAssemblyBegin. Since this would be from within a
> Kokkos parallel_for operation, there would be multiple (probably ~1e3)
> simultaneous writes that the stashing mechanism would have to support.
> Currently, we use Kokkos-ScatterView to do this.
>
Hi Sajid,
It turns out that Mark and I are doing exactly this operation for plasma
physics. Here is what we currently do:
1) Use DMSwarm to hold the particle data
2) Use a DMPlex as the cellDM for the swarm, which does point location
after each particle push
3) Use a conservative projection routine in PETSc to transfer charge to a
FEM space while preserving any number of moments (currently we do 0, 1, and
2).
This projection is just a KSP solve, which can happen on the GPU,
except that the particle data is currently held on the CPU.
4) Solve the Poisson problem (or Landau operator), which can happen
completely on the GPU
5) Project the other direction.
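A rough sketch of the setup side of steps 1 and 2 above, assuming a DMPlex mesh `dm` already exists; the swarm name `sw`, the field name `"w_q"`, and the local sizes are illustrative, and this uses the error-checking macros from recent PETSc releases:

```c
#include <petscdmswarm.h>
#include <petscdmplex.h>

/* Sketch: attach a particle swarm to an existing DMPlex (dm) and
   register a per-particle weight/charge field for later projection.
   Names and sizes here are illustrative, not from the thread. */
PetscErrorCode SetupSwarm(MPI_Comm comm, DM dm, DM *sw)
{
  PetscFunctionBeginUser;
  PetscCall(DMCreate(comm, sw));
  PetscCall(DMSetType(*sw, DMSWARM));
  PetscCall(DMSwarmSetType(*sw, DMSWARM_PIC));  /* particle-in-cell mode */
  PetscCall(DMSwarmSetCellDM(*sw, dm));         /* step 2: DMPlex as cellDM */
  PetscCall(DMSwarmRegisterPetscDatatypeField(*sw, "w_q", 1, PETSC_SCALAR));
  PetscCall(DMSwarmFinalizeFieldRegister(*sw));
  PetscCall(DMSwarmSetLocalSizes(*sw, 1000, 10)); /* particles + buffer */
  PetscFunctionReturn(0);
}

/* After each particle push, migration redoes point location and rebinds
   particles to cells (and ranks):
     PetscCall(DMSwarmMigrate(sw, PETSC_TRUE));
   The conservative projection of step 3 is then a swarm-to-FEM transfer
   (see DMSwarmProjectFields; its exact signature varies by PETSc version). */
```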
The biggest improvement we could make here for a GPU workflow is to hold
the particle data on the GPU. That is not conceptually hard, but would take
some rewriting of the internals, which predate GPUs.
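As background for the LocalToGlobal pattern discussed in the quoted thread below, a minimal sketch of the usual assembly, assuming a 2D DMDA `da` and a global vector `gcharge` (the deposition loop body is a stand-in):

```c
#include <petscdmda.h>

/* Sketch: deposit into a local (ghosted) vector, then sum overlapping
   contributions across ranks with an additive LocalToGlobal. */
PetscErrorCode DepositAndCombine(DM da, Vec gcharge)
{
  Vec           lcharge;
  PetscScalar **a;
  PetscInt      xs, ys, xm, ym, i, j;

  PetscFunctionBeginUser;
  PetscCall(DMGetLocalVector(da, &lcharge));
  PetscCall(VecZeroEntries(lcharge));
  PetscCall(DMDAVecGetArray(da, lcharge, &a));
  PetscCall(DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
  for (j = ys; j < ys + ym; ++j)
    for (i = xs; i < xs + xm; ++i) a[j][i] += 1.0; /* deposition stand-in */
  PetscCall(DMDAVecRestoreArray(da, lcharge, &a));
  /* ADD_VALUES accumulates ghost-region contributions from neighbors;
     this communication path has device-aware code in VecScatter/PetscSF */
  PetscCall(VecZeroEntries(gcharge));
  PetscCall(DMLocalToGlobalBegin(da, lcharge, ADD_VALUES, gcharge));
  PetscCall(DMLocalToGlobalEnd(da, lcharge, ADD_VALUES, gcharge));
  PetscCall(DMRestoreLocalVector(da, &lcharge));
  PetscFunctionReturn(0);
}
```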
Thanks,
Matt
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* Thursday, March 17, 2022 7:25 PM
> *To:* Mark Adams <mfadams at lbl.gov>
> *Cc:* Sajid Ali Syed <sasyed at fnal.gov>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Regarding the status of
> VecSetValues(Blocked) for GPU vectors
>
> On Thu, Mar 17, 2022 at 8:19 PM Mark Adams <mfadams at lbl.gov> wrote:
>
> LocalToGlobal is a DM thing.
> Sajid, do you use DM?
> If you need to add off-processor entries, then DM could give you a local
> vector, as Matt said, that you can add to for off-processor values, and
> then you could use the CPU communication in DM.
>
>
> It would be GPU communication, not CPU.
>
> Matt
>
>
> On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed <sasyed at fnal.gov> wrote:
>
> Hi PETSc-developers,
>
> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos
> vectors from the device, i.e. can I call VecSetValues with GPU memory
> pointers and expect PETSc to figure out how to stash it on the device until
> I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to
> populate off-process values) ?
>
> If this is not currently supported, is supporting this on the roadmap?
> Thanks in advance!
>
>
> VecSetValues() will fall back to the CPU vector, so I do not think this
> will work on device.
>
> Usually, our assembly computes all values and puts them in a "local"
> vector, which you can access explicitly as Mark said. Then
> we call LocalToGlobal() to communicate the values, which does work
> directly on device using specialized code in VecScatter/PetscSF.
>
> What are you trying to do?
>
> Thanks,
>
> Matt
>
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/