<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Matt/Mark,
<div><br>
</div>
<div>I'm working on a Poisson solver for a distributed PIC code, where the particles are distributed over MPI ranks rather than the grid. Prior to the solve, all particles are deposited onto a (DMDA) grid.</div>
<div><br>
</div>
<div>In the current prototype, each rank holds a full-size DMDA vector and deposits its particles into it. The data from all the local vectors is then combined into multiple distributed DMDA vectors via VecScatters, after which the Poisson equation is solved. Multiple subcomms, each solving the same equation, are needed because the grid is too small to use all the MPI ranks (we are beyond the strong-scaling limit). The solution is then scattered back to each MPI rank via VecScatters.</div>
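<div><br>
</div>
<div>For concreteness, a minimal sketch of the combine step (all names here are placeholders, and it ignores the DMDA PETSc-vs-natural ordering, which an AO or DMDAGlobalToNaturalBegin/End would handle):</div>
<div><br>
</div>
<pre>
DM         da;          /* DMDA created on this rank's subcommunicator */
Vec        rho_local, rho_global;
IS         is;
VecScatter scatter;
PetscInt   nglobal;     /* total number of grid points, set elsewhere */

/* Full-size sequential copy on every rank, filled by deposition. */
PetscCall(VecCreateSeq(PETSC_COMM_SELF, nglobal, &rho_local));
PetscCall(DMCreateGlobalVector(da, &rho_global));

/* Every rank contributes all of its entries; ADD_VALUES sums the
   per-rank copies into the distributed vector. */
PetscCall(ISCreateStride(PETSC_COMM_SELF, nglobal, 0, 1, &is));
PetscCall(VecScatterCreate(rho_local, is, rho_global, is, &scatter));
PetscCall(VecScatterBegin(scatter, rho_local, rho_global, ADD_VALUES, SCATTER_FORWARD));
PetscCall(VecScatterEnd(scatter, rho_local, rho_global, ADD_VALUES, SCATTER_FORWARD));
</pre>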
<div><br>
</div>
<div>This first local-to-(multi)global transfer requires multiple VecScatters, since SF has no one-to-multiple scatter capability. This works and already gives a large speedup over the allreduce baseline currently in use, which transfers more data than necessary.</div>
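<div><br>
</div>
<div>Concretely, the transfer is one scatter per grid replica, reusing rho_local and is from the sketch above (nreplicas and rho_replica are hypothetical names; this assumes each replica vector is visible on the world communicator, e.g. created with zero local length on ranks outside its subcommunicator):</div>
<div><br>
</div>
<pre>
/* One VecScatter per replica, since SF has no one-to-multiple primitive. */
std::vector&lt;VecScatter&gt; scat(nreplicas);
for (PetscInt r = 0; r &lt; nreplicas; ++r)
  PetscCall(VecScatterCreate(rho_local, is, rho_replica[r], is, &scat[r]));

/* Start every scatter before finishing any, so the messages overlap. */
for (PetscInt r = 0; r &lt; nreplicas; ++r)
  PetscCall(VecScatterBegin(scat[r], rho_local, rho_replica[r], ADD_VALUES, SCATTER_FORWARD));
for (PetscInt r = 0; r &lt; nreplicas; ++r)
  PetscCall(VecScatterEnd(scat[r], rho_local, rho_replica[r], ADD_VALUES, SCATTER_FORWARD));
</pre>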
<div><br>
</div>
<div>I was wondering if, within each subcommunicator, I could write directly to the DMDA vector via VecSetValues and have PETSc stash the values on the GPU until I call VecAssemblyBegin. Since this would happen from within a Kokkos parallel_for, the stashing mechanism would have to support many (probably ~1e3) simultaneous writes. Currently we use a Kokkos ScatterView to do this.
</div>
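<div><br>
</div>
<div>Roughly, the current device-side deposition looks like the following (a simplified 1-D, linear-weighting sketch; all names are placeholders and bounds checks are omitted):</div>
<div><br>
</div>
<pre>
#include &lt;Kokkos_Core.hpp&gt;
#include &lt;Kokkos_ScatterView.hpp&gt;

/* grid: device view of the full-size local charge array
   pos, weight: device views of particle positions and charges */
void deposit(Kokkos::View&lt;double *&gt; grid,
             Kokkos::View&lt;const double *&gt; pos,
             Kokkos::View&lt;const double *&gt; weight, double h)
{
  auto scatter = Kokkos::Experimental::create_scatter_view(grid);
  Kokkos::parallel_for(
    "deposit", pos.extent(0), KOKKOS_LAMBDA(const int p) {
      auto access    = scatter.access();        /* per-thread handle */
      const int    i = static_cast&lt;int&gt;(pos(p) / h);
      const double f = pos(p) / h - i;
      access(i)     += (1.0 - f) * weight(p);   /* CIC deposition */
      access(i + 1) += f * weight(p);
    });
  Kokkos::Experimental::contribute(grid, scatter); /* merge duplicates */
}
</pre>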
</div>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature">
<div>
<div></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thank You,<br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div style="font-size:12.8px">Sajid Ali (he/him) | Research Associate<br>
</div>
<div style="font-size:12.8px">Scientific Computing Division<br>
</div>
<div style="font-size:12.8px">Fermi National Accelerator Laboratory<br>
</div>
<span style="font-size:12.8px"><a href="http://s-sajid-ali.github.io" target="_blank">s-sajid-ali.github.io</a></span></div>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Matthew Knepley <knepley@gmail.com><br>
<b>Sent:</b> Thursday, March 17, 2022 7:25 PM<br>
<b>To:</b> Mark Adams <mfadams@lbl.gov><br>
<b>Cc:</b> Sajid Ali Syed <sasyed@fnal.gov>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov><br>
<b>Subject:</b> Re: [petsc-users] Regarding the status of VecSetValues(Blocked) for GPU vectors</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div dir="ltr">On Thu, Mar 17, 2022 at 8:19 PM Mark Adams <<a href="mailto:mfadams@lbl.gov">mfadams@lbl.gov</a>> wrote:<br>
</div>
<div class="x_gmail_quote">
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">LocalToGlobal is a DM thing..<br>
<div>Sajid, do use DM?<br>
</div>
<div>If you need to add off procesor entries then DM could give you a local vector as Matt said that you can add to for off procesor values and then you could use the CPU communication in DM.</div>
</div>
</blockquote>
<div><br>
</div>
<div>It would be GPU communication, not CPU.</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div dir="ltr">On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed <<a href="mailto:sasyed@fnal.gov" target="_blank">sasyed@fnal.gov</a>> wrote:<br>
</div>
<div class="x_gmail_quote">
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hi PETSc-developers,<br>
<br>
Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos vectors from the device, i.e., can I call VecSetValues with GPU memory pointers and expect PETSc to stash them on the device until I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate off-process values)?<br>
<br>
If this is not currently supported, is supporting it on the roadmap? Thanks in advance!</div>
</div>
</blockquote>
<div><br>
</div>
<div>VecSetValues() will fall back to the CPU vector, so I do not think this will work on device.</div>
<div><br>
</div>
<div>Usually, our assembly computes all values and puts them in a "local" vector, which you can access explicitly as Mark said. Then</div>
<div>we call LocalToGlobal() to communicate the values, which does work directly on device using specialized code in VecScatter/PetscSF.</div>
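<div><br>
</div>
<div>A minimal sketch of this local-vector-then-LocalToGlobal pattern, assuming a 2d DMDA "da" (all names here are illustrative):</div>
<div><br>
</div>
<pre>
Vec           rho_loc, rho;
PetscScalar **a;
PetscInt      xs, ys, xm, ym;

PetscCall(DMGetGlobalVector(da, &rho));
PetscCall(DMGetLocalVector(da, &rho_loc));
PetscCall(VecSet(rho_loc, 0.0));

/* Add into the ghosted local vector; off-process contributions land
   in the ghost region. */
PetscCall(DMDAVecGetArray(da, rho_loc, &a));
PetscCall(DMDAGetGhostCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
for (PetscInt j = ys; j &lt; ys + ym; ++j)
  for (PetscInt i = xs; i &lt; xs + xm; ++i)
    a[j][i] += 1.0; /* particle deposit would go here */
PetscCall(DMDAVecRestoreArray(da, rho_loc, &a));

/* ADD_VALUES sums ghost contributions into the owning ranks; this is
   the communication that runs on device via VecScatter/PetscSF. */
PetscCall(DMLocalToGlobalBegin(da, rho_loc, ADD_VALUES, rho));
PetscCall(DMLocalToGlobalEnd(da, rho_loc, ADD_VALUES, rho));
PetscCall(DMRestoreLocalVector(da, &rho_loc));
</pre>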
<div><br>
</div>
<div>What are you trying to do?</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="x_gmail_quote" style="margin:0px 0px 0px 0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">
<div dir="ltr">
<div>
<div id="x_gmail-m_-6743080588372772506gmail-m_177089234717882637gmail-m_-4538176136910842065gmail-m_2176094945916392184Signature">
<div>
<div></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Thank You,<br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div style="font-size:12.8px">Sajid Ali (he/him) | Research Associate<br>
</div>
<div style="font-size:12.8px">Scientific Computing Division<br>
</div>
<div style="font-size:12.8px">Fermi National Accelerator Laboratory<br>
</div>
<span style="font-size:12.8px"><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__s-2Dsajid-2Dali.github.io&d=DwMFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=w-DPglgoOUOz8eiEyHKz0g&m=0XVS3DAGXcfL8rzL8Bij70rxbfVtLXqvZC2kUPVoHUEquwZVQwgoBP_aHbei5owb&s=jaqSeHVty0Q2rK0mKuKQMyvcQGtqdOPN6wcZIGZ5_K4&e=" target="_blank">s-sajid-ali.github.io</a></span></div>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__www.cse.buffalo.edu_-7Eknepley_&d=DwMFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=w-DPglgoOUOz8eiEyHKz0g&m=0XVS3DAGXcfL8rzL8Bij70rxbfVtLXqvZC2kUPVoHUEquwZVQwgoBP_aHbei5owb&s=CoW4LB9JyQtsc-D24RRWHnnDdNjSnjwZ4FPrLmaIZhc&e=" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="x_gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="https://urldefense.proofpoint.com/v2/url?u=http-3A__www.cse.buffalo.edu_-7Eknepley_&d=DwMFaQ&c=gRgGjJ3BkIsb5y6s49QqsA&r=w-DPglgoOUOz8eiEyHKz0g&m=0XVS3DAGXcfL8rzL8Bij70rxbfVtLXqvZC2kUPVoHUEquwZVQwgoBP_aHbei5owb&s=CoW4LB9JyQtsc-D24RRWHnnDdNjSnjwZ4FPrLmaIZhc&e=" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>