[petsc-users] [GPU] Jacobi preconditioner
LEDAC Pierre
Pierre.LEDAC at cea.fr
Wed Oct 22 01:55:08 CDT 2025
Barry,
We are currently doing more and more computation on GPUs and relying heavily on PETSc solvers (through the boomeramg, amgx, and gamg preconditioners), so yes, we will report any issues or bottlenecks to you.
This leads to my next question: is there any hope, one day, of a MatGetDiagonal_SeqAIJHIPSPARSE implementation?
We know that the Kokkos backend exists as a workaround, though.
Thanks again,
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Tuesday, October 21, 2025 16:35:24
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users at mcs.anl.gov; BOURGEOIS Rémi
Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
That is clearly a dramatic improvement! Of course, the previous code was absurd: copying all the nonzero entries to the host, finding the diagonal entries, and then copying them back to the GPU.
If, through Nsight, you find other similar performance bottlenecks, please let us know, and I can try to resolve them.
Barry
On Oct 21, 2025, at 5:55 AM, LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Hello,
Thanks for the work!
It is OK now; I checked with Nsight Systems, and the diagonal is indeed computed on the device.
How much time does it save? I guess it depends on the number of GMRES iterations: the fewer there are, the more significant the gain.
In my case, with 5,158,400 rows in the matrix and 45 GMRES iterations, the time to solve decreases from 1.160 s to 0.671 s
on an RTX A6000.
So thanks again,
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Friday, October 17, 2025 23:27:19
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
I have updated the MR with what I think is now correct code for computing the diagonal on the GPU. Could you please try it again and let me know if it works and how much time it saves (I think it should be significant)?
Thanks for your patience,
Barry
On Oct 2, 2025, at 1:16 AM, LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Yes, that is probably the reason I also saw a crash in my test case after a quick fix of the integer conversion.
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Thursday, October 2, 2025 02:16:40
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
Sorry about that. The current code is buggy anyway; I will let you know when I have tested it extensively so you can try again.
Barry
On Oct 1, 2025, at 3:47 PM, LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Sorry, the correct error is:
/export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765): error: argument of type "int*" is incompatible with parameter of type "const PetscInt *"
GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), matstruct->cprowIndices->data().get(), cusparsestruct->workVector->data().get(), n, darray);
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: LEDAC Pierre
Sent: Wednesday, October 1, 2025 21:46:00
To: Barry Smith
Cc: Junchao Zhang; petsc-users at mcs.anl.gov
Subject: RE: [petsc-users] [GPU] Jacobi preconditioner
Hi all,
Thanks for the MR; there is a build issue because we use --with-64-bit-indices:
/export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765): error: argument of type "PetscInt" is incompatible with parameter of type "const PetscInt *"
GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), matstruct->cprowIndices->data().get(), cusparsestruct->workVector->data().get(), n, darray);
Thanks,
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Wednesday, October 1, 2025 18:48:37
To: LEDAC Pierre
Cc: Junchao Zhang; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
I have finally created an MR that moves the Jacobi preconditioner's access of the matrix diagonal to the GPU, which should improve the GPU performance of your code: https://gitlab.com/petsc/petsc/-/merge_requests/8756
Please give it a try and let us know if it causes any difficulties or, hopefully, improves your code's performance significantly.
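With your existing setup, options along these lines should exercise the new path (an illustrative set only; adjust to your case):
  -mat_type aijcusparse -vec_type cuda -ksp_type gmres -pc_type jacobi -log_view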
Sorry for the long delay, NVIDIA is hiring too many PETSc developers away from us.
Barry
On Jul 31, 2025, at 6:46 AM, LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Thanks, Barry. I agree, but I didn't dare ask for that.
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Wednesday, July 30, 2025 20:34:26
To: Junchao Zhang
Cc: LEDAC Pierre; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). It's somewhat embarrassing that we don't provide this.
I have found some potential code at https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse
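A device kernel along these lines is probably all that is needed (a rough, untested sketch; the kernel and variable names are illustrative, not actual PETSc code):

  #include <petscsys.h> /* PetscInt, PetscScalar */

  /* Sketch: extract the diagonal of a CSR matrix on the device, one thread per row.
     Rows with no stored diagonal entry get a zero. */
  __global__ void GetDiagonal_CSR(const PetscInt *rowoffsets, const PetscInt *colindices, const PetscScalar *values, PetscInt n, PetscScalar *diag)
  {
    PetscInt row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
      diag[row] = 0.0;
      for (PetscInt k = rowoffsets[row]; k < rowoffsets[row + 1]; k++) {
        if (colindices[k] == row) { diag[row] = values[k]; break; }
      }
    }
  }

  /* launched with one thread per row, e.g.
     GetDiagonal_CSR<<<(n + 255) / 256, 256>>>(rowoffsets, colindices, values, n, diag); */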
Barry
On Jul 28, 2025, at 11:43 AM, Junchao Zhang <junchao.zhang at gmail.com> wrote:
Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. The petsc/cuda and petsc/kokkos backends are separate code.
If petsc/kokkos meets your needs, then just use it. For PETSc users, we hope the only difference will be the extra --download-kokkos --download-kokkos-kernels at configuration time.
--Junchao Zhang
On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre <Pierre.LEDAC at cea.fr> wrote:
Hello all,
We are using PETSc to solve a linear system that is updated every time step (constant stencil, but changing coefficients).
The matrix is preallocated once with MatSetPreallocationCOO(), then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values.
It is working fine with a GMRES KSP solver and a Jacobi PC, but we are surprised to see that every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE(), only a MatGetDiagonal_SeqAIJKOKKOS().
Does this mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on the device? Or am I doing something wrong?
NB: GMRES itself runs well on the device.
I could use -ksp_reuse_preconditioner to avoid the Jacobi preconditioner being rebuilt on the host at each solve, but it significantly increases the number of iterations.
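Our per-time-step pattern is roughly the following (a simplified sketch; SolveOneStep and the variable names are just for illustration, not our actual code):

  #include <petscksp.h>

  /* Simplified sketch of what we do every time step: the COO pattern (coo_i, coo_j)
     was given once to MatSetPreallocationCOO(); only the coefficient values change.
     With MATAIJCUSPARSE, vals can be a device pointer. */
  static PetscErrorCode SolveOneStep(Mat A, KSP ksp, const PetscScalar *vals, Vec b, Vec x)
  {
    PetscFunctionBeginUser;
    PetscCall(MatSetValuesCOO(A, vals, INSERT_VALUES)); /* refill coefficients, data stays on the device */
    PetscCall(KSPSetOperators(ksp, A, A));              /* operator changed: next solve re-runs PCSetUp (MatGetDiagonal for Jacobi) */
    PetscCall(KSPSolve(ksp, b, x));                     /* GMRES + Jacobi */
    PetscFunctionReturn(PETSC_SUCCESS);
  }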
Thanks,
<pastedImage.png>
Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°41
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79