[petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Junchao Zhang junchao.zhang at gmail.com
Thu Feb 12 17:44:57 CST 2026


On Thu, Feb 12, 2026 at 5:14 PM feng wang <snailsoar at hotmail.com> wrote:

> Hi Mat,
>
> Thanks for your reply.
>
> For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are
> solving Ax=b, so it also stores the halos in x and b for each partition.
> This is how my old implementation was done. If the current GPU
> implementation does not support halos, I can stick to one GPU for the
> moment. or is there a way around this?
>
PETSc currently doesn't support ghost vectors on device, though we plan to
support it.


>
> Regarding to "Rather you create a generic Mat, set the blocksize, and then
> MatSetFromOptions(). Then you can set the type from the command line, like
> baij or aijcusparse, etc.", my current CFD code also takes arguments from
> the command line, so I prefer I can set the types from the source code
> directly, so it does not mess around with arguments of the CFD code. Is
> there a way I can do this?
>
PETSc accepts options from three sources: 1) the command line; 2) the
.petscrc file; 3) the PETSC_OPTIONS environment variable.  You can use the
latter two approaches.
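A third, purely programmatic route (not mentioned above, but it avoids
touching the environment) is PetscOptionsSetValue(), called after
PetscInitialize() and before the corresponding SetFromOptions() call.  A
minimal sketch:

    /* Inject options from source code; NULL means the global options
       database.  Equivalent to setting
       PETSC_OPTIONS="-vec_type cuda -mat_type aijcusparse" in the shell. */
    ierr = PetscOptionsSetValue(NULL, "-vec_type", "cuda"); CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-mat_type", "aijcusparse"); CHKERRQ(ierr);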


>
> With respect to "MatSetValuesCOO()", I am new to this, and was using the
> old way to set the values.  For MatSetValuesCOO, it requires an argument
> "coo_v", how does it work if I want to set the values in the GPU directly?
> say, coo_v has the type of PetscScalar, do I need to create coo_v and
> assign its values directly in the GPU and then give it to MatSetValuesCOO?
>
COO routines are used to assemble the matrix on device.  If you compute
matrix entries on host, you don't need COO; otherwise you do.  In
MatSetValuesCOO(A, coo_v, ..), coo_v can be a device pointer; however, in
MatSetValues/MatSetValuesBlocked(A, ..., v, ..), v must be a host pointer.
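A minimal sketch of the COO path (hedged: it assumes a CUDA build with
-mat_type aijcusparse, and that ncoo, coo_i, coo_j describing your nonzero
pattern already exist on the host):

    /* Describe the nonzero pattern once, with host index arrays. */
    ierr = MatSetPreallocationCOO(A, ncoo, coo_i, coo_j); CHKERRQ(ierr);

    /* Compute the values on the device; coo_v may be a device pointer. */
    PetscScalar *coo_v_d;
    cudaMalloc((void **)&coo_v_d, ncoo * sizeof(PetscScalar));
    /* ... fill coo_v_d with a kernel or an OpenACC region ... */
    ierr = MatSetValuesCOO(A, coo_v_d, INSERT_VALUES); CHKERRQ(ierr);

For repeated assemblies the pattern is reused: refill coo_v_d and call
MatSetValuesCOO() again.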


>
> Thanks for your help in advance.
>
> Best regards,
> Feng
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 16:32
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 10:58 AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Mat,
>
> Thanks for your reply. Maybe I am overthinking it.
>
> ksp/ex15 works fine with GPUs.
>
> To port my existing GMRES+ILU(0) to the GPU, what I am not very clear
> about is how PETSc handles memory on the host and the device.
>
> Below is a snippet of my current petsc implementation. Suppose I have:
>
>       ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize,
> blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
>
>
> This is the problem. Right now VecGhost hardcodes the use of VECSEQ and
> VECMPI. This is not necessary, and the local and global representations
> could indeed be device types. Is ghosting necessary for you right now?
>
>
>       ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);
>
>       //duplicate
>       ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);
>
>       //create preconditioning matrix
>       ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize,
> nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
>                             maxneig, NULL, maxneig, NULL, &petsc_A_pre);
> CHKERRQ(ierr);
>
>
> I would not create the specific type. Rather you create a generic Mat, set
> the blocksize, and then MatSetFromOptions(). Then you can set the type from
> the command line, like baij or aijcusparse, etc.
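>
> A minimal sketch of that pattern (hedged: it reuses the names from your
> snippet above, and preallocation would still be supplied afterwards, as
> in your MatCreateBAIJ call):
>
>       ierr = MatCreate(*A_COMM_WORLD, &petsc_A_pre); CHKERRQ(ierr);
>       ierr = MatSetSizes(petsc_A_pre, nlocal*blocksize, nlocal*blocksize,
>                          PETSC_DETERMINE, PETSC_DETERMINE); CHKERRQ(ierr);
>       ierr = MatSetBlockSize(petsc_A_pre, blocksize); CHKERRQ(ierr);
>       //type now comes from -mat_type (baij, aijcusparse, ...)
>       ierr = MatSetFromOptions(petsc_A_pre); CHKERRQ(ierr);
>       ierr = MatSetUp(petsc_A_pre); CHKERRQ(ierr);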
>
>
> *If I use "-mat_type aijcusparse -vec_type cuda", are these matrices and
> vectors created directly on the device?*
>
> Below is how I assign values for the matrix:
>
>          nnz=0;
>          for(jv=0; jv<nv; jv++)
>         {
>             for(iv=0; iv<nv; iv++)
>            {
>                values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the
> left hand side is [I/dt + (-J)]
>                nnz++;
>            }
>         }
>
>          idxm[0] = ig_mat[iql];
>          idxn[0] = ig_mat[iqr];
>          ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values,
> ADD_VALUES);
>          CHKERRQ(ierr);
>      }
>
> *Does PETSc first set the values on the host and copy them to the device,
> or are the values assigned directly on the device? In the second case, I
> would need to change my code a bit, since I need to make sure the data is
> on the device in the first place.*
>
>
> Yes, you would need to set the values on device for maximum efficiency
> (although I would try it out with CPU construction first). You can do this
> best on the GPU using MatSetValuesCOO().
>
>   Thanks,
>
>      Matt
>
>
> Thanks,
> Feng
>
>
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 13:42
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 5:55 AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Thanks for your reply. Probably I did not phrase it in a clear way.
>
> I am using OpenACC to port the CFD code to the GPU, so the CPU and the GPU
> versions essentially share the same source code.  The original CPU version
> uses Jacobi (hand-coded) or GMRES+ILU(0) (with PETSc) to solve the sparse
> linear system.
>
> The current GPU version of the code only ports the Jacobi solver to the
> GPU; now I want to port GMRES+ILU(0) to the GPU. What changes do I need to
> make to the existing CPU version of GMRES+ILU(0) to achieve this goal?
>
>
> I think what Junchao is saying is that if you use the GPU vec and mat
> types, this should be running on the GPU already. Does that not work?
>
>   Thanks,
>
>      Matt
>
>
> BTW: for performance, the GPU version of the CFD code has minimal
> communication between the CPU and GPU, so for Ax=b, A, x and b are created
> directly on the GPU.
>
> Thanks,
> Feng
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 11 February 2026 3:00
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <
> bsmith at petsc.dev>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Sorry, I don't understand your question.  What blocks you from running
> your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows more about
> the algorithms.
>
> --Junchao Zhang
>
>
> On Tue, Feb 10, 2026 at 3:57 PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> I have managed to configure PETSc for GPU, and also managed to run
> ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems to run much
> faster than when I don't use "-mat_type aijcusparse -vec_type cuda", so I
> believe it runs okay on GPUs.
>
> I have an existing CFD code that runs natively on GPUs, so all the data is
> offloaded to the GPU at the beginning and some data are copied back to the
> CPU at the very end. It has a hand-coded Newton-Jacobi scheme that runs on
> GPUs for the implicit solver.  *My question is: my code also has a
> GMRES+ILU(0) implemented with PETSc, but it only runs on CPUs (I
> implemented it a few years ago). How can I replace the existing
> Newton-Jacobi (which runs on GPUs) with a GMRES+ILU(0) that also runs on
> GPUs? Could you please give some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   As a first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> will need to assemble your matrices and vectors on GPUs too, but decide
> that at a later stage.
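>
> A quick way to do that profiling is PETSc's built-in log summary (hedged:
> "your_app" stands for your own executable):
>
>   mpiexec -n 1 ./your_app -log_view
>
> The log breaks the runtime down per operation (MatMult, PCApply, ...), so
> you can see whether assembly or the solve dominates.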
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31 PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything in my current CPU
> implementation? Or do I just link against a version of PETSc that is
> configured with CUDA, make sure the necessary data are copied to the
> "device", and then PETSc will do the rest of the magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You will need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> the extra options --download-kokkos --download-kokkos-kernels).  Then run
> with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}.
>   Triangular solves are not GPU friendly, however, so the performance
> might be poor.  But you should try it, I think.
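>
> For example (hedged: the tutorial directory layout may differ between
> PETSc versions):
>
>   ./configure --with-cuda --with-cudac=nvcc
>   make
>   cd src/ksp/ksp/tutorials && make ex15
>   mpiexec -n 1 ./ex15 -ksp_type gmres -pc_type ilu \
>           -mat_type aijcusparse -vec_type cuda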
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46 PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), and it works well
> on the CPU.  I went through the PETSc documentation, and it seems PETSc
> has some support for GPUs. Is it possible for me to run GMRES with ILU(0)
> on GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>