[petsc-users] Port existing GMRES+ILU(0) implementation to GPU

feng wang snailsoar at hotmail.com
Wed Feb 11 09:58:48 CST 2026


Hi Mat,

Thanks for your reply. Maybe I am overthinking it.

ksp/ex15 works fine with GPUs.

To port my existing GMRES+ILU(0) to the GPU, what I am not very clear about is how PETSc handles memory on the host and the device.

Below is a snippet of my current PETSc implementation. Suppose I have:

      ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
      ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);

      //duplicate
      ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);

      //create preconditioning matrix
      ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
                            maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);

If I use "-mat_type aijcusparse -vec_type cuda", are these matrices and vectors created directly on the device?

Below is how I assign values for the matrix:

         nnz=0;
         for(jv=0; jv<nv; jv++)
         {
             for(iv=0; iv<nv; iv++)
             {
                 values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
                 nnz++;
             }
         }

         idxm[0] = ig_mat[iql];
         idxn[0] = ig_mat[iqr];
         ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
         CHKERRQ(ierr);
     }
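
For reference, the packing loop above can be isolated as a small self-contained sketch. The helper name pack_block and the flat row-major jac argument are assumptions made for illustration (the original code indexes blk.jac[jv][iv]). The sketch only reproduces the ordering of the loop, with jv as the block row and iv as the block column, which matches the row-oriented layout that MatSetValuesBlocked reads by default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PETSc or of the original code.
 * Packs one nv x nv Jacobian block into a flat row-major array,
 * negated and scaled by sign, in the same order as the loop above.
 * Assumption: jac is stored flat, entry (jv, iv) at jac[jv * nv + iv].
 * Returns the number of entries written (nv * nv). */
static size_t pack_block(size_t nv, const double *jac, double sign,
                         double *values)
{
    size_t nnz = 0;

    assert(nv > 0 && jac != NULL && values != NULL);

    for (size_t jv = 0; jv < nv; jv++) {        /* block row */
        for (size_t iv = 0; iv < nv; iv++) {    /* block column */
            /* "-1" because the left hand side is [I/dt + (-J)] */
            values[nnz++] = -1.0 * sign * jac[jv * nv + iv];
        }
    }
    return nnz;
}
```

For a 2x2 block {1, 2, 3, 4} with sign = 1, values ends up as {-1, -2, -3, -4} and the returned count is 4.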

Does PETSc first set the values on the host and then copy them to the device, or are the values assigned directly on the device? In the second case, I would need to change my code a bit, since I would need to make sure the data is on the device in the first place.

Thanks,
Feng



________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: 11 February 2026 13:42
To: feng wang <snailsoar at hotmail.com>
Cc: Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 5:55 AM feng wang <snailsoar at hotmail.com> wrote:
Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using OpenACC to port the CFD code to the GPU, so the CPU and GPU versions essentially share the same source code. The original CPU version uses Jacobi (hand-coded) or GMRES+ILU(0) (with PETSc) to solve the sparse linear system.

The current GPU version of the code only ports the Jacobi solver to the GPU. Now I want to port GMRES+ILU(0) to the GPU as well. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

I think what Junchao is saying is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt

BTW: for performance, the GPU version of the CFD code has minimal communication between the CPU and GPU, so for Ax=b, A, x and b are created directly on the GPU.

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <bsmith at petsc.dev>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows more about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57 PM feng wang <snailsoar at hotmail.com> wrote:
Hi Junchao,

I have managed to configure PETSc for GPUs and also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems to run much faster than when I don't use "-mat_type aijcusparse -vec_type cuda", so I believe it runs okay on GPUs.

I have an existing CFD code that runs natively on GPUs, so all the data is offloaded to the GPU at the beginning and some data is copied back to the CPU at the very end. It has a hand-coded Newton-Jacobi that runs on GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with PETSc (which I implemented a few years ago), but it only runs on CPUs. How can I replace the existing Newton-Jacobi (which runs on GPUs) with GMRES+ILU(0) running on GPUs? Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  As a first step, you don't need to change your CPU implementation.  Then profile to see where it is worth putting your effort.  Maybe you will need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31 PM feng wang <snailsoar at hotmail.com> wrote:
Hi Junchao,

Many thanks for your reply.

This is great! Do I need to change anything in my current CPU implementation? Or do I just link to a version of PETSc that is configured with CUDA and make sure the necessary data is copied to the "device", and PETSc will do the rest of the magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with the extra options --download-kokkos --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}.
  However, triangular solves are not GPU friendly and the performance might be poor.  Still, I think you should try it.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46 PM feng wang <snailsoar at hotmail.com> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0) that works well on CPUs. I went through the PETSc documentation and it seems PETSc has some support for GPUs. Is it possible for me to run GMRES with ILU(0) on GPUs?

Many thanks for your help in advance,
Feng


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/