From junchao.zhang at gmail.com  Mon Feb  2 12:04:45 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 2 Feb 2026 12:04:45 -0600
Subject: [petsc-users] =?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?=
	=?utf-8?q?=2C_2026_=28Free_Registration=29?=
Message-ID: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>

Dear PETSc community,

PETSc will host a free online Birds-of-a-Feather (BoF) session on *February
11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF
will not be recorded.

The agenda is available at
https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!Yah9fv8-ZyaZp-y5UUT3HzG0M5Sw3hmSKm7WgsrvebX3FP3sSq-jYj-6_etbVFAhdmoWrRayUOLjjPB9GTMk2PX-9gX-$ 

Please register in advance at
https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!Yah9fv8-ZyaZp-y5UUT3HzG0M5Sw3hmSKm7WgsrvebX3FP3sSq-jYj-6_etbVFAhdmoWrRayUOLjjPB9GTMk2Nelux0G$ 
To receive a Zoom link, the organizer requires all participants to register
individually. Registration is quick and requires only your name and email
address.

We look forward to your participation and to a productive and engaging
discussion.

Thank you,
Junchao Zhang
On behalf of the PETSc team
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260202/31b8b9a0/attachment.html>

From snailsoar at hotmail.com  Sun Feb  8 17:46:37 2026
From: snailsoar at hotmail.com (feng wang)
Date: Sun, 8 Feb 2026 23:46:37 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
Message-ID: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>

Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260208/d3ff48e7/attachment.html>

From junchao.zhang at gmail.com  Sun Feb  8 19:55:45 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Sun, 8 Feb 2026 19:55:45 -0600
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to
configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
extra --download-kokkos  --download-kokkos-kernels).  Then run with
-mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be
poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:

> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260208/f05adb27/attachment.html>

From aldo.bonfiglioli at unibas.it  Mon Feb  9 06:24:27 2026
From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli)
Date: Mon, 9 Feb 2026 13:24:27 +0100
Subject: [petsc-users] Switching between sections of the same DM
Message-ID: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it>

Hi there,

I am trying to switch btw two different sections (section_l(1:2)) 
defined on the same DM (one section for the dependent variable, the 
other for their gradient)

> !
> ! dependent variables and their nodal gradient
> !
> ??PetscCall(DMSetLocalSection(dm, section_l(1), ierr)) ! dependent 
> variables
> ??PetscCall(DMGetLocalVector(dm, localu, ierr)) ! dependent variables
> ??PetscCall(DMGlobaltoLocal(dm, u, INSERT_VALUES, localu, ierr))
> ??PetscCall(VecSet(gradu, 0.d0, ierr))
> ??PetscCall(DMSetLocalSection(dm, section_l(2), ierr)) ! gradient of 
> the dependent variables
> ??PetscCall(DMGetLocalVector(dm, localdu, ierr)) ! gradient of the 
> dependent variables
> ??PetscCall(VecSet(localdu, 0.d0, ierr))
> !
> callinnerloop(dm, localu, localdu, section_l, ierr)
>
subroutine innerloop is supposed to do work on both Vecs.

When trying to do so, I get the following error:

> [0]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Clearing DM of local vectors that has a local vector 
> obtained with DMGetLocalVector()
>
as if two sections living simultaneously on the same DM are not allowed.

Should I instead clone the DM and create the "second" section of the clone?

Thanks,

Aldo

-- 
Dr. Aldo Bonfiglioli
Associate professor of Fluid Mechanics
Dipartimento di Ingegneria
Universita' della Basilicata
V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY
tel:+39.0971.205203 fax:+39.0971.205215
web:https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!YVPmxuMFuutpshSi89LXW-8W6oF1lauFcuN2HzqwcvecXbGkhDoRlhVqyIdHvFHEXN7_zUPSCbdoeElWRBtO1fQ_1q-v4pzF0QU$ 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/071f1f5a/attachment-0001.html>

From knepley at gmail.com  Mon Feb  9 06:41:00 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Mon, 9 Feb 2026 07:41:00 -0500
Subject: [petsc-users] Switching between sections of the same DM
In-Reply-To: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it>
References: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it>
Message-ID: <CAMYG4GmKnoNBb0zGJR2AnmuXkS7=WQnbmFGAwodyyrcuD0SP6g@mail.gmail.com>

On Mon, Feb 9, 2026 at 6:29?AM Aldo Bonfiglioli <aldo.bonfiglioli at unibas.it>
wrote:

> Hi there,
>
> I am trying to switch btw two different sections (section_l(1:2)) defined
> on the same DM (one section for the dependent variable, the other for their
> gradient)
>
> !
> ! dependent variables and their nodal gradient
> !
>   PetscCall(DMSetLocalSection(dm, section_l(1), ierr)) ! dependent
> variables
>   PetscCall(DMGetLocalVector(dm, localu, ierr)) ! dependent variables
>   PetscCall(DMGlobaltoLocal(dm, u, INSERT_VALUES, localu, ierr))
>   PetscCall(VecSet(gradu, 0.d0, ierr))
>   PetscCall(DMSetLocalSection(dm, section_l(2), ierr)) ! gradient of the
> dependent variables
>   PetscCall(DMGetLocalVector(dm, localdu, ierr)) ! gradient of the
> dependent variables
>   PetscCall(VecSet(localdu, 0.d0, ierr))
> !
>   call innerloop(dm, localu, localdu, section_l, ierr)
>
> subroutine innerloop is supposed to do work on both Vecs.
>
> When trying to do so, I get the following error:
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Clearing DM of local vectors that has a local vector
> obtained with DMGetLocalVector()
>
> as if two sections living simultaneously on the same DM are not allowed.
>
> Should I instead clone the DM and create the "second" section of the clone?
>
> Yes, this is exactly right. It is lightweight, and this is what clone is
intended for.

  Thanks,

     Matt

> Thanks,
>
> Aldo
>
> --
> Dr. Aldo Bonfiglioli
> Associate professor of Fluid Mechanics
> Dipartimento di Ingegneria
> Universita' della Basilicata
> V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY
> tel:+39.0971.205203 fax:+39.0971.205215
> web: https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!aoLehySjF-RiaGi_oyCRn_dYj6QvTlk5w028Xwvyvfl42tLcztJ0dySq0cpcEjn2g2b_md_LL-09Jcu-WXxH$  <https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!YVPmxuMFuutpshSi89LXW-8W6oF1lauFcuN2HzqwcvecXbGkhDoRlhVqyIdHvFHEXN7_zUPSCbdoeElWRBtO1fQ_1q-v4pzF0QU$>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aoLehySjF-RiaGi_oyCRn_dYj6QvTlk5w028Xwvyvfl42tLcztJ0dySq0cpcEjn2g2b_md_LL-09JRBYgR6D$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aoLehySjF-RiaGi_oyCRn_dYj6QvTlk5w028Xwvyvfl42tLcztJ0dySq0cpcEjn2g2b_md_LL-09JYjpzJhA$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/cdd90101/attachment.html>

From junchao.zhang at gmail.com  Mon Feb  9 08:00:00 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 9 Feb 2026 08:00:00 -0600
Subject: [petsc-users] 
	=?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?=
	=?utf-8?q?=2C_2026_=28Free_Registration=29?=
In-Reply-To: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>
References: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>
Message-ID: <CA+MQGp-s0AcTMeBfoVrwEb8ho+97OZzU9FO0B9FN0xZesxOaEg@mail.gmail.com>

This is a kind reminder that the PETSc birds-of-a-feather (BoF) session
will take place on *February 11, 2026*, from *10:00?11:30 am (Central Time,
US and Canada)*.  If you have not already done so, please register in
advance (see links below).

We look forward to seeing you.

Best regards,
Junchao Zhang
On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Dear PETSc community,
>
> PETSc will host a free online Birds-of-a-Feather (BoF) session on *February
> 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF
> will not be recorded.
>
> The agenda is available at
> https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cowQnabNB_YIg5gdeJVFG0uqoKlgZFcbGCyAA6GIoBAeNfDRREg1c1wbcxAMOan5O_LKzDTB8BbJQGTVOpYIKrtwIDXz$ 
>
> Please register in advance at
> https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!cowQnabNB_YIg5gdeJVFG0uqoKlgZFcbGCyAA6GIoBAeNfDRREg1c1wbcxAMOan5O_LKzDTB8BbJQGTVOpYIKvHWg0Nt$ 
> To receive a Zoom link, the organizer requires all participants to
> register individually. Registration is quick and requires only your name
> and email address.
>
> We look forward to your participation and to a productive and engaging
> discussion.
>
> Thank you,
> Junchao Zhang
> On behalf of the PETSc team
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/84dfc285/attachment.html>

From snailsoar at hotmail.com  Mon Feb  9 16:31:44 2026
From: snailsoar at hotmail.com (feng wang)
Date: Mon, 9 Feb 2026 22:31:44 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
Message-ID: <SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/3684f0df/attachment.html>

From junchao.zhang at gmail.com  Mon Feb  9 17:18:04 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Mon, 9 Feb 2026 17:18:04 -0600
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>

Hi Feng,
  At the first step, you don't need to change your CPU implementation.
Then do profiling to see where it is worth putting your effort.  Maybe you
need to assemble your matrices and vectors on GPUs too, but decide that at
a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:

> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/e603a5ad/attachment-0001.html>

From rlmackie862 at gmail.com  Mon Feb  9 19:43:19 2026
From: rlmackie862 at gmail.com (Randall Mackie)
Date: Mon, 9 Feb 2026 17:43:19 -0800
Subject: [petsc-users] missing Fortran interfaces
Message-ID: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com>

Hi Barry and PETSc team:

Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines:

PetscViewerASCIISynchronizedPrintf

PetscViewerASCIIPushSynchronized


and related routines?


Thanks,

Randy M.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/1fef005d/attachment.html>

From bsmith at petsc.dev  Mon Feb  9 21:33:53 2026
From: bsmith at petsc.dev (Barry Smith)
Date: Mon, 9 Feb 2026 22:33:53 -0500
Subject: [petsc-users] missing Fortran interfaces
In-Reply-To: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com>
References: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com>
Message-ID: <B02A37A6-4CBB-4461-944E-C58F89AC2279@petsc.dev>


  Added in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/9020__;!!G_uCfscf7eWS!dv9dZw32F2nr49elnSrO-lbHwxSvKM9rDw7fW9dQqjp8GqagNUgN9zM_KUGTixJNxEkaI_h-8C7kWUgV3rP6nWk$ 

 The PetscViewerASCIIPushSynchronized should be generated automatically so it should just be there. Let us know at that MR if it is not.

  Barry


> On Feb 9, 2026, at 8:43?PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> 
> Hi Barry and PETSc team:
> 
> Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines:
> 
> PetscViewerASCIISynchronizedPrintf
> 
> PetscViewerASCIIPushSynchronized
> 
> 
> and related routines?
> 
> 
> Thanks,
> 
> Randy M.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260209/0119db84/attachment.html>

From snailsoar at hotmail.com  Tue Feb 10 04:29:55 2026
From: snailsoar at hotmail.com (feng wang)
Date: Tue, 10 Feb 2026 10:29:55 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
Message-ID: <SY8P300MB012845176238587689EF9CF8B962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Junchao,

Thanks for your reply. I will try that. If I have any issues, I will come back to this thread.

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/cb579406/attachment-0001.html>

From matteo4.leone at mail.polimi.it  Mon Feb  9 18:58:22 2026
From: matteo4.leone at mail.polimi.it (Matteo Leone)
Date: Tue, 10 Feb 2026 00:58:22 +0000
Subject: [petsc-users] DG methods in PETSc
Message-ID: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>

Hello, I already posted on Reddit but just to be sure I write even here.

First thanks for the job you do for PETSc, I have used it for several projects and is  always nice.

I am writing cause  I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it.

I was  wondering if I am maybe missing  something out or what.

For reference I use PETSc 3.24.3 by means of nix.

Thanks in advance, cheers.

Matteo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/029665ac/attachment.html>

From rlmackie862 at gmail.com  Tue Feb 10 11:02:39 2026
From: rlmackie862 at gmail.com (Randall Mackie)
Date: Tue, 10 Feb 2026 09:02:39 -0800
Subject: [petsc-users] missing Fortran interfaces
In-Reply-To: <B02A37A6-4CBB-4461-944E-C58F89AC2279@petsc.dev>
References: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com>
	<B02A37A6-4CBB-4461-944E-C58F89AC2279@petsc.dev>
Message-ID: <8C3C956A-FF59-4F9D-B7AC-76DD07761928@gmail.com>

Thanks Barry,

We will check this out and report back at the MR if anything is missing.


Randy M.


> On Feb 9, 2026, at 7:33?PM, Barry Smith <bsmith at petsc.dev> wrote:
> 
> 
>   Added in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/9020__;!!G_uCfscf7eWS!b2PdLz9P_gp9R4H2xGEBLuTQMCOglUn8ANgxlMaH3yiB8jKlOlW0tqbFGU5BJ1C8hYtkgF0BLqzKwx_NquC-t9Q9lw$ 
> 
>  The PetscViewerASCIIPushSynchronized should be generated automatically so it should just be there. Let us know at that MR if it is not.
> 
>   Barry
> 
> 
>> On Feb 9, 2026, at 8:43?PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>> 
>> Hi Barry and PETSc team:
>> 
>> Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines:
>> 
>> PetscViewerASCIISynchronizedPrintf
>> 
>> PetscViewerASCIIPushSynchronized
>> 
>> 
>> and related routines?
>> 
>> 
>> Thanks,
>> 
>> Randy M.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/2ddd40c0/attachment.html>

From snailsoar at hotmail.com  Tue Feb 10 15:57:11 2026
From: snailsoar at hotmail.com (feng wang)
Date: Tue, 10 Feb 2026 21:57:11 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
Message-ID: <SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/75268209/attachment.html>

From drwells at email.unc.edu  Tue Feb 10 16:31:53 2026
From: drwells at email.unc.edu (Wells, David)
Date: Tue, 10 Feb 2026 22:31:53 +0000
Subject: [petsc-users] Limiting the number of vectors allocated at a time by
 fgmres etc.
Message-ID: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>

Hello,

I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value? If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time.

Best,
David Wells
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/1f720aee/attachment-0001.html>

From knepley at gmail.com  Tue Feb 10 17:28:09 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Tue, 10 Feb 2026 18:28:09 -0500
Subject: [petsc-users] Limiting the number of vectors allocated at a
 time by fgmres etc.
In-Reply-To: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>
References: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>
Message-ID: <CAMYG4GmFLwVv8LsK8niEwOsRaW==c30pvjvtUEMpx7vk1MJxOg@mail.gmail.com>

On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hello,
>
> I've been profiling the memory usage of my solver and it looks like a huge
> number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I
> read through the source code and it looks like these vectors are allocated
> ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that
> KSP. Is there a way to change this value?
>

We could add an option to change this delta. Actually theory suggests that
a constant is not optimal, but rather we should double the number each
time. I would also be willing to code that.


> If not - how hard would it be to add an API to set a different initial
> value for that? These vectors take up a lot of memory and I would rather
> just one at a time.
>

I cannot understand precisely what is happening here. You specify a restart
size when you setup the KSP. It allocates that many vecs (roughly). Why are
there reallocations? Do you increase the restart size during the iteration?

  Thanks,

     Matt


> Best,
> David Wells
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bSUDVgOBX_MDxTjSivXOuXsYl5KKCBhJDYseaa3Gb4DMURCyG3nv1cAESszMf_OsNZRuR6JWB7VFthuvYWD6$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bSUDVgOBX_MDxTjSivXOuXsYl5KKCBhJDYseaa3Gb4DMURCyG3nv1cAESszMf_OsNZRuR6JWB7VFtp20RqZe$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/58cc30bb/attachment.html>

From drwells at email.unc.edu  Tue Feb 10 20:04:20 2026
From: drwells at email.unc.edu (Wells, David)
Date: Wed, 11 Feb 2026 02:04:20 +0000
Subject: [petsc-users] Limiting the number of vectors allocated at a
 time by fgmres etc.
In-Reply-To: <CAMYG4GmFLwVv8LsK8niEwOsRaW==c30pvjvtUEMpx7vk1MJxOg@mail.gmail.com>
References: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>
	<CAMYG4GmFLwVv8LsK8niEwOsRaW==c30pvjvtUEMpx7vk1MJxOg@mail.gmail.com>
Message-ID: <SA1PR03MB6387FD4C342777BD437ABA1AED63A@SA1PR03MB6387.namprd03.prod.outlook.com>

Hi Matt,

Thanks for the quick response!

> I cannot understand precisely what is happening here. You specify a restart
> size when you setup the KSP. It allocates that many vecs (roughly). Why are
> there reallocations? Do you increase the restart size during the iteration?

I don't believe there are any reallocations (I didn't write this solver, but I
don't see any calls which set the restart size or any other relevant parameter [1]):
as far as I can tell, the solver just allocates a lot of vectors. I'm working
off of traces computed by heaptrack, which is my only insight into how this
works. The allocations come from KSPCreateVecs(), which is called by
1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory)
2. KSPSetUp_GMRES() (for about 300 MB of memory)
3. KSPSetUp_FGMRES() (for about 264 MB of memory)
4. KSPSetWorkVecs() (for about 236 MB of memory)

Is there some relevant set of monitoring flags I can set which will show me how
many vectors I allocate or use? That would also help.

Best,
David

[1] This is IBAMR's PETScKrylovLinearSolver.
[2] This is half the total memory we use for side-centered data vectors.
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, February 10, 2026 6:28 PM
To: Wells, David <drwells at email.unc.edu>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc.

On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Hello,

I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value?

We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that.

If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time.

I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration?

  Thanks,

     Matt

Best,
David Wells


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aBTZ4oaySeEsyg3U98-DZfMmoj0Wg8RUNEzlPdQoEnZ6pfCuP3cZSF7ib3bqpT7GBGVct81F2vjmak_Zva98zbh7D1M$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aBTZ4oaySeEsyg3U98-DZfMmoj0Wg8RUNEzlPdQoEnZ6pfCuP3cZSF7ib3bqpT7GBGVct81F2vjmak_Zva9875mpIc0$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/0e4c850f/attachment-0001.html>

From bsmith at petsc.dev  Tue Feb 10 20:32:39 2026
From: bsmith at petsc.dev (Barry Smith)
Date: Tue, 10 Feb 2026 21:32:39 -0500
Subject: [petsc-users] Limiting the number of vectors allocated at a
 time by fgmres etc.
In-Reply-To: <SA1PR03MB6387FD4C342777BD437ABA1AED63A@SA1PR03MB6387.namprd03.prod.outlook.com>
References: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>
	<CAMYG4GmFLwVv8LsK8niEwOsRaW==c30pvjvtUEMpx7vk1MJxOg@mail.gmail.com>
	<SA1PR03MB6387FD4C342777BD437ABA1AED63A@SA1PR03MB6387.namprd03.prod.outlook.com>
Message-ID: <6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev>


  1) For a fixed restart (say of 30) FGMRES needs 60 vectors, while GMRES only needs 30. This is a big disadvantage of FGMRES over GMRES.

  2) By default PETSc GMRES uses a restart of 30 meaning it keeps 30 previous Krylov vectors (and FGMRES needs 60 vectors). You can use a smaller restart with KSPGMRESSetRestart or -ksp_gmres_restart to need less memory (of course the convergence may get far worse or not depending on the problem.

  3) When GMRES (or FGMRES) starts up it does not immediately allocate all 30 (or whatever) restart vectors because it may be that GMRES only takes 15 steps so why allocate all of them? Instead  it allocates a chunk at a time GMRES_DELTA_DIRECTIONS which is 10 when it uses up the 10 it allocates another 10 (if needed) etc until it gets to the restart You can force GMRES to allocate all 30 (or whatever) initially instead of the chunk of a time approach by using ?KSPGMRESSetPreAllocateVectors() or -ksp_gmres_preallocate  This will prevent confusion about why more vectors are allocated later and why they are not all allocated when the solve starts.

4) PETSc?s GMRES tries to use BLAS 2 operations for MDot() and MAXPY (the orthogonalization in GMRES). It can only use the BLAS 2 on vector chunks that are allocated together. By preallocating all the vectors at the beginning one gets a single chunk and hence more efficient orthogonalization; this is more important on GPUs. For CPUs whether you have 10 or 30 vectors together doesn?t matter much at all.

   I hope this clarifies why you are seeing the memory allocations. Note that these are NOT ?reallocations? in the sense of KSPGMRES allocating more memory and then copying something into the new memory and freeing the old. They are just allocations of new memory which will then be used.

   Barry


   

> On Feb 10, 2026, at 9:04?PM, Wells, David via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hi Matt,
> 
> Thanks for the quick response!
> 
> > I cannot understand precisely what is happening here. You specify a restart
> > size when you setup the KSP. It allocates that many vecs (roughly). Why are
> > there reallocations? Do you increase the restart size during the iteration?
> 
> I don't believe there are any reallocations (I didn't write this solver, but I
> don't see any calls which set the restart size or any other relevant parameter [1]):
> as far as I can tell, the solver just allocates a lot of vectors. I'm working
> off of traces computed by heaptrack, which is my only insight into how this
> works. The allocations come from KSPCreateVecs(), which is called by
> 1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory)
> 2. KSPSetUp_GMRES() (for about 300 MB of memory)
> 3. KSPSetUp_FGMRES() (for about 264 MB of memory)
> 4. KSPSetWorkVecs() (for about 236 MB of memory)
> 
> Is there some relevant set of monitoring flags I can set which will show me how
> many vectors I allocate or use? That would also help.
> 
> Best,
> David
> 
> [1] This is IBAMR's PETScKrylovLinearSolver.
> [2] This is half the total memory we use for side-centered data vectors.
> From: Matthew Knepley <knepley at gmail.com>
> Sent: Tuesday, February 10, 2026 6:28 PM
> To: Wells, David <drwells at email.unc.edu>
> Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc.
>  
> On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
> Hello,
> 
> I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value?
> 
> We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that.
>  
> If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time.
> 
> I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration?
> 
>   Thanks,
> 
>      Matt
>  
> Best,
> David Wells
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f_cD6IEkmZUnH3gBVZJzlw3nqwNOD8FwvFPE32j3bn32qr-X0lkuz3NQ5CjDMqt5xLP-9CV1Et5Me4cGEVdFSVo$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aBTZ4oaySeEsyg3U98-DZfMmoj0Wg8RUNEzlPdQoEnZ6pfCuP3cZSF7ib3bqpT7GBGVct81F2vjmak_Zva9875mpIc0$>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/55eb0261/attachment.html>

From junchao.zhang at gmail.com  Tue Feb 10 21:00:38 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Tue, 10 Feb 2026 21:00:38 -0600
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>

Sorry, I don't understand your question.  What blocks you from running your
GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com> wrote:

> Hi Junchao,
>
> I have managed to configure Petsc for GPU, also managed to run ksp/ex15
> using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster
> compared to the scenario if I don't use " -mat_type aijcusparse -vec_type
> cuda". so I believe it runs okay for GPUs.
>
> I have an existing CFD code that runs natively on GPUs. so all the data is
> offloaded to GPU at the beginning and some data are copied back to the cpu
> at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for
> the implicit solver.  *My question is:  my code also has a GMRES+ILU(0)
> implemented with Petsc but it only runs on cpus (which I implemented a few
> years ago). How can I replace the existing Newton-Jacobi (which runs in
> GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give
> some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   At the first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> need to assemble your matrices and vectors on GPUs too, but decide that at
> a later stage.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260210/86cd24e4/attachment-0001.html>

From snailsoar at hotmail.com  Wed Feb 11 04:47:51 2026
From: snailsoar at hotmail.com (feng wang)
Date: Wed, 11 Feb 2026 10:47:51 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
Message-ID: <SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code.  For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system.

The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <bsmith at petsc.dev>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question.  What blocks you from running your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/19f675de/attachment.html>

From knepley at gmail.com  Wed Feb 11 07:42:22 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 11 Feb 2026 08:42:22 -0500
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>

On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com> wrote:

> Hi Junchao,
>
> Thanks for your reply. Probably I did not phrase it in a clear way.
>
> I am using openACC to port the CFD code to the GPU, so the CPU and the GPU
> version essentially share the same source code.  For the original CPU
> version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve
> the sparse linear system.
>
> The current GPU version of the code only port the Jacobi solver to the
> GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to
> make to the existing CPU version of GMRES+ILU(0) to achieve this goal?
>

I think what Junchao is saying, is that if you use the GPU vec and mat
types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt


> BTW: For performance the GPU version of the CFD code has minimum
> communication between the CPU and GPU, so for Ax=b, A, x and b are created
> in the GPU directly
>
> Thanks,
> Feng
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 11 February 2026 3:00
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <
> bsmith at petsc.dev>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Sorry, I don't understand your question.  What blocks you from running
> your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about
> the algorithms.
>
> --Junchao Zhang
>
>
> On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> I have managed to configure Petsc for GPU, also managed to run ksp/ex15
> using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster
> compared to the scenario if I don't use " -mat_type aijcusparse -vec_type
> cuda". so I believe it runs okay for GPUs.
>
> I have an existing CFD code that runs natively on GPUs. so all the data is
> offloaded to GPU at the beginning and some data are copied back to the cpu
> at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for
> the implicit solver.  *My question is:  my code also has a GMRES+ILU(0)
> implemented with Petsc but it only runs on cpus (which I implemented a few
> years ago). How can I replace the existing Newton-Jacobi (which runs in
> GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give
> some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   At the first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> need to assemble your matrices and vectors on GPUs too, but decide that at
> a later stage.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f7i0WZFkMTRDMqONYUCeDNn9XDQjXS7bps7XWsgAlnO54oH90yfmvfuu-0QJAbxqCNYSof3G34TuqHuiTIAb$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f7i0WZFkMTRDMqONYUCeDNn9XDQjXS7bps7XWsgAlnO54oH90yfmvfuu-0QJAbxqCNYSof3G34TuqDSTsgpM$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/a6399026/attachment-0001.html>

From mfadams at lbl.gov  Wed Feb 11 08:38:30 2026
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 11 Feb 2026 09:38:30 -0500
Subject: [petsc-users] DG methods in PETSc
In-Reply-To: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
Message-ID: <CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>

DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see
a test.
Here is a test but it is not well verified.
Mark

On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hello, I already posted on Reddit but just to be sure I write even here.
>
> First thanks for the job you do for PETSc, I have used it for several
> projects and is  always nice.
>
> I am writing cause  I am getting mad trying to implement DG solver in
> PETSc, the target is the Euler equations, however I am failing even with
> just the simplest transport equation (u/t + u/x = 0). I was wondering if I
> am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces,
> and more, but I keep failing, I tried also with LLMs, but seems like there
> is no DG code with PETSc on the web, however I see many papers that do it.
>
> I was  wondering if I am maybe missing  something out or what.
>
> For reference I use PETSc 3.24.3 by means of nix.
>
> Thanks in advance, cheers.
>
> Matteo
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/5714063f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex9.c
Type: application/octet-stream
Size: 10367 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/5714063f/attachment.obj>

From mpll at dhigroup.com  Wed Feb 11 02:43:04 2026
From: mpll at dhigroup.com (Dr. Milan Pelletier)
Date: Wed, 11 Feb 2026 08:43:04 +0000
Subject: [petsc-users] Error when configuring PETSc with CUDA on Windows
Message-ID: <AM6PR05MB6392297D17EAA4219DFC1A80B263A@AM6PR05MB6392.eurprd05.prod.outlook.com>

Dear PETSc team,

I am currently trying to upgrade my PETSc package on Windows to 3.24.4, and I'm facing issues when configuring with CUDA.
Unfortunately, I cannot clearly see the error that causes the configure process to bail out
Please find enclosed the configure.log file.

My laptop setup:

  *
OS: Windows 11 23H2 (if that helps, running "uname -a" in Cygwin yields: "CYGWIN_NT-10.0-22631 MPLL-PC2 3.6.6-1.x86_64 2026-01-09 17:39 UTC x86_64 Cygwin")
  *
CPU: Intel  Core i7-1370P
  *
RAM: 32 GB
  *
MPI: Intel oneAPI MPI
  *
Compiler: MSVC v19.44.35222
  *
CUDA toolkit v13.1

Please let me know if some information is missing.
Many thanks for your help,

Kind regards,
Dr. Milan Pelletier
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/5fafb27b/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: application/octet-stream
Size: 3572665 bytes
Desc: configure.log
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/5fafb27b/attachment-0001.obj>

From balay.anl at fastmail.org  Wed Feb 11 09:07:00 2026
From: balay.anl at fastmail.org (Satish Balay)
Date: Wed, 11 Feb 2026 09:07:00 -0600 (CST)
Subject: [petsc-users] Error when configuring PETSc with CUDA on Windows
In-Reply-To: <AM6PR05MB6392297D17EAA4219DFC1A80B263A@AM6PR05MB6392.eurprd05.prod.outlook.com>
References: <AM6PR05MB6392297D17EAA4219DFC1A80B263A@AM6PR05MB6392.eurprd05.prod.outlook.com>
Message-ID: <fa539f4a-7660-fbdf-d91a-c1f387963162@fastmail.org>

Sorry -  PETSc+CUDA does not currently work on windows [and we have not been able to get a successful build here]

Satish

On Wed, 11 Feb 2026, Dr. Milan Pelletier wrote:

> Dear PETSc team,
> 
> I am currently trying to upgrade my PETSc package on Windows to 3.24.4, and I'm facing issues when configuring with CUDA.
> Unfortunately, I cannot clearly see the error that causes the configure process to bail out
> Please find enclosed the configure.log file.
> 
> My laptop setup:
> 
>   *
> OS: Windows 11 23H2 (if that helps, running "uname -a" in Cygwin yields: "CYGWIN_NT-10.0-22631 MPLL-PC2 3.6.6-1.x86_64 2026-01-09 17:39 UTC x86_64 Cygwin")
>   *
> CPU: Intel  Core i7-1370P
>   *
> RAM: 32 GB
>   *
> MPI: Intel oneAPI MPI
>   *
> Compiler: MSVC v19.44.35222
>   *
> CUDA toolkit v13.1
> 
> Please let me know if some information is missing.
> Many thanks for your help,
> 
> Kind regards,
> Dr. Milan Pelletier
> 


From junchao.zhang at gmail.com  Wed Feb 11 09:17:57 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Wed, 11 Feb 2026 09:17:57 -0600
Subject: [petsc-users] Happening in less 45 minutes -- PETSc Online BoF
 (Free Registration)
In-Reply-To: <CA+MQGp-s0AcTMeBfoVrwEb8ho+97OZzU9FO0B9FN0xZesxOaEg@mail.gmail.com>
References: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>
	<CA+MQGp-s0AcTMeBfoVrwEb8ho+97OZzU9FO0B9FN0xZesxOaEg@mail.gmail.com>
Message-ID: <CA+MQGp9moi=UpTR0r-QMCOLjA=v9YRxym-6eZDA0umCqQEroyg@mail.gmail.com>

  The BoF session will begin in less than 45 minutes.  The Zoom is already
open for testing (with the CASS intro slides).  See link info below.

  See you soon!
-Junchao


> On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> Dear PETSc community,
>>
>> PETSc will host a free online Birds-of-a-Feather (BoF) session on *February
>> 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF
>> will not be recorded.
>>
>> The agenda is available at
>> https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!fR02TLjeNiZskbApDs8e50ol_dTg0JXjJge8YtUT75h0T4h6Lxk1vSXCKfyQ5QOZX1bUYowZsUZ8yUSePt52D5vogrCO$ 
>>
>> Please register in advance at
>> https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!fR02TLjeNiZskbApDs8e50ol_dTg0JXjJge8YtUT75h0T4h6Lxk1vSXCKfyQ5QOZX1bUYowZsUZ8yUSePt52D1HyPQTE$ 
>> To receive a Zoom link, the organizer requires all participants to
>> register individually. Registration is quick and requires only your name
>> and email address.
>>
>> We look forward to your participation and to a productive and engaging
>> discussion.
>>
>> Thank you,
>> Junchao Zhang
>> On behalf of the PETSc team
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/d5943532/attachment.html>

From snailsoar at hotmail.com  Wed Feb 11 09:58:48 2026
From: snailsoar at hotmail.com (feng wang)
Date: Wed, 11 Feb 2026 15:58:48 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
Message-ID: <SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Mat,

Thanks for your reply. Maybe I am overthinking it.

ksp/ex15 works fine with GPUs.

To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device.

Below is a snippet of my current petsc implementation. Suppose I have:

      ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
      ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);

      //duplicate
      ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);

      //create preconditioning matrix
      ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
                            maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);

If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device?

Below is how I assign values for the matrix:

         nnz=0;
         for(jv=0; jv<nv; jv++)
        {
            for(iv=0; iv<nv; iv++)
           {
               values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
               nnz++;
           }
        }

         idxm[0] = ig_mat[iql];
         idxn[0] = ig_mat[iqr];
         ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
         CHKERRQ(ierr);
     }

Does petsc first set the value in the host and copy it to the device or the value is directly assigned in the device. in the 2nd case, I would need change my code a bit, since I need to make sure the data is in the device in the first place.

Thanks,
Feng



________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: 11 February 2026 13:42
To: feng wang <snailsoar at hotmail.com>
Cc: Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code.  For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system.

The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt

BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question.  What blocks you from running your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d_EEF6jyNQtfEhGsUPb8_rQ08cf8731pynjpilB9qWxrAda9t0oHNPkHWuJRVp1YEtQPM66JtPnQ9YHHHcW44l3FHA$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d_EEF6jyNQtfEhGsUPb8_rQ08cf8731pynjpilB9qWxrAda9t0oHNPkHWuJRVp1YEtQPM66JtPnQ9YHHHcX2ukoduw$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/c7273abb/attachment-0001.html>

From mfadams at lbl.gov  Wed Feb 11 10:25:53 2026
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 11 Feb 2026 11:25:53 -0500
Subject: [petsc-users] DG methods in PETSc
In-Reply-To: <AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>
	<AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
Message-ID: <CADOhEh5SDji9uJSoMGnYxf_W7dLJMEmaRXtwYDvNi70EunU1qw@mail.gmail.com>

Great, and keep it on the list. Lots of people here to help!

On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone <matteo4.leone at mail.polimi.it>
wrote:

> Wow thank you so much! I was almost hopeless, I'll deep dive into it and
> I'll give you a feedback.
>
> Matteo
>
> ------------------------------
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Wednesday, February 11, 2026 3:38:30 PM
> *To:* Matteo Leone <matteo4.leone at mail.polimi.it>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] DG methods in PETSc
>
> DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see
> a test.
> Here is a test but it is not well verified.
> Mark
>
> On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hello, I already posted on Reddit but just to be sure I write even here.
>
> First thanks for the job you do for PETSc, I have used it for several
> projects and is  always nice.
>
> I am writing cause  I am getting mad trying to implement DG solver in
> PETSc, the target is the Euler equations, however I am failing even with
> just the simplest transport equation (u/t + u/x = 0). I was wondering if I
> am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces,
> and more, but I keep failing, I tried also with LLMs, but seems like there
> is no DG code with PETSc on the web, however I see many papers that do it.
>
> I was  wondering if I am maybe missing  something out or what.
>
> For reference I use PETSc 3.24.3 by means of nix.
>
> Thanks in advance, cheers.
>
> Matteo
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/606ac692/attachment.html>

From knepley at gmail.com  Wed Feb 11 10:32:37 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Wed, 11 Feb 2026 11:32:37 -0500
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>

On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com> wrote:

> Hi Mat,
>
> Thanks for your reply. Maybe I am overthinking it.
>
> ksp/ex15 works fine with GPUs.
>
> To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how
> Petsc handle the memory in the host and the device.
>
> Below is a snippet of my current petsc implementation. Suppose I have:
>
>       ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize,
> blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
>

This is the problem. Right now VecGhost hardcodes the use of VECSEQ and
VECMPI. This is not necessary, and the local and global representations
could indeed be device types. Is ghost necessary right now?


>       ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);
>
>       //duplicate
>       ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);
>
>       //create preconditioning matrix
>       ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize,
> nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
>                             maxneig, NULL, maxneig, NULL, &petsc_A_pre);
> CHKERRQ(ierr);
>

I would not create the specific type. Rather you create a generic Mat, set
the blocksize, and then MatSetFromOptions(). Then you can set the type from
the command line, like baij or aijcusparse, etc.


> *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and
> vectors directly created in the device?*
>
> Below is how I assign values for the matrix:
>
>          nnz=0;
>          for(jv=0; jv<nv; jv++)
>         {
>             for(iv=0; iv<nv; iv++)
>            {
>                values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the
> left hand side is [I/dt + (-J)]
>                nnz++;
>            }
>         }
>
>          idxm[0] = ig_mat[iql];
>          idxn[0] = ig_mat[iqr];
>          ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values,
> ADD_VALUES);
>          CHKERRQ(ierr);
>      }
>
> *Does petsc first set the value in the host and copy it to the device or
> the value is directly assigned in the device. in the 2nd case, I would need
> change my code a bit, since I need to make sure the data is in the device
> in the first place.*
>

Yes, you would need to set the values on device for maximum efficiency
(although I would try it out with CPU construction first). You can do this
best on the GPU using MatSetValuesCOO().

  Thanks,

     Matt


> Thanks,
> Feng
>
>
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 13:42
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Thanks for your reply. Probably I did not phrase it in a clear way.
>
> I am using openACC to port the CFD code to the GPU, so the CPU and the GPU
> version essentially share the same source code.  For the original CPU
> version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve
> the sparse linear system.
>
> The current GPU version of the code only port the Jacobi solver to the
> GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to
> make to the existing CPU version of GMRES+ILU(0) to achieve this goal?
>
>
> I think what Junchao is saying, is that if you use the GPU vec and mat
> types, this should be running on the GPU already. Does that not work?
>
>   Thanks,
>
>      Matt
>
>
> BTW: For performance the GPU version of the CFD code has minimum
> communication between the CPU and GPU, so for Ax=b, A, x and b are created
> in the GPU directly
>
> Thanks,
> Feng
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 11 February 2026 3:00
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <
> bsmith at petsc.dev>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Sorry, I don't understand your question.  What blocks you from running
> your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about
> the algorithms.
>
> --Junchao Zhang
>
>
> On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> I have managed to configure Petsc for GPU, also managed to run ksp/ex15
> using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster
> compared to the scenario if I don't use " -mat_type aijcusparse -vec_type
> cuda". so I believe it runs okay for GPUs.
>
> I have an existing CFD code that runs natively on GPUs. so all the data is
> offloaded to GPU at the beginning and some data are copied back to the cpu
> at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for
> the implicit solver.  *My question is:  my code also has a GMRES+ILU(0)
> implemented with Petsc but it only runs on cpus (which I implemented a few
> years ago). How can I replace the existing Newton-Jacobi (which runs in
> GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give
> some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   At the first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> need to assemble your matrices and vectors on GPUs too, but decide that at
> a later stage.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5kFvTZ2y$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5uumbxAw$ >
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5kFvTZ2y$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5uumbxAw$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/11058bb8/attachment-0001.html>

From junchao.zhang at gmail.com  Wed Feb 11 12:04:09 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Wed, 11 Feb 2026 12:04:09 -0600
Subject: [petsc-users] 
	=?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?=
	=?utf-8?q?=2C_2026_=28Free_Registration=29?=
In-Reply-To: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>
References: <CA+MQGp_YTqRUCrAM5kwj7PUascVs3M_8kumzWUyO5-c+cjfWCA@mail.gmail.com>
Message-ID: <CA+MQGp-unPBCWu=LCx3+gxq2hfpS2oaM3aieRw6iOeqLsqjESg@mail.gmail.com>

Thanks to all for participating in the PETSc BoF.  Sorry, we were running
out of time and did have a good time for open discussion.

If you have questions about the talks, and feature requests, you can reach
us by this mailing list, or on the PETSc Discord server,
https://urldefense.us/v3/__https://discord.gg/Fqm8r6Gcyb__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg3J7YYF6$ .  The #general
<https://urldefense.us/v3/__https://discord.com/channels/1119324534303109172/1204601051286609990__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg6vQehDE$ > and
help-forum
<https://urldefense.us/v3/__https://discord.com/channels/1119324534303109172/1126553516077490246__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg5A8mEZv$ >
channels are all good places to ask questions.

The slides of the talks (if available) will be uploaded to the agenda page
soon,
https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysgxjBj9Of$ 

 Thank you!
--Junchao Zhang

On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Dear PETSc community,
>
> PETSc will host a free online Birds-of-a-Feather (BoF) session on *February
> 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF
> will not be recorded.
>
> The agenda is available at
> https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysgxjBj9Of$ 
>
> Please register in advance at
> https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg57bpbRf$ 
> To receive a Zoom link, the organizer requires all participants to
> register individually. Registration is quick and requires only your name
> and email address.
>
> We look forward to your participation and to a productive and engaging
> discussion.
>
> Thank you,
> Junchao Zhang
> On behalf of the PETSc team
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260211/8ddde4a1/attachment.html>

From alexandre.scotto at irt-saintexupery.com  Thu Feb 12 05:47:41 2026
From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre)
Date: Thu, 12 Feb 2026 11:47:41 +0000
Subject: [petsc-users] Scalability
Message-ID: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>

Dear PETSc community,

I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ..., 32 and 64 processes and the results I obtained were not so great.
I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations?

Best regards,
Alexandre.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/7187fe3d/attachment.html>

From knepley at gmail.com  Thu Feb 12 08:00:30 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 12 Feb 2026 09:00:30 -0500
Subject: [petsc-users] Scalability
In-Reply-To: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
References: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
Message-ID: <CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>

On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Dear PETSc community,
>
>
>
> I have conducted a quick strong scalability-like test on direct and
> adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix,
> distributed over 2, 4, ?, 32 and 64 processes and the results I obtained
> were not so great.
>
> I am not very confident in my setup, so a as a matter of reference, is
> there any available results on weak and strong scalability of PETSc.Mat
> mult() and multTranspose() operations?
>

1. This behavior depends on available bandwidth, not on cores. Do you know
the bandwidth for your configurations?

2. Strong scaling depends heavily on matrix sparsity. If inevitably
declines, but slower with more work to do.

3. We published a paper on performance recently:
https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$ 

  Thanks,

      Matt

>
>
> Best regards,
>
> Alexandre.
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NPzkrito$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NEclfLI_$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/94b041e9/attachment-0001.html>

From matteo.semplice at uninsubria.it  Thu Feb 12 08:37:07 2026
From: matteo.semplice at uninsubria.it (Matteo Semplice)
Date: Thu, 12 Feb 2026 15:37:07 +0100
Subject: [petsc-users] DMplex periodicity type
Message-ID: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it>

Dear all,

 ? ? regarding the two possible ways to represent periodic meshes in 
DMPlex (https://urldefense.us/v3/__https://petsc.org/release/manual/dmplex/*__;Iw!!G_uCfscf7eWS!aZty3mcKlmFws6oDH6dzJajWmHL38hq4105g81qWT-9vNW5RcQ7t39SXetxoJEpVVF_VJno1viNSyiomGVPsLrvdvU75tWZbkMCJaQ$ ), I have noticed that 
when reading in a GMSH mesh, the periodic topology is created.

Is there a way to read in a mesh and create instead the non-periodic 
topology&local-to-global-map kind of mesh? Or, is it possible to convert 
between the two approaces (i.e. generate a DMPlex that is a quasi-clone 
of a given one, but ready for the other kind of periodicity handling)?

Bets regards

 ? ? Matteo

-- 
Prof. Matteo Semplice
Universit? degli Studi dell?Insubria
Dipartimento di Scienza e Alta Tecnologia ? DiSAT
Professore Associato
Via Valleggio, 11 ? 22100 Como (CO) ? Italia
tel.: +39 031 2386316


From knepley at gmail.com  Thu Feb 12 08:41:41 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 12 Feb 2026 09:41:41 -0500
Subject: [petsc-users] DMplex periodicity type
In-Reply-To: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it>
References: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it>
Message-ID: <CAMYG4Gk2mO3RzxPYgqg3OXcU8OcdrQUwdbOGoDJ-LtLx4mWopQ@mail.gmail.com>

On Thu, Feb 12, 2026 at 9:37?AM Matteo Semplice via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Dear all,
>
>      regarding the two possible ways to represent periodic meshes in
> DMPlex (
> https://urldefense.us/v3/__https://petsc.org/release/manual/dmplex/*__;Iw!!G_uCfscf7eWS!aZty3mcKlmFws6oDH6dzJajWmHL38hq4105g81qWT-9vNW5RcQ7t39SXetxoJEpVVF_VJno1viNSyiomGVPsLrvdvU75tWZbkMCJaQ$
> ), I have noticed that
> when reading in a GMSH mesh, the periodic topology is created.
>
> Is there a way to read in a mesh and create instead the non-periodic
> topology&local-to-global-map kind of mesh?


That code is not in there. This seems easier to write.


> Or, is it possible to convert
> between the two approaces (i.e. generate a DMPlex that is a quasi-clone
> of a given one, but ready for the other kind of periodicity handling)?
>

This code is not either. This seems harder because you have to choose some
periodic boundary to tear along.

I do not really understand the other periodicity, so it would be difficult
for me to
do before the Fall. The libCEED guys wrote the map version.

  Thanks,

     Matt


> Bets regards
>
>      Matteo
>
> --
> Prof. Matteo Semplice
> Universit? degli Studi dell?Insubria
> Dipartimento di Scienza e Alta Tecnologia ? DiSAT
> Professore Associato
> Via Valleggio, 11 ? 22100 Como (CO) ? Italia
> tel.: +39 031 2386316
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dBeL3LumXFsm8dLmfMgJBRhh1F7zXrHaGD1sf6lSy7pnnzjs2EvBJNzne0yCEx6vObTmNqz-Q0nJ2jo8JqHA$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dBeL3LumXFsm8dLmfMgJBRhh1F7zXrHaGD1sf6lSy7pnnzjs2EvBJNzne0yCEx6vObTmNqz-Q0nJ2sF9lMbM$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/4ee06b81/attachment.html>

From bsmith at petsc.dev  Thu Feb 12 10:09:47 2026
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 12 Feb 2026 11:09:47 -0500
Subject: [petsc-users] Scalability
In-Reply-To: <CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>
References: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
	<CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>
Message-ID: <F3979F94-15D9-4F12-9E9C-E19601221CDE@petsc.dev>


   The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times.


> On Feb 12, 2026, at 9:00?AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>> Dear PETSc community,
>> 
>>  
>> 
>> I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great.
>> 
>> I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations?
>> 
> 
> 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations?
> 
> 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do.
> 
> 3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!YF3tHamDD69QbY3u6HDwK1Ud9HoLgM-UWJ__gqZwYCS7b4H7RrYYNe6xkgbg7udGqPEhw2n6a8hEURca5-7lmBU$  <https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$>
> 
>   Thanks,
> 
>       Matt 
>>  
>> Best regards,
>> 
>> Alexandre.
>> 
> 
> 
> 
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YF3tHamDD69QbY3u6HDwK1Ud9HoLgM-UWJ__gqZwYCS7b4H7RrYYNe6xkgbg7udGqPEhw2n6a8hEURca4quC9R0$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NEclfLI_$>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/a8d83ea4/attachment-0001.html>

From snailsoar at hotmail.com  Thu Feb 12 17:14:16 2026
From: snailsoar at hotmail.com (feng wang)
Date: Thu, 12 Feb 2026 23:14:16 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
Message-ID: <SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Mat,

Thanks for your reply.

For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are solving Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. or is there a way around this?

Regarding to "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I prefer I can set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this?

With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values.  For MatSetValuesCOO, it requires an argument "coo_v", how does it work if I want to set the values in the GPU directly? say, coo_v has the type of PetscScalar, do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO?

Thanks for your help in advance.

Best regards,
Feng

________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: 11 February 2026 16:32
To: feng wang <snailsoar at hotmail.com>
Cc: Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,

Thanks for your reply. Maybe I am overthinking it.

ksp/ex15 works fine with GPUs.

To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device.

Below is a snippet of my current petsc implementation. Suppose I have:

      ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);

This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now?

      ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);

      //duplicate
      ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);

      //create preconditioning matrix
      ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
                            maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);

I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.

If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device?

Below is how I assign values for the matrix:

         nnz=0;
         for(jv=0; jv<nv; jv++)
        {
            for(iv=0; iv<nv; iv++)
           {
               values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
               nnz++;
           }
        }

         idxm[0] = ig_mat[iql];
         idxn[0] = ig_mat[iqr];
         ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
         CHKERRQ(ierr);
     }

Does petsc first set the value in the host and copy it to the device or the value is directly assigned in the device. in the 2nd case, I would need change my code a bit, since I need to make sure the data is in the device in the first place.

Yes, you would need to set the values on device for maximum efficiency (although I would try it out with CPU construction first). You can do this best on the GPU using MatSetValuesCOO().

  Thanks,

     Matt

Thanks,
Feng



________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 11 February 2026 13:42
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code.  For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system.

The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt

BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question.  What blocks you from running your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_m2wZZbkQ$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_lmYohA1Q$ >


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_m2wZZbkQ$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_lmYohA1Q$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/80a104b4/attachment-0001.html>

From knepley at gmail.com  Thu Feb 12 17:36:48 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 12 Feb 2026 18:36:48 -0500
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
	<SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CAMYG4G=nstBtVzsa0_T+-6z5G5w12eEaXq1xNbdE93Zt2bhUYA@mail.gmail.com>

On Thu, Feb 12, 2026 at 6:14?PM feng wang <snailsoar at hotmail.com> wrote:

> Hi Mat,
>
> Thanks for your reply.
>
> For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are
> solving Ax=b, so it also stores the halos in x and b for each partition.
> This is how my old implementation was done. If the current GPU
> implementation does not support halos, I can stick to one GPU for the
> moment. or is there a way around this?
>

There is a way around it. We have an open Issue. Someone needs to allow the
vectors to be created with another type. It is not hard, it just takes
time. I can do it starting the middle of March if you need it quickly.


> Regarding to "Rather you create a generic Mat, set the blocksize, and then
> MatSetFromOptions(). Then you can set the type from the command line, like
> baij or aijcusparse, etc.", my current CFD code also takes arguments from
> the command line, so I prefer I can set the types from the source code
> directly, so it does not mess around with arguments of the CFD code. Is
> there a way I can do this?
>

1) You can do that using

  MatCreate()
  MatSetSizes()
  MatSetBlockSize()
  MatSetType()

but, I still don't think you should do that.

2) You can provide PETSc options from any source you want using
PetscOptionsSetValue() and PetscOptionsInsertString(), so you can manage
them however you want.


> With respect to "MatSetValuesCOO()", I am new to this, and was using the
> old way to set the values.  For MatSetValuesCOO, it requires an argument
> "coo_v", how does it work if I want to set the values in the GPU directly?
> say, coo_v has the type of PetscScalar, do I need to create coo_v and
> assign its values directly in the GPU and then give it to MatSetValuesCOO?
>

Yes. COO is much more efficient on the GPU than calling SetValues()
individually. GPUs have horrible latency and hate branching. This is about
the only way to make them competitive with CPUs for building operators.

  Thanks,

      Matt


> Thanks for your help in advance.
>
> Best regards,
> Feng
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 16:32
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Mat,
>
> Thanks for your reply. Maybe I am overthinking it.
>
> ksp/ex15 works fine with GPUs.
>
> To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how
> Petsc handle the memory in the host and the device.
>
> Below is a snippet of my current petsc implementation. Suppose I have:
>
>       ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize,
> blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
>
>
> This is the problem. Right now VecGhost hardcodes the use of VECSEQ and
> VECMPI. This is not necessary, and the local and global representations
> could indeed be device types. Is ghost necessary right now?
>
>
>       ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);
>
>       //duplicate
>       ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);
>
>       //create preconditioning matrix
>       ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize,
> nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
>                             maxneig, NULL, maxneig, NULL, &petsc_A_pre);
> CHKERRQ(ierr);
>
>
> I would not create the specific type. Rather you create a generic Mat, set
> the blocksize, and then MatSetFromOptions(). Then you can set the type from
> the command line, like baij or aijcusparse, etc.
>
>
> *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and
> vectors directly created in the device?*
>
> Below is how I assign values for the matrix:
>
>          nnz=0;
>          for(jv=0; jv<nv; jv++)
>         {
>             for(iv=0; iv<nv; iv++)
>            {
>                values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the
> left hand side is [I/dt + (-J)]
>                nnz++;
>            }
>         }
>
>          idxm[0] = ig_mat[iql];
>          idxn[0] = ig_mat[iqr];
>          ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values,
> ADD_VALUES);
>          CHKERRQ(ierr);
>      }
>
> *Does petsc first set the value in the host and copy it to the device or
> the value is directly assigned in the device. in the 2nd case, I would need
> change my code a bit, since I need to make sure the data is in the device
> in the first place.*
>
>
> Yes, you would need to set the values on device for maximum efficiency
> (although I would try it out with CPU construction first). You can do this
> best on the GPU using MatSetValuesCOO().
>
>   Thanks,
>
>      Matt
>
>
> Thanks,
> Feng
>
>
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 13:42
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Thanks for your reply. Probably I did not phrase it in a clear way.
>
> I am using openACC to port the CFD code to the GPU, so the CPU and the GPU
> version essentially share the same source code.  For the original CPU
> version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve
> the sparse linear system.
>
> The current GPU version of the code only port the Jacobi solver to the
> GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to
> make to the existing CPU version of GMRES+ILU(0) to achieve this goal?
>
>
> I think what Junchao is saying, is that if you use the GPU vec and mat
> types, this should be running on the GPU already. Does that not work?
>
>   Thanks,
>
>      Matt
>
>
> BTW: For performance the GPU version of the CFD code has minimum
> communication between the CPU and GPU, so for Ax=b, A, x and b are created
> in the GPU directly
>
> Thanks,
> Feng
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 11 February 2026 3:00
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <
> bsmith at petsc.dev>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Sorry, I don't understand your question.  What blocks you from running
> your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about
> the algorithms.
>
> --Junchao Zhang
>
>
> On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> I have managed to configure Petsc for GPU, also managed to run ksp/ex15
> using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster
> compared to the scenario if I don't use " -mat_type aijcusparse -vec_type
> cuda". so I believe it runs okay for GPUs.
>
> I have an existing CFD code that runs natively on GPUs. so all the data is
> offloaded to GPU at the beginning and some data are copied back to the cpu
> at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for
> the implicit solver.  *My question is:  my code also has a GMRES+ILU(0)
> implemented with Petsc but it only runs on cpus (which I implemented a few
> years ago). How can I replace the existing Newton-Jacobi (which runs in
> GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give
> some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   At the first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> need to assemble your matrices and vectors on GPUs too, but decide that at
> a later stage.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUiveMOA1n$ >
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUiveMOA1n$ >
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUiveMOA1n$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/822e2835/attachment-0001.html>

From junchao.zhang at gmail.com  Thu Feb 12 17:44:57 2026
From: junchao.zhang at gmail.com (Junchao Zhang)
Date: Thu, 12 Feb 2026 17:44:57 -0600
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
	<SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
Message-ID: <CA+MQGp9KBLW4d9ZYoAN3T6VPAu+owxYPxuaSKQp_APifXpyFqg@mail.gmail.com>

On Thu, Feb 12, 2026 at 5:14?PM feng wang <snailsoar at hotmail.com> wrote:

> Hi Mat,
>
> Thanks for your reply.
>
> For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are
> solving Ax=b, so it also stores the halos in x and b for each partition.
> This is how my old implementation was done. If the current GPU
> implementation does not support halos, I can stick to one GPU for the
> moment. or is there a way around this?
>
PETSc currently doesn't support ghost vectors on device, though we plan to
support it.


>
> Regarding to "Rather you create a generic Mat, set the blocksize, and then
> MatSetFromOptions(). Then you can set the type from the command line, like
> baij or aijcusparse, etc.", my current CFD code also takes arguments from
> the command line, so I prefer I can set the types from the source code
> directly, so it does not mess around with arguments of the CFD code. Is
> there a way I can do this?
>
petsc asscpets options from three sources: 1) command line; 2) the .petscrc
file; 3) the PETSC_OPTIONS env var.  You can use the latter two approaches.


>
> With respect to "MatSetValuesCOO()", I am new to this, and was using the
> old way to set the values.  For MatSetValuesCOO, it requires an argument
> "coo_v", how does it work if I want to set the values in the GPU directly?
> say, coo_v has the type of PetscScalar, do I need to create coo_v and
> assign its values directly in the GPU and then give it to MatSetValuesCOO?
>
COO routines are used to assemble the matrix on device.  If you compute
matrix entries on host, you don't need COO, otherwise you need.  In
 MatSetValuesCOO(A, coo_v, ..), coo_v can be a device pointer, however in
MatSetValues/MatSetValuesBlocked(A, ..., v, ..), v must be a host pointer.


>
> Thanks for your help in advance.
>
> Best regards,
> Feng
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 16:32
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Mat,
>
> Thanks for your reply. Maybe I am overthinking it.
>
> ksp/ex15 works fine with GPUs.
>
> To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how
> Petsc handle the memory in the host and the device.
>
> Below is a snippet of my current petsc implementation. Suppose I have:
>
>       ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize,
> blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);
>
>
> This is the problem. Right now VecGhost hardcodes the use of VECSEQ and
> VECMPI. This is not necessary, and the local and global representations
> could indeed be device types. Is ghost necessary right now?
>
>
>       ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);
>
>       //duplicate
>       ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);
>
>       //create preconditioning matrix
>       ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize,
> nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
>                             maxneig, NULL, maxneig, NULL, &petsc_A_pre);
> CHKERRQ(ierr);
>
>
> I would not create the specific type. Rather you create a generic Mat, set
> the blocksize, and then MatSetFromOptions(). Then you can set the type from
> the command line, like baij or aijcusparse, etc.
>
>
> *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and
> vectors directly created in the device?*
>
> Below is how I assign values for the matrix:
>
>          nnz=0;
>          for(jv=0; jv<nv; jv++)
>         {
>             for(iv=0; iv<nv; iv++)
>            {
>                values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the
> left hand side is [I/dt + (-J)]
>                nnz++;
>            }
>         }
>
>          idxm[0] = ig_mat[iql];
>          idxn[0] = ig_mat[iqr];
>          ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values,
> ADD_VALUES);
>          CHKERRQ(ierr);
>      }
>
> *Does petsc first set the value in the host and copy it to the device or
> the value is directly assigned in the device. in the 2nd case, I would need
> change my code a bit, since I need to make sure the data is in the device
> in the first place.*
>
>
> Yes, you would need to set the values on device for maximum efficiency
> (although I would try it out with CPU construction first). You can do this
> best on the GPU using MatSetValuesCOO().
>
>   Thanks,
>
>      Matt
>
>
> Thanks,
> Feng
>
>
>
> ------------------------------
> *From:* Matthew Knepley <knepley at gmail.com>
> *Sent:* 11 February 2026 13:42
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Thanks for your reply. Probably I did not phrase it in a clear way.
>
> I am using openACC to port the CFD code to the GPU, so the CPU and the GPU
> version essentially share the same source code.  For the original CPU
> version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve
> the sparse linear system.
>
> The current GPU version of the code only port the Jacobi solver to the
> GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to
> make to the existing CPU version of GMRES+ILU(0) to achieve this goal?
>
>
> I think what Junchao is saying, is that if you use the GPU vec and mat
> types, this should be running on the GPU already. Does that not work?
>
>   Thanks,
>
>      Matt
>
>
> BTW: For performance the GPU version of the CFD code has minimum
> communication between the CPU and GPU, so for Ax=b, A, x and b are created
> in the GPU directly
>
> Thanks,
> Feng
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 11 February 2026 3:00
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Barry Smith <
> bsmith at petsc.dev>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Sorry, I don't understand your question.  What blocks you from running
> your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about
> the algorithms.
>
> --Junchao Zhang
>
>
> On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> I have managed to configure Petsc for GPU, also managed to run ksp/ex15
> using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster
> compared to the scenario if I don't use " -mat_type aijcusparse -vec_type
> cuda". so I believe it runs okay for GPUs.
>
> I have an existing CFD code that runs natively on GPUs. so all the data is
> offloaded to GPU at the beginning and some data are copied back to the cpu
> at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for
> the implicit solver.  *My question is:  my code also has a GMRES+ILU(0)
> implemented with Petsc but it only runs on cpus (which I implemented a few
> years ago). How can I replace the existing Newton-Jacobi (which runs in
> GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give
> some advice?*
>
> Thanks,
> Feng
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 23:18
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hi Feng,
>   At the first step, you don't need to change your CPU implementation.
> Then do profiling to see where it is worth putting your effort.  Maybe you
> need to assemble your matrices and vectors on GPUs too, but decide that at
> a later stage.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Hi Junchao,
>
> Many thanks for your reply.
>
> This is great!  Do I need to change anything for my current CPU
> implementation? or I just link to a version of Petsc that is configured
> with  cuda and make sure the necessary data are copied to the "device",
> then Petsc will do the rest magic for me?
>
> Thanks,
> Feng
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* 09 February 2026 1:55
> *To:* feng wang <snailsoar at hotmail.com>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to
> GPU
>
> Hello Feng,
>   It is possible to run GMRES with ILU(0) on GPUs.  You may need to
> configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with
> extra --download-kokkos  --download-kokkos-kernels).  Then run with
> -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
>   But triangular solve is not GPU friendly and the performance might be
> poor.  But you should try it, I think.
>
> Thanks!
> --Junchao Zhang
>
> On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com> wrote:
>
> Dear All,
>
> I have an existing implementation of GMRES with ILU(0), it works well for
> cpu now. I went through the Petsc documentation, it seems Petsc has some
> support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?
>
> Many thanks for your help in advance,
> Feng
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhX6QVNE-$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhSOQ13qp$ >
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhX6QVNE-$ 
> <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhSOQ13qp$ >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260212/e6dedef3/attachment-0001.html>

From alexandre.scotto at irt-saintexupery.com  Fri Feb 13 01:14:47 2026
From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre)
Date: Fri, 13 Feb 2026 07:14:47 +0000
Subject: [petsc-users] Scalability
In-Reply-To: <F3979F94-15D9-4F12-9E9C-E19601221CDE@petsc.dev>
References: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
	<CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>
	<F3979F94-15D9-4F12-9E9C-E19601221CDE@petsc.dev>
Message-ID: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com>

Dear Matthew, Barry,

Thank you for your answers. The question of the problem size was part of my concern regarding the relevance of the quick test setup, I am going to increase the size in the suggested way to see the difference.

Regarding the sparsity pattern, I assume that the more ?diagonal? the matrix is the better the speedup, is this is a correct rule of thumb?

Best regards,
Alexandre.

De : Barry Smith <bsmith at petsc.dev>
Envoy? : jeudi 12 f?vrier 2026 17:10
? : Matthew Knepley <knepley at gmail.com>
Cc : SCOTTO Alexandre <alexandre.scotto at irt-saintexupery.com>; petsc-users at mcs.anl.gov
Objet : Re: [petsc-users] Scalability


   The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times.



On Feb 12, 2026, at 9:00?AM, Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>> wrote:

On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Dear PETSc community,

I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great.
I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations?

1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations?

2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do.

3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!eV_IiuO3vH2ZpW_dBJVlJMaQrV4cgKIZY9zbKqT1jv_KxjWYt5QxPk9CZ6YQsDkA3nBdSIOKvGAf0E55rPv1G1x0j_G4MnIEMljD3zuFSA$ <https://urldefense.us/v3/__https:/www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$>

  Thanks,

      Matt

Best regards,
Alexandre.


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eV_IiuO3vH2ZpW_dBJVlJMaQrV4cgKIZY9zbKqT1jv_KxjWYt5QxPk9CZ6YQsDkA3nBdSIOKvGAf0E55rPv1G1x0j_G4MnIEMlgIc0eMgg$ <https://urldefense.us/v3/__http:/www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NEclfLI_$>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/c0c2cb3e/attachment.html>

From snailsoar at hotmail.com  Fri Feb 13 04:42:42 2026
From: snailsoar at hotmail.com (feng wang)
Date: Fri, 13 Feb 2026 10:42:42 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CAMYG4G=nstBtVzsa0_T+-6z5G5w12eEaXq1xNbdE93Zt2bhUYA@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
	<SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=nstBtVzsa0_T+-6z5G5w12eEaXq1xNbdE93Zt2bhUYA@mail.gmail.com>
Message-ID: <SY8P300MB0128F8D9B65B4F7AED047CF8B961A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Mat,

Thanks for your reply.  I roughly know what to do now. I will give it a try. If I have some issues, I will come back to this thread.

Thanks,
Feng
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: 12 February 2026 23:36
To: feng wang <snailsoar at hotmail.com>
Cc: Junchao Zhang <junchao.zhang at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Thu, Feb 12, 2026 at 6:14?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,

Thanks for your reply.

For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are solving Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. or is there a way around this?

There is a way around it. We have an open Issue. Someone needs to allow the vectors to be created with another type. It is not hard, it just takes time. I can do it starting the middle of March if you need it quickly.

Regarding to "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I prefer I can set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this?

1) You can do that using

  MatCreate()
  MatSetSizes()
  MatSetBlockSize()
  MatSetType()

but, I still don't think you should do that.

2) You can provide PETSc options from any source you want using PetscOptionsSetValue() and PetscOptionsInsertString(), so you can manage them however you want.

With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values.  For MatSetValuesCOO, it requires an argument "coo_v", how does it work if I want to set the values in the GPU directly? say, coo_v has the type of PetscScalar, do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO?

Yes. COO is much more efficient on the GPU than calling SetValues() individually. GPUs have horrible latency and hate branching. This is about the only way to make them competitive with CPUs for building operators.

  Thanks,

      Matt

Thanks for your help in advance.

Best regards,
Feng

________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 11 February 2026 16:32
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,

Thanks for your reply. Maybe I am overthinking it.

ksp/ex15 works fine with GPUs.

To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device.

Below is a snippet of my current petsc implementation. Suppose I have:

      ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);

This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now?

      ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);

      //duplicate
      ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);

      //create preconditioning matrix
      ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
                            maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);

I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.

If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device?

Below is how I assign values for the matrix:

         nnz=0;
         for(jv=0; jv<nv; jv++)
        {
            for(iv=0; iv<nv; iv++)
           {
               values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
               nnz++;
           }
        }

         idxm[0] = ig_mat[iql];
         idxn[0] = ig_mat[iqr];
         ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
         CHKERRQ(ierr);
     }

Does petsc first set the value in the host and copy it to the device or the value is directly assigned in the device. in the 2nd case, I would need change my code a bit, since I need to make sure the data is in the device in the first place.

Yes, you would need to set the values on device for maximum efficiency (although I would try it out with CPU construction first). You can do this best on the GPU using MatSetValuesCOO().

  Thanks,

     Matt

Thanks,
Feng



________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 11 February 2026 13:42
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code.  For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system.

The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt

BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question.  What blocks you from running your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnIvyg3_TA$ >


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnIvyg3_TA$ >


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnIvyg3_TA$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/f32ba0a8/attachment-0001.html>

From snailsoar at hotmail.com  Fri Feb 13 04:49:03 2026
From: snailsoar at hotmail.com (feng wang)
Date: Fri, 13 Feb 2026 10:49:03 +0000
Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU
In-Reply-To: <CA+MQGp9KBLW4d9ZYoAN3T6VPAu+owxYPxuaSKQp_APifXpyFqg@mail.gmail.com>
References: <SY7P300MB0115FACCEC5F6A2840A9E011B964A@SY7P300MB0115.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp8GC3VUZtDYf1Hd8TQOGi3OB=YGEX5DfjupNcJ5qT+Jqw@mail.gmail.com>
	<SY8P300MB01288A9B608B5903A653B036B965A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp_H+15B4vQw-AQ2y3SO8+NJ+kDNDR=FnK0rPuVh7UEjSQ@mail.gmail.com>
	<SY8P300MB0128DC87A4F3C16CDF7F500FB962A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9WZTy09CNHdWHNH1Nv0KYzdg9qfdmEmoYUP_FrRMkXtw@mail.gmail.com>
	<SY8P300MB0128FADB672BE7C3138FE67FB963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4GnD6L-8F2WeSv4wXPBYqi3+ZyAAgo1e2x2Ue-Fe+DTStQ@mail.gmail.com>
	<SY8P300MB012845BA5F6260F713F82434B963A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CAMYG4G=p0BoHWgYXZR9RkjSFUk+0G2KLbNdQJJi-ChRM-bLeHg@mail.gmail.com>
	<SY8P300MB01288EF61FBB9B70625B15D6B960A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>
	<CA+MQGp9KBLW4d9ZYoAN3T6VPAu+owxYPxuaSKQp_APifXpyFqg@mail.gmail.com>
Message-ID: <SY8P300MB01283B103F99A306EF4A09AAB961A@SY8P300MB0128.AUSP300.PROD.OUTLOOK.COM>

Hi Junchao,

Thanks for your reply. It is very helpful. I roughly know what to do now.

One more question, Does this mean that for the moment I can only use a GPU? If so, I can live with it for the moment.

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: 12 February 2026 23:44
To: feng wang <snailsoar at hotmail.com>
Cc: Matthew Knepley <knepley at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU




On Thu, Feb 12, 2026 at 5:14?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,

Thanks for your reply.

For "VecCreateGhostBlock",  The CPU version runs in parallel, if we are solving Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. or is there a way around this?
PETSc currently doesn't support ghost vectors on device, though we plan to support it.


Regarding to "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I prefer I can set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this?
petsc asscpets options from three sources: 1) command line; 2) the .petscrc file; 3) the PETSC_OPTIONS env var.  You can use the latter two approaches.


With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values.  For MatSetValuesCOO, it requires an argument "coo_v", how does it work if I want to set the values in the GPU directly? say, coo_v has the type of PetscScalar, do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO?
COO routines are used to assemble the matrix on device.  If you compute matrix entries on host, you don't need COO, otherwise you need.  In  MatSetValuesCOO(A, coo_v, ..), coo_v can be a device pointer, however in MatSetValues/MatSetValuesBlocked(A, ..., v, ..), v must be a host pointer.


Thanks for your help in advance.

Best regards,
Feng

________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 11 February 2026 16:32
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 10:58?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Mat,

Thanks for your reply. Maybe I am overthinking it.

ksp/ex15 works fine with GPUs.

To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device.

Below is a snippet of my current petsc implementation. Suppose I have:

      ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr);

This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now?

      ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr);

      //duplicate
      ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr);

      //create preconditioning matrix
      ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE,
                            maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr);

I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.

If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device?

Below is how I assign values for the matrix:

         nnz=0;
         for(jv=0; jv<nv; jv++)
        {
            for(iv=0; iv<nv; iv++)
           {
               values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
               nnz++;
           }
        }

         idxm[0] = ig_mat[iql];
         idxn[0] = ig_mat[iqr];
         ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
         CHKERRQ(ierr);
     }

Does petsc first set the value in the host and copy it to the device or the value is directly assigned in the device. in the 2nd case, I would need change my code a bit, since I need to make sure the data is in the device in the first place.

Yes, you would need to set the values on device for maximum efficiency (although I would try it out with CPU construction first). You can do this best on the GPU using MatSetValuesCOO().

  Thanks,

     Matt

Thanks,
Feng



________________________________
From: Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>>
Sent: 11 February 2026 13:42
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

On Wed, Feb 11, 2026 at 5:55?AM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Thanks for your reply. Probably I did not phrase it in a clear way.

I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code.  For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system.

The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal?

I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work?

  Thanks,

     Matt

BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly

Thanks,
Feng


________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 11 February 2026 3:00
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>; Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Sorry, I don't understand your question.  What blocks you from running your GMRES+ILU(0) on GPUs?  I Cc'ed Barry, who knows better about the algorithms.

--Junchao Zhang


On Tue, Feb 10, 2026 at 3:57?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda.  It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs.

I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver.  My question is:  my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice?

Thanks,
Feng

________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 23:18
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hi Feng,
  At the first step, you don't need to change your CPU implementation.  Then do profiling to see where it is worth putting your effort.  Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage.

  Thanks!
--Junchao Zhang


On Mon, Feb 9, 2026 at 4:31?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Hi Junchao,

Many thanks for your reply.

This is great!  Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with  cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me?

Thanks,
Feng
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>>
Sent: 09 February 2026 1:55
To: feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU

Hello Feng,
  It is possible to run GMRES with ILU(0) on GPUs.  You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos  --download-kokkos-kernels).  Then run with -mat_type {aijcusparse or aijkokkos}  -vec_type {cuda or kokkos}.
  But triangular solve is not GPU friendly and the performance might be poor.  But you should try it, I think.

Thanks!
--Junchao Zhang

On Sun, Feb 8, 2026 at 5:46?PM feng wang <snailsoar at hotmail.com<mailto:snailsoar at hotmail.com>> wrote:
Dear All,

I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs?

Many thanks for your help in advance,
Feng


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981pvXINSw$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981LSA_rgw$ >


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981pvXINSw$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981LSA_rgw$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/ff6f5d73/attachment-0001.html>

From knepley at gmail.com  Fri Feb 13 07:43:23 2026
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 13 Feb 2026 08:43:23 -0500
Subject: [petsc-users] Scalability
In-Reply-To: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com>
References: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
	<CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>
	<F3979F94-15D9-4F12-9E9C-E19601221CDE@petsc.dev>
	<3086a5372d334aadb98092a518da3328@irt-saintexupery.com>
Message-ID: <CAMYG4GnYcJKH2tY_=2ZdkEJ8z75UzZ1G4t4QcMAFR3_WhiJD=A@mail.gmail.com>

On Fri, Feb 13, 2026 at 2:14?AM SCOTTO Alexandre <
alexandre.scotto at irt-saintexupery.com> wrote:

> Dear Matthew, Barry,
>
>
>
> Thank you for your answers. The question of the problem size was part of
> my concern regarding the relevance of the quick test setup, I am going to
> increase the size in the suggested way to see the difference.
>
>
>
> Regarding the sparsity pattern, I assume that the more ?diagonal? the
> matrix is the better the speedup, is this is a correct rule of thumb?
>

What I was referring to was the density. The pattern has implications for
the cache efficiency. Here is a good paper explaining what is going on:


https://urldefense.us/v3/__https://d1wqtxts1xzle7.cloudfront.net/40652293/Toward_Realistic_Performance_Bounds_for_20151205-5192-8jxqcg-libre.pdf?1449311168=&response-content-disposition=inline*3B*filename*3DToward_realistic_performance_bounds_for.pdf&Expires=1770993790&Signature=ITtMQ-YNb5x*ZZnYof32wXbghpN9y5Bf50*ioozZi6O7GXATT4e4wApHuDX0qsrED1Pv--bv*rXFkMz9BpeGHP491X-qcDdKbRNxp7tg2zhKMwTeGpzzUCDV6UGjWcof39UCWzBSgNDhC35BVObFeDelhewIvn0dNI9O-Msr3wOjO51yDYzh1KJO-oTZ6mIDIYDL8S8ioLhnL0z6ec-3dQOmdDJfV6Vty3gkMJAjAhkhUNst2JEqIuRuygYGizCuVhYksH3p-51et7FWtu043MTmBO6lRCbKodbWMGBXvKe8Kox03NDQ2fs5-ClAWTwjd6VTiGpPq6PxP0a9UPvWZQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA__;JSslfn5-!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo1vSYcsq$ 

  Thanks,

     Matt


> Best regards,
>
> Alexandre.
>
>
>
> *De :* Barry Smith <bsmith at petsc.dev>
> *Envoy? :* jeudi 12 f?vrier 2026 17:10
> *? :* Matthew Knepley <knepley at gmail.com>
> *Cc :* SCOTTO Alexandre <alexandre.scotto at irt-saintexupery.com>;
> petsc-users at mcs.anl.gov
> *Objet :* Re: [petsc-users] Scalability
>
>
>
>
>
>    The problem size is also very small. Typically one cannot get speedup
> when the number of variables per MPI rank is below on the order of 10,000.
> In your 64 process case you only have 390 variables. I would be stunned
> with any kind of speedup for such sizes. Run a problem at least 10 times
> bigger, better yet 20 times.
>
>
>
>
>
> On Feb 12, 2026, at 9:00?AM, Matthew Knepley <knepley at gmail.com> wrote:
>
>
>
> On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Dear PETSc community,
>
>
>
> I have conducted a quick strong scalability-like test on direct and
> adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix,
> distributed over 2, 4, ?, 32 and 64 processes and the results I obtained
> were not so great.
>
> I am not very confident in my setup, so a as a matter of reference, is
> there any available results on weak and strong scalability of PETSc.Mat
> mult() and multTranspose() operations?
>
>
>
> 1. This behavior depends on available bandwidth, not on cores. Do you know
> the bandwidth for your configurations?
>
>
>
> 2. Strong scaling depends heavily on matrix sparsity. If inevitably
> declines, but slower with more work to do.
>
>
>
> 3. We published a paper on performance recently:
> https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo5KkdLTl$ 
> <https://urldefense.us/v3/__https:/www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$>
>
>
>
>   Thanks,
>
>
>
>       Matt
>
>
>
> Best regards,
>
> Alexandre.
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo_B6tlSa$ 
> <https://urldefense.us/v3/__http:/www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NEclfLI_$>
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo_B6tlSa$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo0Xlde1W$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/e49bd926/attachment.html>

From bsmith at petsc.dev  Fri Feb 13 09:11:13 2026
From: bsmith at petsc.dev (Barry Smith)
Date: Fri, 13 Feb 2026 10:11:13 -0500
Subject: [petsc-users] Scalability
In-Reply-To: <CAMYG4GnYcJKH2tY_=2ZdkEJ8z75UzZ1G4t4QcMAFR3_WhiJD=A@mail.gmail.com>
References: <a2f3516fb998412a85a235af16c1567a@irt-saintexupery.com>
	<CAMYG4GmSu_q2j+ysWo5LPUBrDrbCnSN7QZw-o87fq4gpyDUz0w@mail.gmail.com>
	<F3979F94-15D9-4F12-9E9C-E19601221CDE@petsc.dev>
	<3086a5372d334aadb98092a518da3328@irt-saintexupery.com>
	<CAMYG4GnYcJKH2tY_=2ZdkEJ8z75UzZ1G4t4QcMAFR3_WhiJD=A@mail.gmail.com>
Message-ID: <24761395-691B-429F-BB3A-FE9E1CDC4760@petsc.dev>


   Yes the more the values are near the diagonal likely the better scaling. Also the number of nonzeros per row, the higher that number the better the scaling.

   Barry


> On Feb 13, 2026, at 8:43?AM, Matthew Knepley <knepley at gmail.com> wrote:
> 
> On Fri, Feb 13, 2026 at 2:14?AM SCOTTO Alexandre <alexandre.scotto at irt-saintexupery.com <mailto:alexandre.scotto at irt-saintexupery.com>> wrote:
>> Dear Matthew, Barry,
>> 
>>  
>> 
>> Thank you for your answers. The question of the problem size was part of my concern regarding the relevance of the quick test setup, I am going to increase the size in the suggested way to see the difference.
>> 
>>  
>> 
>> Regarding the sparsity pattern, I assume that the more ?diagonal? the matrix is the better the speedup, is this is a correct rule of thumb?
>> 
> 
> What I was referring to was the density. The pattern has implications for the cache efficiency. Here is a good paper explaining what is going on:
> 
>   https://urldefense.us/v3/__https://d1wqtxts1xzle7.cloudfront.net/40652293/Toward_Realistic_Performance_Bounds_for_20151205-5192-8jxqcg-libre.pdf?1449311168=&response-content-disposition=inline*3B*filename*3DToward_realistic_performance_bounds_for.pdf&Expires=1770993790&Signature=ITtMQ-YNb5x*ZZnYof32wXbghpN9y5Bf50*ioozZi6O7GXATT4e4wApHuDX0qsrED1Pv--bv*rXFkMz9BpeGHP491X-qcDdKbRNxp7tg2zhKMwTeGpzzUCDV6UGjWcof39UCWzBSgNDhC35BVObFeDelhewIvn0dNI9O-Msr3wOjO51yDYzh1KJO-oTZ6mIDIYDL8S8ioLhnL0z6ec-3dQOmdDJfV6Vty3gkMJAjAhkhUNst2JEqIuRuygYGizCuVhYksH3p-51et7FWtu043MTmBO6lRCbKodbWMGBXvKe8Kox03NDQ2fs5-ClAWTwjd6VTiGpPq6PxP0a9UPvWZQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA__;JSslfn5-!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWm896D_g$ 
> 
>   Thanks,
> 
>      Matt
>  
>> Best regards,
>> 
>> Alexandre.
>> 
>>  
>> 
>> De : Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>>
>> Envoy? : jeudi 12 f?vrier 2026 17:10
>> ? : Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>>
>> Cc : SCOTTO Alexandre <alexandre.scotto at irt-saintexupery.com <mailto:alexandre.scotto at irt-saintexupery.com>>; petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>
>> Objet : Re: [petsc-users] Scalability
>> 
>>  
>> 
>>  
>> 
>>    The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times.
>> 
>>  
>> 
>> 
>> 
>> 
>> On Feb 12, 2026, at 9:00?AM, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> wrote:
>> 
>>  
>> 
>> On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>> 
>> Dear PETSc community,
>> 
>>  
>> 
>> I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great.
>> 
>> I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations?
>> 
>>  
>> 
>> 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations?
>> 
>>  
>> 
>> 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do.
>> 
>>  
>> 
>> 3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWSwoa0nA$  <https://urldefense.us/v3/__https:/www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$>
>>  
>> 
>>   Thanks,
>> 
>>  
>> 
>>       Matt 
>> 
>>  
>> 
>> Best regards,
>> 
>> Alexandre.
>> 
>> 
>> 
>>  
>> 
>> -- 
>> 
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>> 
>>  
>> 
>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWiVlDisA$  <https://urldefense.us/v3/__http:/www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NEclfLI_$>
>>  
>> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
> 
> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWiVlDisA$  <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWfV1bHoU$ >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/20ee4a51/attachment-0001.html>

From jed at jedbrown.org  Sat Feb 14 00:34:24 2026
From: jed at jedbrown.org (Jed Brown)
Date: Fri, 13 Feb 2026 23:34:24 -0700
Subject: [petsc-users] 2026 Colorado Conference on Iterative and Multigrid
 Methods, June 21-26 in Boulder, CO
References: <0100019c340dc317-b70c38b1-4985-4c7b-8b12-7d120c13b6ca-000000@email.amazonses.com>
Message-ID: <87y0kvq1e7.fsf@jedbrown.org>

As Scott announces below, the conference formerly known as Copper will be held in Boulder this June. The student paper competition abstract deadline is Feb 18, with paper submission a week later. We'll have an affordable on-campus housing option.

We look forward to your student paper submissions.

Registration and lodging will be available soon.

https://urldefense.us/v3/__https://coloradoconference.github.io/2026/__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T406EDeiok$ 

-------------------- Start of forwarded message --------------------
From: Scott Maclachlan via SIAM <Mail at ConnectedCommunity.org>
Date: Fri, 6 Feb 2026 17:44:03 +0000
Subject: SIAG on Computational Science and Engineering Community: 2026
 Colorado Conference on Iterative and Multigrid Methods
To: jed at jedbrown.org

-------------- next part --------------
The Copper Mountain Conference on Multigrid Methods was founded in 1983 and held every two years (odd years) since then.  In 1990, the Copper Mountain Conference on Iterative Methods was formed as a companion conference to be held in even years. Together, they are widely regarded as premier international conferences on iterative and multigrid methods. In 2026, we continue this conference series with the Colorado Conference on Iterative and Multigrid Methods to be held on the CU Boulder campus.



Abstract submission for the conference will soon open, including for the student paper competition (a tradition at the meeting with a cash award). For more information, see our website at


    https://urldefense.us/v3/__https://coloradoconference.github.io/2026/__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T406EDeiok$ 


Updates will be provided on the web site and in follow-up announcements in the coming weeks.  Don't forget to mark your calendars!



Important Deadlines:


Student Paper Competition (no extensions):


Abstract: Wednesday, February 18, 2026


Paper (reserved for students who submit an abstract): Wednesday, February 25, 2026


Presentation Abstracts: Friday, April 3, 2026


Early Registration: Friday, April 17, 2026





------------------------------
Scott MacLachlan
Memorial University of Newfoundland, Canada
------------------------------


Reply to Sender : https://urldefense.us/v3/__https://engage.siam.org/eGroups/PostReply/?GroupId=79&MID=12597&SenderKey=7385094f-4ce3-4897-a820-b2a77145ce90__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40WNcMrmI$ 

Reply to Discussion : https://urldefense.us/v3/__https://engage.siam.org/eGroups/PostReply/?GroupId=79&MID=12597__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40HtZlATY$ 



You are subscribed to "SIAG on Computational Science and Engineering Community" as jed at jedbrown.org. To change your subscriptions, go to https://urldefense.us/v3/__http://siam.connectedcommunity.org/preferences?section=Subscriptions__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40BbSxG_w$ .  To unsubscribe from this community discussion, go to https://urldefense.us/v3/__https://siam.connectedcommunity.org/HigherLogic/eGroups/Unsubscribe.aspx?UserKey=70e74b35-bc03-4037-9b99-eabb1cd118f9&sKey=6061e99933064319987e&GroupKey=d8049ff6-9924-44fb-bbe6-c2f867948e95__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40dO2c5Dk$ .
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260213/b25d19ee/attachment.html>
-------------- next part --------------
-------------------- End of forwarded message --------------------

From drwells at email.unc.edu  Tue Feb 17 10:51:58 2026
From: drwells at email.unc.edu (Wells, David)
Date: Tue, 17 Feb 2026 16:51:58 +0000
Subject: [petsc-users] Limiting the number of vectors allocated at a
 time by fgmres etc.
In-Reply-To: <6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev>
References: <SA1PR03MB6387D98F49E67D0FA2A54181ED62A@SA1PR03MB6387.namprd03.prod.outlook.com>
	<CAMYG4GmFLwVv8LsK8niEwOsRaW==c30pvjvtUEMpx7vk1MJxOg@mail.gmail.com>
	<SA1PR03MB6387FD4C342777BD437ABA1AED63A@SA1PR03MB6387.namprd03.prod.outlook.com>
	<6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev>
Message-ID: <SA1PR03MB63876BE023E08D985F27D699ED6DA@SA1PR03MB6387.namprd03.prod.outlook.com>

Hi Barry,

Sorry for the slow response - yes, that makes sense, and it is a sensible default to have. I appreciate the writeup.

Best,
David
________________________________
From: Barry Smith <bsmith at petsc.dev>
Sent: Tuesday, February 10, 2026 9:32 PM
To: Wells, David <drwells at email.unc.edu>
Cc: Matthew Knepley <knepley at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc.

You don't often get email from bsmith at petsc.dev. Learn why this is important<https://urldefense.us/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!G_uCfscf7eWS!fGPdLXU1J2z90EOLzcOjKmEhpgVOTMjLIZFkcTqlZ1InQNY5B79RxHFXzmqiSb7O09i1rWuXObNqZRj-hZmQ43XwjnQ$ >

  1) For a fixed restart (say of 30) FGMRES needs 60 vectors, while GMRES only needs 30. This is a big disadvantage of FGMRES over GMRES.

  2) By default PETSc GMRES uses a restart of 30 meaning it keeps 30 previous Krylov vectors (and FGMRES needs 60 vectors). You can use a smaller restart with KSPGMRESSetRestart or -ksp_gmres_restart to need less memory (of course the convergence may get far worse or not depending on the problem.

  3) When GMRES (or FGMRES) starts up it does not immediately allocate all 30 (or whatever) restart vectors because it may be that GMRES only takes 15 steps so why allocate all of them? Instead  it allocates a chunk at a time GMRES_DELTA_DIRECTIONS which is 10 when it uses up the 10 it allocates another 10 (if needed) etc until it gets to the restart You can force GMRES to allocate all 30 (or whatever) initially instead of the chunk of a time approach by using ?KSPGMRESSetPreAllocateVectors() or -ksp_gmres_preallocate  This will prevent confusion about why more vectors are allocated later and why they are not all allocated when the solve starts.

4) PETSc?s GMRES tries to use BLAS 2 operations for MDot() and MAXPY (the orthogonalization in GMRES). It can only use the BLAS 2 on vector chunks that are allocated together. By preallocating all the vectors at the beginning one gets a single chunk and hence more efficient orthogonalization; this is more important on GPUs. For CPUs whether you have 10 or 30 vectors together doesn?t matter much at all.

   I hope this clarifies why you are seeing the memory allocations. Note that these are NOT ?reallocations? in the sense of KSPGMRES allocating more memory and then copying something into the new memory and freeing the old. They are just allocations of new memory which will then be used.

   Barry




On Feb 10, 2026, at 9:04?PM, Wells, David via petsc-users <petsc-users at mcs.anl.gov> wrote:

Hi Matt,

Thanks for the quick response!

> I cannot understand precisely what is happening here. You specify a restart
> size when you setup the KSP. It allocates that many vecs (roughly). Why are
> there reallocations? Do you increase the restart size during the iteration?

I don't believe there are any reallocations (I didn't write this solver, but I
don't see any calls which set the restart size or any other relevant parameter [1]):
as far as I can tell, the solver just allocates a lot of vectors. I'm working
off of traces computed by heaptrack, which is my only insight into how this
works. The allocations come from KSPCreateVecs(), which is called by
1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory)
2. KSPSetUp_GMRES() (for about 300 MB of memory)
3. KSPSetUp_FGMRES() (for about 264 MB of memory)
4. KSPSetWorkVecs() (for about 236 MB of memory)

Is there some relevant set of monitoring flags I can set which will show me how
many vectors I allocate or use? That would also help.

Best,
David

[1] This is IBAMR's PETScKrylovLinearSolver.
[2] This is half the total memory we use for side-centered data vectors.
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, February 10, 2026 6:28 PM
To: Wells, David <drwells at email.unc.edu>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc.

On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Hello,

I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value?

We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that.

If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time.

I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration?

  Thanks,

     Matt

Best,
David Wells


--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fGPdLXU1J2z90EOLzcOjKmEhpgVOTMjLIZFkcTqlZ1InQNY5B79RxHFXzmqiSb7O09i1rWuXObNqZRj-hZmQV7v23JY$ <https://urldefense.us/v3/__http://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aBTZ4oaySeEsyg3U98-DZfMmoj0Wg8RUNEzlPdQoEnZ6pfCuP3cZSF7ib3bqpT7GBGVct81F2vjmak_Zva9875mpIc0$>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260217/22bdd74f/attachment-0001.html>

From mfadams at lbl.gov  Thu Feb 19 11:11:21 2026
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 19 Feb 2026 12:11:21 -0500
Subject: [petsc-users] DG methods in PETSc
In-Reply-To: <AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>
	<AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh5SDji9uJSoMGnYxf_W7dLJMEmaRXtwYDvNi70EunU1qw@mail.gmail.com>
	<AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
Message-ID: <CADOhEh4H6xvbf1n8J0E1mF9DKVmWfSf9x4JrbrRWfNEiqCootQ@mail.gmail.com>

please keep this on the list.



On Thu, Feb 19, 2026 at 9:36?AM Matteo Leone <matteo4.leone at mail.polimi.it>
wrote:

> Hello, I now have tested quite a bit the code, and I have some doubts.
> Note I am quite new to PETSc.
>
> I have modified a bit your code to test it and I tried to print from the
> functions to check if they were actually called with negative results.
> The Riemann solver is never called and also the solution is just static in
> time.
>

This was a work in progress, but it does run:  ./ex9 -dm_plex_box_faces 8,8
-dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor
-velocity 1.0,0.5

I see that the Riemann solver is set:

  PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection));

I would not recommend using DG because there are no tests for it.
There are tests for PetscDSSetRiemannSolver that you could clone and
replace the FE/FV construction in those with DG:

src/ts/tutorials/ex11.c:  PetscCall(PetscDSSetRiemannSolver(prob, 0,
user->model->physics->riemann));
src/ts/tutorials/ex18.c:  if (user->velocityDist == VEL_ZERO)
PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection));
src/ts/tutorials/ex18.c:  else PetscCall(PetscDSSetRiemannSolver(prob, 1,
riemann_coupled_advection));

If you get this far, you might want to look at internal examples (not
tests) and clone these DG constructors:

src/dm/dt/fe/interface/fe.c:  PetscFECreateBrokenElement - Create a
discontinuous version of the input `PetscFE`
src/dm/dt/fe/interface/fe.c:PetscErrorCode
PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe)
src/dm/impls/plex/plexcreate.c:
 PetscCall(PetscFECreateBrokenElement(fe, &dgfe));
src/dm/interface/dmcoordinates.c:
 PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe));

Sorry we do not have DG fully deployed yet.

Good luck,
Mark


> I share the code. It's a bit modified and no more c but c++ (I do not know
> to write the csv file fom c, in case is a dealbreaker I'll try to give a
> look on how to do it).
> (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the
> shared docs if you are used to it lmk).
>
> There is also a small .py to handle the visualization (It was made by
> Claude code, but it just plots the results)
> Hopefully we can go deep into this PETSc DG stuff.
>
>
> Thanks in advace.
> Matteo
>
>
> ------------------------------
> *Da:* Mark Adams <mfadams at lbl.gov>
> *Inviato:* mercoled? 11 febbraio 2026 17:25
> *A:* Matteo Leone <matteo4.leone at mail.polimi.it>; PETSc users list <
> petsc-users at mcs.anl.gov>
> *Oggetto:* Re: [petsc-users] DG methods in PETSc
>
> Great, and keep it on the list. Lots of people here to help!
>
> On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone <matteo4.leone at mail.polimi.it>
> wrote:
>
> Wow thank you so much! I was almost hopeless, I'll deep dive into it and
> I'll give you a feedback.
>
> Matteo
>
> ------------------------------
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Wednesday, February 11, 2026 3:38:30 PM
> *To:* Matteo Leone <matteo4.leone at mail.polimi.it>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] DG methods in PETSc
>
> DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see
> a test.
> Here is a test but it is not well verified.
> Mark
>
> On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hello, I already posted on Reddit but just to be sure I write even here.
>
> First thanks for the job you do for PETSc, I have used it for several
> projects and is  always nice.
>
> I am writing cause  I am getting mad trying to implement DG solver in
> PETSc, the target is the Euler equations, however I am failing even with
> just the simplest transport equation (u/t + u/x = 0). I was wondering if I
> am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces,
> and more, but I keep failing, I tried also with LLMs, but seems like there
> is no DG code with PETSc on the web, however I see many papers that do it.
>
> I was  wondering if I am maybe missing  something out or what.
>
> For reference I use PETSc 3.24.3 by means of nix.
>
> Thanks in advance, cheers.
>
> Matteo
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/24292f04/attachment.html>

From matteo4.leone at mail.polimi.it  Thu Feb 19 11:13:54 2026
From: matteo4.leone at mail.polimi.it (Matteo Leone)
Date: Thu, 19 Feb 2026 17:13:54 +0000
Subject: [petsc-users] Fw:  DG methods in PETSc
In-Reply-To: <AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>
	<AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh5SDji9uJSoMGnYxf_W7dLJMEmaRXtwYDvNi70EunU1qw@mail.gmail.com>
	<AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
Message-ID: <AS8P251MB08538D6AE2ADBA22D3F7B907816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>



Inviato da Outlook per Android<https://urldefense.us/v3/__https://aka.ms/AAb9ysg__;!!G_uCfscf7eWS!esCSra9pt1Q2VQtIwFRk5ds5BQ6-sbeQ66hUETSln-bCt8DbHI1HPvjOPOT6hyKlCgM8Xa4rlvZ6s9Gz4-MYSg9RvFGxWhhnjWc$ >
________________________________
From: Matteo Leone <matteo4.leone at mail.polimi.it>
Sent: Thursday, February 19, 2026 3:36:44 PM
To: Mark Adams <mfadams at lbl.gov>
Subject: R: [petsc-users] DG methods in PETSc

Hello, I now have tested quite a bit the code, and I have some doubts.
Note I am quite new to PETSc.

I have modified a bit your code to test it and I tried to print from the functions to check if they were actually called with negative results.
The Riemann solver is never called and also the solution is just static in time.
I share the code. It's a bit modified and no more c but c++ (I do not know to write the csv file fom c, in case is a dealbreaker I'll try to give a look on how to do it).
(For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the shared docs if you are used to it lmk).

There is also a small .py to handle the visualization (It was made by Claude code, but it just plots the results)
Hopefully we can go deep into this PETSc DG stuff.


Thanks in advace.
Matteo


________________________________
Da: Mark Adams <mfadams at lbl.gov>
Inviato: mercoled? 11 febbraio 2026 17:25
A: Matteo Leone <matteo4.leone at mail.polimi.it>; PETSc users list <petsc-users at mcs.anl.gov>
Oggetto: Re: [petsc-users] DG methods in PETSc

Great, and keep it on the list. Lots of people here to help!

On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>> wrote:
Wow thank you so much! I was almost hopeless, I'll deep dive into it and I'll give you a feedback.

Matteo

________________________________
From: Mark Adams <mfadams at lbl.gov<mailto:mfadams at lbl.gov>>
Sent: Wednesday, February 11, 2026 3:38:30 PM
To: Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] DG methods in PETSc

DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see a test.
Here is a test but it is not well verified.
Mark

On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Hello, I already posted on Reddit but just to be sure I write even here.

First thanks for the job you do for PETSc, I have used it for several projects and is  always nice.

I am writing cause  I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it.

I was  wondering if I am maybe missing  something out or what.

For reference I use PETSc 3.24.3 by means of nix.

Thanks in advance, cheers.

Matteo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/bbb4a228/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dg_transp_eq.cpp
Type: text/x-c++src
Size: 16452 bytes
Desc: dg_transp_eq.cpp
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/bbb4a228/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plot.py
Type: text/x-python
Size: 7225 bytes
Desc: plot.py
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/bbb4a228/attachment-0001.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flake.nix
Type: application/x-mix-transfer
Size: 2748 bytes
Desc: flake.nix
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/bbb4a228/attachment-0003.bin>

From mfadams at lbl.gov  Thu Feb 19 11:15:58 2026
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 19 Feb 2026 12:15:58 -0500
Subject: [petsc-users] Fw: DG methods in PETSc
In-Reply-To: <AS8P251MB08538D6AE2ADBA22D3F7B907816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>
	<AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh5SDji9uJSoMGnYxf_W7dLJMEmaRXtwYDvNi70EunU1qw@mail.gmail.com>
	<AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<AS8P251MB08538D6AE2ADBA22D3F7B907816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
Message-ID: <CADOhEh5pSMO_oHtLurSWVzbXxx1VOuxz5X_B4Y52AEPjhJ5UXw@mail.gmail.com>

Just to avoid confusion with a convoluted thread, here is my response
reposed:

This was a work in progress, but it does run:  ./ex9 -dm_plex_box_faces 8,8
-dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor
-velocity 1.0,0.5

I see that the Riemann solver is set:

  PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection));

I would not recommend using DG because there are no tests for it.
There are tests for PetscDSSetRiemannSolver that you could clone and
replace the FE/FV construction in those with DG:

src/ts/tutorials/ex11.c:  PetscCall(PetscDSSetRiemannSolver(prob, 0,
user->model->physics->riemann));
src/ts/tutorials/ex18.c:  if (user->velocityDist == VEL_ZERO)
PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection));
src/ts/tutorials/ex18.c:  else PetscCall(PetscDSSetRiemannSolver(prob, 1,
riemann_coupled_advection));

If you get this far, you might want to look at internal examples (not
tests) and clone these DG constructors:

src/dm/dt/fe/interface/fe.c:  PetscFECreateBrokenElement - Create a
discontinuous version of the input `PetscFE`
src/dm/dt/fe/interface/fe.c:PetscErrorCode
PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe)
src/dm/impls/plex/plexcreate.c:
 PetscCall(PetscFECreateBrokenElement(fe, &dgfe));
src/dm/interface/dmcoordinates.c:
 PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe));

Sorry we do not have DG fully deployed yet.

Good luck,
Mark

On Thu, Feb 19, 2026 at 12:14?PM Matteo Leone via petsc-users <
petsc-users at mcs.anl.gov> wrote:

>
>
> Inviato da Outlook per Android
> <https://urldefense.us/v3/__https://aka.ms/AAb9ysg__;!!G_uCfscf7eWS!esCSra9pt1Q2VQtIwFRk5ds5BQ6-sbeQ66hUETSln-bCt8DbHI1HPvjOPOT6hyKlCgM8Xa4rlvZ6s9Gz4-MYSg9RvFGxWhhnjWc$>
> ------------------------------
> *From:* Matteo Leone <matteo4.leone at mail.polimi.it>
> *Sent:* Thursday, February 19, 2026 3:36:44 PM
> *To:* Mark Adams <mfadams at lbl.gov>
> *Subject:* R: [petsc-users] DG methods in PETSc
>
> Hello, I now have tested quite a bit the code, and I have some doubts.
> Note I am quite new to PETSc.
>
> I have modified a bit your code to test it and I tried to print from the
> functions to check if they were actually called with negative results.
> The Riemann solver is never called and also the solution is just static in
> time.
> I share the code. It's a bit modified and no more c but c++ (I do not know
> to write the csv file fom c, in case is a dealbreaker I'll try to give a
> look on how to do it).
> (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the
> shared docs if you are used to it lmk).
>
> There is also a small .py to handle the visualization (It was made by
> Claude code, but it just plots the results)
> Hopefully we can go deep into this PETSc DG stuff.
>
>
> Thanks in advace.
> Matteo
>
>
> ------------------------------
> *Da:* Mark Adams <mfadams at lbl.gov>
> *Inviato:* mercoled? 11 febbraio 2026 17:25
> *A:* Matteo Leone <matteo4.leone at mail.polimi.it>; PETSc users list <
> petsc-users at mcs.anl.gov>
> *Oggetto:* Re: [petsc-users] DG methods in PETSc
>
> Great, and keep it on the list. Lots of people here to help!
>
> On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone <matteo4.leone at mail.polimi.it>
> wrote:
>
> Wow thank you so much! I was almost hopeless, I'll deep dive into it and
> I'll give you a feedback.
>
> Matteo
>
> ------------------------------
> *From:* Mark Adams <mfadams at lbl.gov>
> *Sent:* Wednesday, February 11, 2026 3:38:30 PM
> *To:* Matteo Leone <matteo4.leone at mail.polimi.it>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] DG methods in PETSc
>
> DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see
> a test.
> Here is a test but it is not well verified.
> Mark
>
> On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hello, I already posted on Reddit but just to be sure I write even here.
>
> First thanks for the job you do for PETSc, I have used it for several
> projects and is  always nice.
>
> I am writing cause  I am getting mad trying to implement DG solver in
> PETSc, the target is the Euler equations, however I am failing even with
> just the simplest transport equation (u/t + u/x = 0). I was wondering if I
> am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces,
> and more, but I keep failing, I tried also with LLMs, but seems like there
> is no DG code with PETSc on the web, however I see many papers that do it.
>
> I was  wondering if I am maybe missing  something out or what.
>
> For reference I use PETSc 3.24.3 by means of nix.
>
> Thanks in advance, cheers.
>
> Matteo
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/22324910/attachment.html>

From peter.j.macneice at nasa.gov  Thu Feb 19 20:08:51 2026
From: peter.j.macneice at nasa.gov (Macneice, Peter J. (GSFC-6740))
Date: Fri, 20 Feb 2026 02:08:51 +0000
Subject: [petsc-users] Is use of Mirror Boundary with Box Stencil supported
 for 2D?
Message-ID: <BY3PR09MB7666DA9436C7E70FB27E9390A068A@BY3PR09MB7666.namprd09.prod.outlook.com>


>From my searching of the documentation, it looks to me as though this combination should work.

However for my modified version of the ex66.c tutorial code, I get the error message below.


Is this really not yet supported?



Regards


Peter MacNeice





[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------

[0]PETSC ERROR: No support for this operation for this object type

[0]PETSC ERROR: Mirror boundary and box stencil

[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!

[0]PETSC ERROR:   Option left: name:-dm_view (no value) source: command line

[0]PETSC ERROR:   Option left: name:-ksp_monitor (no value) source: command line

[0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!boqIOhIgncdTr1z60slIrCjusZEY0_bk-qkGbrlrmiepTiBAvCwW8s9c1IQ6vyT4fOeshVO3Bsu1F5HuERiBGRsq1r2LJ1zBtrM$  for trouble shooting.

[0]PETSC ERROR: PETSc Release Version 3.24.3, Jan 01, 2026

[0]PETSC ERROR: ex66_9pt with 1 MPI process(es) and PETSC_ARCH arch-darwin-c-debug on gs67-5186361 by pmacneic Thu Feb 19 21:03:16 2026

[0]PETSC ERROR: Configure options: --with-mpi-dir=/Users/pmacneic/mpich-install-3.3-gcc15 --force

[0]PETSC ERROR: #1 DMSetUp_DA_2D() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/da2.c:212

[0]PETSC ERROR: #2 DMSetUp_DA() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/dareg.c:17

[0]PETSC ERROR: #3 DMSetUp() at /Users/pmacneic/petsc-3.24.3/src/dm/interface/dm.c:807

[0]PETSC ERROR: #4 main() at ex66_9pt.c:77

[0]PETSC ERROR: PETSc Option Table entries:

[0]PETSC ERROR: -dm_view (source: command line)

[0]PETSC ERROR: -ksp_monitor (source: command line)

[0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------

application called MPI_Abort(MPI_COMM_SELF, 56) - process 0







Macneice, Peter J. (GSFC-6740)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260220/813efebb/attachment.html>

From bsmith at petsc.dev  Thu Feb 19 21:37:32 2026
From: bsmith at petsc.dev (Barry Smith)
Date: Thu, 19 Feb 2026 22:37:32 -0500
Subject: [petsc-users] Is use of Mirror Boundary with Box Stencil
 supported for 2D?
In-Reply-To: <BY3PR09MB7666DA9436C7E70FB27E9390A068A@BY3PR09MB7666.namprd09.prod.outlook.com>
References: <BY3PR09MB7666DA9436C7E70FB27E9390A068A@BY3PR09MB7666.namprd09.prod.outlook.com>
Message-ID: <C030F2DD-F27A-41FB-9682-008E0FFF055A@petsc.dev>


   I believe the error checking is in place due to how to manage the extreme ghost corners of a rectangular (or box) region.   Consider the grid with values as indicated


      1 2 3 4 5
      6 7 8 9 10

that we mirror in both directions with

      x  6 7 8 9 10 y
      2 1 2 3 4 5 4
      7 6 7 8 9 10 9
       z 1 2 3 4 5 w

I think it is likely I did not want to think about this case when I wrote the code hence put the error checking in.

Quickly looking now it seems the mirroring is well defined so it is possible the error checking is not needed so long as the code properly handles those points. Sadly the code to set up all the communication patterns is complicated and my short-term memory was too small to be able to think about the box code. 

The code is in src/dm/impls/da/da2.c I hope you have more stamina than I do and can take a look at it and see if it needs changes etc.

Note that if you install PETSc not using a ?prefix configure options you can change the code and just run make libs in a PETSc directory with PETSC_DIR (and PETSC_ARCH) set and it will update the library so you first turn off the error check and see what happens.

If you can get it to work please let us know, it would be nice to support this case. Good luck


   Barry


]
> On Feb 19, 2026, at 9:08?PM, Macneice, Peter J. (GSFC-6740) via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> 
> From my searching of the documentation, it looks to me as though this combination should work.
> However for my modified version of the ex66.c tutorial code, I get the error message below.
> 
> Is this really not yet supported?
> 
> 
> Regards
> 
> Peter MacNeice
> 
> 
> 
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Mirror boundary and box stencil
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR:   Option left: name:-dm_view (no value) source: command line
> [0]PETSC ERROR:   Option left: name:-ksp_monitor (no value) source: command line
> [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!c4lEIB-12hqgDjSzBQ8Q3sro45es-1d6wdQ2tLBRtAUg6mudiRWjcIFMcTu1RZYjSTK1WPXqCKVkPyoOIdVy568$  <https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!boqIOhIgncdTr1z60slIrCjusZEY0_bk-qkGbrlrmiepTiBAvCwW8s9c1IQ6vyT4fOeshVO3Bsu1F5HuERiBGRsq1r2LJ1zBtrM$> for trouble shooting.
> [0]PETSC ERROR: PETSc Release Version 3.24.3, Jan 01, 2026
> [0]PETSC ERROR: ex66_9pt with 1 MPI process(es) and PETSC_ARCH arch-darwin-c-debug on gs67-5186361 by pmacneic Thu Feb 19 21:03:16 2026
> [0]PETSC ERROR: Configure options: --with-mpi-dir=/Users/pmacneic/mpich-install-3.3-gcc15 --force
> [0]PETSC ERROR: #1 DMSetUp_DA_2D() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/da2.c:212
> [0]PETSC ERROR: #2 DMSetUp_DA() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/dareg.c:17
> [0]PETSC ERROR: #3 DMSetUp() at /Users/pmacneic/petsc-3.24.3/src/dm/interface/dm.c:807
> [0]PETSC ERROR: #4 main() at ex66_9pt.c:77
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -dm_view (source: command line)
> [0]PETSC ERROR: -ksp_monitor (source: command line)
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> application called MPI_Abort(MPI_COMM_SELF, 56) - process 0
> 
> 
> 
> 
> 
> 
> 
> Macneice, Peter J. (GSFC-6740)
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260219/d353052c/attachment-0001.html>

From matteo4.leone at mail.polimi.it  Fri Feb 20 07:00:11 2026
From: matteo4.leone at mail.polimi.it (Matteo Leone)
Date: Fri, 20 Feb 2026 13:00:11 +0000
Subject: [petsc-users] R:  Fw: DG methods in PETSc
In-Reply-To: <CADOhEh5pSMO_oHtLurSWVzbXxx1VOuxz5X_B4Y52AEPjhJ5UXw@mail.gmail.com>
References: <AS8P251MB0853D33BE6BB232B3B7519EF8162A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh4D_KCYEK14NYeHA8My+bsLU9suknntLKZU36v3PY6BuA@mail.gmail.com>
	<AS8P251MB085386B07B0AA270D1F8ACFE8163A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh5SDji9uJSoMGnYxf_W7dLJMEmaRXtwYDvNi70EunU1qw@mail.gmail.com>
	<AS8P251MB08532944C525E4C6EF1B4223816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<AS8P251MB08538D6AE2ADBA22D3F7B907816BA@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>
	<CADOhEh5pSMO_oHtLurSWVzbXxx1VOuxz5X_B4Y52AEPjhJ5UXw@mail.gmail.com>
Message-ID: <AS8P251MB0853747AA5000714F7C404578168A@AS8P251MB0853.EURP251.PROD.OUTLOOK.COM>

Yes it works,
but does not provide a correct solution and is stationary in time, I was just pointing this out.

I'll try to get as much as I can from the PetscFE object, PETSc provides already a lot of stuff, it cannot do everything.

Thanks for the support and time dedicated,

Matteo
________________________________
Da: Mark Adams <mfadams at lbl.gov>
Inviato: gioved? 19 febbraio 2026 18:15
A: Matteo Leone <matteo4.leone at mail.polimi.it>
Cc: PETSc users list <petsc-users at mcs.anl.gov>
Oggetto: Re: [petsc-users] Fw: DG methods in PETSc

Just to avoid confusion with a convoluted thread, here is my response reposed:

This was a work in progress, but it does run:  ./ex9 -dm_plex_box_faces 8,8 -dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor -velocity 1.0,0.5

I see that the Riemann solver is set:

  PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection));

I would not recommend using DG because there are no tests for it.
There are tests for PetscDSSetRiemannSolver that you could clone and replace the FE/FV construction in those with DG:

src/ts/tutorials/ex11.c:  PetscCall(PetscDSSetRiemannSolver(prob, 0, user->model->physics->riemann));
src/ts/tutorials/ex18.c:  if (user->velocityDist == VEL_ZERO) PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection));
src/ts/tutorials/ex18.c:  else PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_coupled_advection));

If you get this far, you might want to look at internal examples (not tests) and clone these DG constructors:

src/dm/dt/fe/interface/fe.c:  PetscFECreateBrokenElement - Create a discontinuous version of the input `PetscFE`
src/dm/dt/fe/interface/fe.c:PetscErrorCode PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe)
src/dm/impls/plex/plexcreate.c:        PetscCall(PetscFECreateBrokenElement(fe, &dgfe));
src/dm/interface/dmcoordinates.c:      PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe));

Sorry we do not have DG fully deployed yet.

Good luck,
Mark

On Thu, Feb 19, 2026 at 12:14?PM Matteo Leone via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:


Inviato da Outlook per Android<https://urldefense.us/v3/__https://aka.ms/AAb9ysg__;!!G_uCfscf7eWS!esCSra9pt1Q2VQtIwFRk5ds5BQ6-sbeQ66hUETSln-bCt8DbHI1HPvjOPOT6hyKlCgM8Xa4rlvZ6s9Gz4-MYSg9RvFGxWhhnjWc$>
________________________________
From: Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>>
Sent: Thursday, February 19, 2026 3:36:44 PM
To: Mark Adams <mfadams at lbl.gov<mailto:mfadams at lbl.gov>>
Subject: R: [petsc-users] DG methods in PETSc

Hello, I now have tested quite a bit the code, and I have some doubts.
Note I am quite new to PETSc.

I have modified a bit your code to test it and I tried to print from the functions to check if they were actually called with negative results.
The Riemann solver is never called and also the solution is just static in time.
I share the code. It's a bit modified and no more c but c++ (I do not know to write the csv file fom c, in case is a dealbreaker I'll try to give a look on how to do it).
(For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the shared docs if you are used to it lmk).

There is also a small .py to handle the visualization (It was made by Claude code, but it just plots the results)
Hopefully we can go deep into this PETSc DG stuff.


Thanks in advace.
Matteo


________________________________
Da: Mark Adams <mfadams at lbl.gov<mailto:mfadams at lbl.gov>>
Inviato: mercoled? 11 febbraio 2026 17:25
A: Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>>; PETSc users list <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Oggetto: Re: [petsc-users] DG methods in PETSc

Great, and keep it on the list. Lots of people here to help!

On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>> wrote:
Wow thank you so much! I was almost hopeless, I'll deep dive into it and I'll give you a feedback.

Matteo

________________________________
From: Mark Adams <mfadams at lbl.gov<mailto:mfadams at lbl.gov>>
Sent: Wednesday, February 11, 2026 3:38:30 PM
To: Matteo Leone <matteo4.leone at mail.polimi.it<mailto:matteo4.leone at mail.polimi.it>>
Cc: petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Subject: Re: [petsc-users] DG methods in PETSc

DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see a test.
Here is a test but it is not well verified.
Mark

On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
Hello, I already posted on Reddit but just to be sure I write even here.

First thanks for the job you do for PETSc, I have used it for several projects and is  always nice.

I am writing cause  I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it.

I was  wondering if I am maybe missing  something out or what.

For reference I use PETSc 3.24.3 by means of nix.

Thanks in advance, cheers.

Matteo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20260220/f26948b3/attachment-0001.html>