From junchao.zhang at gmail.com Mon Feb 2 12:04:45 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 2 Feb 2026 12:04:45 -0600 Subject: [petsc-users] =?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?= =?utf-8?q?=2C_2026_=28Free_Registration=29?= Message-ID: Dear PETSc community, PETSc will host a free online Birds-of-a-Feather (BoF) session on *February 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF will not be recorded. The agenda is available at https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!Yah9fv8-ZyaZp-y5UUT3HzG0M5Sw3hmSKm7WgsrvebX3FP3sSq-jYj-6_etbVFAhdmoWrRayUOLjjPB9GTMk2PX-9gX-$ Please register in advance at https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!Yah9fv8-ZyaZp-y5UUT3HzG0M5Sw3hmSKm7WgsrvebX3FP3sSq-jYj-6_etbVFAhdmoWrRayUOLjjPB9GTMk2Nelux0G$ To receive a Zoom link, the organizer requires all participants to register individually. Registration is quick and requires only your name and email address. We look forward to your participation and to a productive and engaging discussion. Thank you, Junchao Zhang On behalf of the PETSc team -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Sun Feb 8 17:46:37 2026 From: snailsoar at hotmail.com (feng wang) Date: Sun, 8 Feb 2026 23:46:37 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Message-ID: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Sun Feb 8 19:55:45 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sun, 8 Feb 2026 19:55:45 -0600 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldo.bonfiglioli at unibas.it Mon Feb 9 06:24:27 2026 From: aldo.bonfiglioli at unibas.it (Aldo Bonfiglioli) Date: Mon, 9 Feb 2026 13:24:27 +0100 Subject: [petsc-users] Switching between sections of the same DM Message-ID: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it> Hi there, I am trying to switch btw two different sections (section_l(1:2)) defined on the same DM (one section for the dependent variable, the other for their gradient) > ! > ! dependent variables and their nodal gradient > ! > ??PetscCall(DMSetLocalSection(dm, section_l(1), ierr)) ! 
dependent variables
> PetscCall(DMGetLocalVector(dm, localu, ierr)) ! dependent variables
> PetscCall(DMGlobaltoLocal(dm, u, INSERT_VALUES, localu, ierr))
> PetscCall(VecSet(gradu, 0.d0, ierr))
> PetscCall(DMSetLocalSection(dm, section_l(2), ierr)) ! gradient of the dependent variables
> PetscCall(DMGetLocalVector(dm, localdu, ierr)) ! gradient of the dependent variables
> PetscCall(VecSet(localdu, 0.d0, ierr))
> !
> call innerloop(dm, localu, localdu, section_l, ierr)
>
subroutine innerloop is supposed to do work on both Vecs. When trying to do so, I get the following error:
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Clearing DM of local vectors that has a local vector obtained with DMGetLocalVector()
>
as if two sections living simultaneously on the same DM are not allowed. Should I instead clone the DM and create the "second" section of the clone? Thanks, Aldo -- Dr. Aldo Bonfiglioli Associate professor of Fluid Mechanics Dipartimento di Ingegneria Universita' della Basilicata V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY tel:+39.0971.205203 fax:+39.0971.205215 web: https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!YVPmxuMFuutpshSi89LXW-8W6oF1lauFcuN2HzqwcvecXbGkhDoRlhVqyIdHvFHEXN7_zUPSCbdoeElWRBtO1fQ_1q-v4pzF0QU$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Feb 9 06:41:00 2026 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 9 Feb 2026 07:41:00 -0500 Subject: [petsc-users] Switching between sections of the same DM In-Reply-To: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it> References: <3308564d-6195-4ca1-b9d3-86a6eb387211@unibas.it> Message-ID: On Mon, Feb 9, 2026 at 6:29 AM Aldo Bonfiglioli wrote: > Hi there, > > I am trying to switch btw two different sections (section_l(1:2)) defined > on the same DM (one section for the dependent variable, the other for their > gradient) > > ! > ! dependent variables and their nodal gradient > ! > PetscCall(DMSetLocalSection(dm, section_l(1), ierr)) ! dependent > variables > PetscCall(DMGetLocalVector(dm, localu, ierr)) ! dependent variables > PetscCall(DMGlobaltoLocal(dm, u, INSERT_VALUES, localu, ierr)) > PetscCall(VecSet(gradu, 0.d0, ierr)) > PetscCall(DMSetLocalSection(dm, section_l(2), ierr)) ! gradient of the > dependent variables > PetscCall(DMGetLocalVector(dm, localdu, ierr)) ! gradient of the > dependent variables > PetscCall(VecSet(localdu, 0.d0, ierr)) > ! > call innerloop(dm, localu, localdu, section_l, ierr) > > subroutine innerloop is supposed to do work on both Vecs. > > When trying to do so, I get the following error: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Clearing DM of local vectors that has a local vector > obtained with DMGetLocalVector() > > as if two sections living simultaneously on the same DM are not allowed. > > Should I instead clone the DM and create the "second" section of the clone? > > Yes, this is exactly right. It is lightweight, and this is what clone is intended for. Thanks, Matt > Thanks, > > Aldo > > -- > Dr.
Aldo Bonfiglioli > Associate professor of Fluid Mechanics > Dipartimento di Ingegneria > Universita' della Basilicata > V.le dell'Ateneo Lucano, 10 85100 Potenza ITALY > tel:+39.0971.205203 fax:+39.0971.205215 > web: https://urldefense.us/v3/__http://docenti.unibas.it/site/home/docente.html?m=002423__;!!G_uCfscf7eWS!aoLehySjF-RiaGi_oyCRn_dYj6QvTlk5w028Xwvyvfl42tLcztJ0dySq0cpcEjn2g2b_md_LL-09Jcu-WXxH$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aoLehySjF-RiaGi_oyCRn_dYj6QvTlk5w028Xwvyvfl42tLcztJ0dySq0cpcEjn2g2b_md_LL-09JRBYgR6D$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Feb 9 08:00:00 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 9 Feb 2026 08:00:00 -0600 Subject: [petsc-users] =?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?= =?utf-8?q?=2C_2026_=28Free_Registration=29?= In-Reply-To: References: Message-ID: This is a kind reminder that the PETSc birds-of-a-feather (BoF) session will take place on *February 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. If you have not already done so, please register in advance (see links below). We look forward to seeing you. Best regards, Junchao Zhang On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang wrote: > Dear PETSc community, > > PETSc will host a free online Birds-of-a-Feather (BoF) session on *February > 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF > will not be recorded. > > The agenda is available at > https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cowQnabNB_YIg5gdeJVFG0uqoKlgZFcbGCyAA6GIoBAeNfDRREg1c1wbcxAMOan5O_LKzDTB8BbJQGTVOpYIKrtwIDXz$ > > Please register in advance at > https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!cowQnabNB_YIg5gdeJVFG0uqoKlgZFcbGCyAA6GIoBAeNfDRREg1c1wbcxAMOan5O_LKzDTB8BbJQGTVOpYIKvHWg0Nt$ > To receive a Zoom link, the organizer requires all participants to > register individually. Registration is quick and requires only your name > and email address. > > We look forward to your participation and to a productive and engaging > discussion. > > Thank you, > Junchao Zhang > On behalf of the PETSc team > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Mon Feb 9 16:31:44 2026 From: snailsoar at hotmail.com (feng wang) Date: Mon, 9 Feb 2026 22:31:44 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang Sent: 09 February 2026 1:55 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). 
Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Feb 9 17:18:04 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 9 Feb 2026 17:18:04 -0600 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Mon Feb 9 19:43:19 2026 From: rlmackie862 at gmail.com (Randall Mackie) Date: Mon, 9 Feb 2026 17:43:19 -0800 Subject: [petsc-users] missing Fortran interfaces Message-ID: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com> Hi Barry and PETSc team: Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines: PetscViewerASCIISynchronizedPrintf PetscViewerASCIIPushSynchronized and related routines? Thanks, Randy M. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Mon Feb 9 21:33:53 2026 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 9 Feb 2026 22:33:53 -0500 Subject: [petsc-users] missing Fortran interfaces In-Reply-To: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com> References: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com> Message-ID: Added in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/9020__;!!G_uCfscf7eWS!dv9dZw32F2nr49elnSrO-lbHwxSvKM9rDw7fW9dQqjp8GqagNUgN9zM_KUGTixJNxEkaI_h-8C7kWUgV3rP6nWk$ The PetscViewerASCIIPushSynchronized should be generated automatically so it should just be there. Let us know at that MR if it is not. Barry > On Feb 9, 2026, at 8:43?PM, Randall Mackie wrote: > > Hi Barry and PETSc team: > > Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines: > > PetscViewerASCIISynchronizedPrintf > > PetscViewerASCIIPushSynchronized > > > and related routines? > > > Thanks, > > Randy M. -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Tue Feb 10 04:29:55 2026 From: snailsoar at hotmail.com (feng wang) Date: Tue, 10 Feb 2026 10:29:55 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Junchao, Thanks for your reply. I will try that. If I have any issues, I will come back to this thread. Thanks, Feng ________________________________ From: Junchao Zhang Sent: 09 February 2026 23:18 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo4.leone at mail.polimi.it Mon Feb 9 18:58:22 2026 From: matteo4.leone at mail.polimi.it (Matteo Leone) Date: Tue, 10 Feb 2026 00:58:22 +0000 Subject: [petsc-users] DG methods in PETSc Message-ID: Hello, I already posted on Reddit but just to be sure I write even here. 
First thanks for the job you do for PETSc, I have used it for several projects and is always nice. I am writing cause I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it. I was wondering if I am maybe missing something out or what. For reference I use PETSc 3.24.3 by means of nix. Thanks in advance, cheers. Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Tue Feb 10 11:02:39 2026 From: rlmackie862 at gmail.com (Randall Mackie) Date: Tue, 10 Feb 2026 09:02:39 -0800 Subject: [petsc-users] missing Fortran interfaces In-Reply-To: References: <1867E7AD-2133-4B56-BBD1-8D47A1FC970D@gmail.com> Message-ID: <8C3C956A-FF59-4F9D-B7AC-76DD07761928@gmail.com> Thanks Barry, We will check this out and report back at the MR if anything is missing. Randy M. > On Feb 9, 2026, at 7:33?PM, Barry Smith wrote: > > > Added in https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/9020__;!!G_uCfscf7eWS!b2PdLz9P_gp9R4H2xGEBLuTQMCOglUn8ANgxlMaH3yiB8jKlOlW0tqbFGU5BJ1C8hYtkgF0BLqzKwx_NquC-t9Q9lw$ > > The PetscViewerASCIIPushSynchronized should be generated automatically so it should just be there. Let us know at that MR if it is not. > > Barry > > >> On Feb 9, 2026, at 8:43?PM, Randall Mackie wrote: >> >> Hi Barry and PETSc team: >> >> Is it possible that there are missing Fortran interfaces (in PETSc v3.24) for the following routines: >> >> PetscViewerASCIISynchronizedPrintf >> >> PetscViewerASCIIPushSynchronized >> >> >> and related routines? >> >> >> Thanks, >> >> Randy M. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Tue Feb 10 15:57:11 2026 From: snailsoar at hotmail.com (feng wang) Date: Tue, 10 Feb 2026 21:57:11 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? Thanks, Feng ________________________________ From: Junchao Zhang Sent: 09 February 2026 23:18 To: feng wang Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! 
--Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From drwells at email.unc.edu Tue Feb 10 16:31:53 2026 From: drwells at email.unc.edu (Wells, David) Date: Tue, 10 Feb 2026 22:31:53 +0000 Subject: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. Message-ID: Hello, I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value? If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time. Best, David Wells -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Feb 10 17:28:09 2026 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Feb 2026 18:28:09 -0500 Subject: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. In-Reply-To: References: Message-ID: On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > I've been profiling the memory usage of my solver and it looks like a huge > number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I > read through the source code and it looks like these vectors are allocated > ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that > KSP. Is there a way to change this value? > We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that. > If not - how hard would it be to add an API to set a different initial > value for that? These vectors take up a lot of memory and I would rather > just one at a time. > I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration? 
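(For reference, the restart I mean is the one set when the KSP is configured; a minimal sketch, assuming the KSP object is created here and the rest of the solver setup stays as it is:

    KSP ksp;
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetType(ksp, KSPFGMRES));           /* flexible GMRES */
    PetscCall(KSPGMRESSetRestart(ksp, 30));          /* same as -ksp_gmres_restart 30 */
    PetscCall(KSPGMRESSetPreAllocateVectors(ksp));   /* same as -ksp_gmres_preallocate */

The last call makes the solver allocate all of the restart directions up front at KSPSetUp() time instead of in chunks of 10 during the iteration.)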
Thanks, Matt > Best, > David Wells > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!bSUDVgOBX_MDxTjSivXOuXsYl5KKCBhJDYseaa3Gb4DMURCyG3nv1cAESszMf_OsNZRuR6JWB7VFthuvYWD6$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From drwells at email.unc.edu Tue Feb 10 20:04:20 2026 From: drwells at email.unc.edu (Wells, David) Date: Wed, 11 Feb 2026 02:04:20 +0000 Subject: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. In-Reply-To: References: Message-ID: Hi Matt, Thanks for the quick response! > I cannot understand precisely what is happening here. You specify a restart > size when you setup the KSP. It allocates that many vecs (roughly). Why are > there reallocations? Do you increase the restart size during the iteration? I don't believe there are any reallocations (I didn't write this solver, but I don't see any calls which set the restart size or any other relevant parameter [1]): as far as I can tell, the solver just allocates a lot of vectors. I'm working off of traces computed by heaptrack, which is my only insight into how this works. The allocations come from KSPCreateVecs(), which is called by 1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory) 2. KSPSetUp_GMRES() (for about 300 MB of memory) 3. KSPSetUp_FGMRES() (for about 264 MB of memory) 4. KSPSetWorkVecs() (for about 236 MB of memory) Is there some relevant set of monitoring flags I can set which will show me how many vectors I allocate or use? That would also help. Best, David [1] This is IBAMR's PETScKrylovLinearSolver. [2] This is half the total memory we use for side-centered data vectors. ________________________________ From: Matthew Knepley Sent: Tuesday, February 10, 2026 6:28 PM To: Wells, David Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users > wrote: Hello, I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value? We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that. If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time. I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration? Thanks, Matt Best, David Wells -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aBTZ4oaySeEsyg3U98-DZfMmoj0Wg8RUNEzlPdQoEnZ6pfCuP3cZSF7ib3bqpT7GBGVct81F2vjmak_Zva98zbh7D1M$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Feb 10 20:32:39 2026 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Feb 2026 21:32:39 -0500 Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. In-Reply-To: References: Message-ID: <6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev>

1) For a fixed restart (say of 30), FGMRES needs 60 vectors, while GMRES only needs 30. This is a big disadvantage of FGMRES over GMRES.

2) By default PETSc GMRES uses a restart of 30, meaning it keeps 30 previous Krylov vectors (and FGMRES needs 60 vectors). You can use a smaller restart with KSPGMRESSetRestart() or -ksp_gmres_restart to use less memory (of course the convergence may get far worse or not, depending on the problem).

3) When GMRES (or FGMRES) starts up it does not immediately allocate all 30 (or whatever) restart vectors, because it may be that GMRES only takes 15 steps, so why allocate all of them? Instead it allocates a chunk at a time (GMRES_DELTA_DIRECTIONS, which is 10); when it uses up those 10 it allocates another 10 (if needed), and so on until it reaches the restart. You can force GMRES to allocate all 30 (or whatever) initially, instead of the chunk-at-a-time approach, by using KSPGMRESSetPreAllocateVectors() or -ksp_gmres_preallocate. This will prevent confusion about why more vectors are allocated later and why they are not all allocated when the solve starts.

4) PETSc's GMRES tries to use BLAS 2 operations for MDot() and MAXPY() (the orthogonalization in GMRES). It can only use the BLAS 2 on vector chunks that are allocated together. By preallocating all the vectors at the beginning one gets a single chunk and hence more efficient orthogonalization; this is more important on GPUs. For CPUs, whether you have 10 or 30 vectors together doesn't matter much at all.

I hope this clarifies why you are seeing the memory allocations. Note that these are NOT "reallocations" in the sense of KSPGMRES allocating more memory and then copying something into the new memory and freeing the old. They are just allocations of new memory which will then be used.

Barry

> On Feb 10, 2026, at 9:04 PM, Wells, David via petsc-users wrote: > > Hi Matt, > > Thanks for the quick response! > > > I cannot understand precisely what is happening here. You specify a restart > > size when you setup the KSP. It allocates that many vecs (roughly). Why are > > there reallocations? Do you increase the restart size during the iteration? > > I don't believe there are any reallocations (I didn't write this solver, but I > don't see any calls which set the restart size or any other relevant parameter [1]): > as far as I can tell, the solver just allocates a lot of vectors. I'm working > off of traces computed by heaptrack, which is my only insight into how this > works. The allocations come from KSPCreateVecs(), which is called by > 1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory) > 2. KSPSetUp_GMRES() (for about 300 MB of memory) > 3. KSPSetUp_FGMRES() (for about 264 MB of memory) > 4. KSPSetWorkVecs() (for about 236 MB of memory) > > Is there some relevant set of monitoring flags I can set which will show me how > many vectors I allocate or use? That would also help. > > Best, > David > > [1] This is IBAMR's PETScKrylovLinearSolver.
> [2] This is half the total memory we use for side-centered data vectors. > From: Matthew Knepley > Sent: Tuesday, February 10, 2026 6:28 PM > To: Wells, David > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. > > On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users > wrote: > Hello, > > I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value? > > We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that. > > If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time. > > I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration? > > Thanks, > > Matt > > Best, > David Wells > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f_cD6IEkmZUnH3gBVZJzlw3nqwNOD8FwvFPE32j3bn32qr-X0lkuz3NQ5CjDMqt5xLP-9CV1Et5Me4cGEVdFSVo$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Feb 10 21:00:38 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 10 Feb 2026 21:00:38 -0600 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57?PM feng wang wrote: > Hi Junchao, > > I have managed to configure Petsc for GPU, also managed to run ksp/ex15 > using -mat_type aijcusparse -vec_type cuda. It seems runs much faster > compared to the scenario if I don't use " -mat_type aijcusparse -vec_type > cuda". so I believe it runs okay for GPUs. > > I have an existing CFD code that runs natively on GPUs. so all the data is > offloaded to GPU at the beginning and some data are copied back to the cpu > at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for > the implicit solver. *My question is: my code also has a GMRES+ILU(0) > implemented with Petsc but it only runs on cpus (which I implemented a few > years ago). How can I replace the existing Newton-Jacobi (which runs in > GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give > some advice?* > > Thanks, > Feng > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 23:18 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hi Feng, > At the first step, you don't need to change your CPU implementation. > Then do profiling to see where it is worth putting your effort. 
Maybe you > need to assemble your matrices and vectors on GPUs too, but decide that at > a later stage. > > Thanks! > --Junchao Zhang > > > On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Wed Feb 11 04:47:51 2026 From: snailsoar at hotmail.com (feng wang) Date: Wed, 11 Feb 2026 10:47:51 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Junchao, Thanks for your reply. Probably I did not phrase it in a clear way. I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code. For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system. The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly Thanks, Feng ________________________________ From: Junchao Zhang Sent: 11 February 2026 3:00 To: feng wang Cc: petsc-users at mcs.anl.gov ; Barry Smith Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57?PM feng wang > wrote: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. 
My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 23:18 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 11 07:42:22 2026 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Feb 2026 08:42:22 -0500 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: On Wed, Feb 11, 2026 at 5:55?AM feng wang wrote: > Hi Junchao, > > Thanks for your reply. Probably I did not phrase it in a clear way. > > I am using openACC to port the CFD code to the GPU, so the CPU and the GPU > version essentially share the same source code. For the original CPU > version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve > the sparse linear system. > > The current GPU version of the code only port the Jacobi solver to the > GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to > make to the existing CPU version of GMRES+ILU(0) to achieve this goal? > I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work? 
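Roughly, the idea is to create the Mat and Vec generically and let the type be chosen at run time. A minimal sketch (comm, nlocal and bs are placeholders for your communicator, local block count and block size):

    Mat A;
    Vec x, b;

    PetscCall(MatCreate(comm, &A));
    PetscCall(MatSetSizes(A, nlocal*bs, nlocal*bs, PETSC_DETERMINE, PETSC_DETERMINE));
    PetscCall(MatSetBlockSize(A, bs));
    PetscCall(MatSetFromOptions(A));   /* picks up -mat_type aijcusparse, aijkokkos, baij, ... */
    PetscCall(MatSetUp(A));

    PetscCall(VecCreate(comm, &x));
    PetscCall(VecSetSizes(x, nlocal*bs, PETSC_DECIDE));
    PetscCall(VecSetBlockSize(x, bs));
    PetscCall(VecSetFromOptions(x));   /* picks up -vec_type cuda, kokkos, standard, ... */
    PetscCall(VecDuplicate(x, &b));

Then running with -mat_type aijcusparse -vec_type cuda, as in ex15, should put the solve on the GPU without further source changes.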
Thanks, Matt > BTW: For performance the GPU version of the CFD code has minimum > communication between the CPU and GPU, so for Ax=b, A, x and b are created > in the GPU directly > > Thanks, > Feng > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 11 February 2026 3:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov ; Barry Smith < > bsmith at petsc.dev> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Sorry, I don't understand your question. What blocks you from running > your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about > the algorithms. > > --Junchao Zhang > > > On Tue, Feb 10, 2026 at 3:57?PM feng wang wrote: > > Hi Junchao, > > I have managed to configure Petsc for GPU, also managed to run ksp/ex15 > using -mat_type aijcusparse -vec_type cuda. It seems runs much faster > compared to the scenario if I don't use " -mat_type aijcusparse -vec_type > cuda". so I believe it runs okay for GPUs. > > I have an existing CFD code that runs natively on GPUs. so all the data is > offloaded to GPU at the beginning and some data are copied back to the cpu > at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for > the implicit solver. *My question is: my code also has a GMRES+ILU(0) > implemented with Petsc but it only runs on cpus (which I implemented a few > years ago). How can I replace the existing Newton-Jacobi (which runs in > GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give > some advice?* > > Thanks, > Feng > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 23:18 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hi Feng, > At the first step, you don't need to change your CPU implementation. > Then do profiling to see where it is worth putting your effort. Maybe you > need to assemble your matrices and vectors on GPUs too, but decide that at > a later stage. > > Thanks! > --Junchao Zhang > > > On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? 
> > Many thanks for your help in advance, > Feng > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f7i0WZFkMTRDMqONYUCeDNn9XDQjXS7bps7XWsgAlnO54oH90yfmvfuu-0QJAbxqCNYSof3G34TuqHuiTIAb$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Feb 11 08:38:30 2026 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 11 Feb 2026 09:38:30 -0500 Subject: [petsc-users] DG methods in PETSc In-Reply-To: References: Message-ID: DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see a test. Here is a test but it is not well verified. Mark On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, I already posted on Reddit but just to be sure I write even here. > > First thanks for the job you do for PETSc, I have used it for several > projects and is always nice. > > I am writing cause I am getting mad trying to implement DG solver in > PETSc, the target is the Euler equations, however I am failing even with > just the simplest transport equation (u/t + u/x = 0). I was wondering if I > am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, > and more, but I keep failing, I tried also with LLMs, but seems like there > is no DG code with PETSc on the web, however I see many papers that do it. > > I was wondering if I am maybe missing something out or what. > > For reference I use PETSc 3.24.3 by means of nix. > > Thanks in advance, cheers. > > Matteo > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex9.c Type: application/octet-stream Size: 10367 bytes Desc: not available URL: From mpll at dhigroup.com Wed Feb 11 02:43:04 2026 From: mpll at dhigroup.com (Dr. Milan Pelletier) Date: Wed, 11 Feb 2026 08:43:04 +0000 Subject: [petsc-users] Error when configuring PETSc with CUDA on Windows Message-ID: Dear PETSc team, I am currently trying to upgrade my PETSc package on Windows to 3.24.4, and I'm facing issues when configuring with CUDA. Unfortunately, I cannot clearly see the error that causes the configure process to bail out Please find enclosed the configure.log file. My laptop setup: * OS: Windows 11 23H2 (if that helps, running "uname -a" in Cygwin yields: "CYGWIN_NT-10.0-22631 MPLL-PC2 3.6.6-1.x86_64 2026-01-09 17:39 UTC x86_64 Cygwin") * CPU: Intel Core i7-1370P * RAM: 32 GB * MPI: Intel oneAPI MPI * Compiler: MSVC v19.44.35222 * CUDA toolkit v13.1 Please let me know if some information is missing. Many thanks for your help, Kind regards, Dr. Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 3572665 bytes Desc: configure.log URL: From balay.anl at fastmail.org Wed Feb 11 09:07:00 2026 From: balay.anl at fastmail.org (Satish Balay) Date: Wed, 11 Feb 2026 09:07:00 -0600 (CST) Subject: [petsc-users] Error when configuring PETSc with CUDA on Windows In-Reply-To: References: Message-ID: Sorry - PETSc+CUDA does not currently work on windows [and we have not been able to get a successful build here] Satish On Wed, 11 Feb 2026, Dr. 
Milan Pelletier wrote: > Dear PETSc team, > > I am currently trying to upgrade my PETSc package on Windows to 3.24.4, and I'm facing issues when configuring with CUDA. > Unfortunately, I cannot clearly see the error that causes the configure process to bail out > Please find enclosed the configure.log file. > > My laptop setup: > > * > OS: Windows 11 23H2 (if that helps, running "uname -a" in Cygwin yields: "CYGWIN_NT-10.0-22631 MPLL-PC2 3.6.6-1.x86_64 2026-01-09 17:39 UTC x86_64 Cygwin") > * > CPU: Intel Core i7-1370P > * > RAM: 32 GB > * > MPI: Intel oneAPI MPI > * > Compiler: MSVC v19.44.35222 > * > CUDA toolkit v13.1 > > Please let me know if some information is missing. > Many thanks for your help, > > Kind regards, > Dr. Milan Pelletier > From junchao.zhang at gmail.com Wed Feb 11 09:17:57 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 11 Feb 2026 09:17:57 -0600 Subject: [petsc-users] Happening in less 45 minutes -- PETSc Online BoF (Free Registration) In-Reply-To: References: Message-ID: The BoF session will begin in less than 45 minutes. The Zoom is already open for testing (with the CASS intro slides). See link info below. See you soon! -Junchao > On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang > wrote: > >> Dear PETSc community, >> >> PETSc will host a free online Birds-of-a-Feather (BoF) session on *February >> 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF >> will not be recorded. >> >> The agenda is available at >> https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!fR02TLjeNiZskbApDs8e50ol_dTg0JXjJge8YtUT75h0T4h6Lxk1vSXCKfyQ5QOZX1bUYowZsUZ8yUSePt52D5vogrCO$ >> >> Please register in advance at >> https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!fR02TLjeNiZskbApDs8e50ol_dTg0JXjJge8YtUT75h0T4h6Lxk1vSXCKfyQ5QOZX1bUYowZsUZ8yUSePt52D1HyPQTE$ >> To receive a Zoom link, the organizer requires all participants to >> register individually. Registration is quick and requires only your name >> and email address. >> >> We look forward to your participation and to a productive and engaging >> discussion. >> >> Thank you, >> Junchao Zhang >> On behalf of the PETSc team >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Wed Feb 11 09:58:48 2026 From: snailsoar at hotmail.com (feng wang) Date: Wed, 11 Feb 2026 15:58:48 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Mat, Thanks for your reply. Maybe I am overthinking it. ksp/ex15 works fine with GPUs. To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device. Below is a snippet of my current petsc implementation. Suppose I have: ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); //duplicate ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); //create preconditioning matrix ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device? 
Below is how I assign values for the matrix:

nnz=0;
for(jv=0; jv<blocksize; jv++)
{
   for(iv=0; iv<blocksize; iv++)
   {
      values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the left hand side is [I/dt + (-J)]
      nnz++;
   }
}

idxm[0] = ig_mat[iql];
idxn[0] = ig_mat[iqr];
ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, ADD_VALUES);
CHKERRQ(ierr);
}

Does Petsc first set the values on the host and then copy them to the device, or are the values assigned directly on the device? In the second case, I would need to change my code a bit, since I need to make sure the data is on the device in the first place.

Thanks,
Feng

________________________________ From: Matthew Knepley Sent: 11 February 2026 13:42 To: feng wang Cc: Junchao Zhang ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 5:55 AM feng wang > wrote: Hi Junchao, Thanks for your reply. Probably I did not phrase it in a clear way. I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code. For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system. The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work? Thanks, Matt BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 11 February 2026 3:00 To: feng wang > Cc: petsc-users at mcs.anl.gov >; Barry Smith > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57 PM feng wang > wrote: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 23:18 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31 PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs.
You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d_EEF6jyNQtfEhGsUPb8_rQ08cf8731pynjpilB9qWxrAda9t0oHNPkHWuJRVp1YEtQPM66JtPnQ9YHHHcW44l3FHA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Feb 11 10:25:53 2026 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 11 Feb 2026 11:25:53 -0500 Subject: [petsc-users] DG methods in PETSc In-Reply-To: References: Message-ID: Great, and keep it on the list. Lots of people here to help! On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone wrote: > Wow thank you so much! I was almost hopeless, I'll deep dive into it and > I'll give you a feedback. > > Matteo > > ------------------------------ > *From:* Mark Adams > *Sent:* Wednesday, February 11, 2026 3:38:30 PM > *To:* Matteo Leone > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] DG methods in PETSc > > DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see > a test. > Here is a test but it is not well verified. > Mark > > On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, I already posted on Reddit but just to be sure I write even here. > > First thanks for the job you do for PETSc, I have used it for several > projects and is always nice. > > I am writing cause I am getting mad trying to implement DG solver in > PETSc, the target is the Euler equations, however I am failing even with > just the simplest transport equation (u/t + u/x = 0). I was wondering if I > am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, > and more, but I keep failing, I tried also with LLMs, but seems like there > is no DG code with PETSc on the web, however I see many papers that do it. > > I was wondering if I am maybe missing something out or what. > > For reference I use PETSc 3.24.3 by means of nix. > > Thanks in advance, cheers. > > Matteo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Feb 11 10:32:37 2026 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 11 Feb 2026 11:32:37 -0500 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: On Wed, Feb 11, 2026 at 10:58?AM feng wang wrote: > Hi Mat, > > Thanks for your reply. Maybe I am overthinking it. > > ksp/ex15 works fine with GPUs. > > To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how > Petsc handle the memory in the host and the device. > > Below is a snippet of my current petsc implementation. 
Suppose I have: > > ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, > blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); > This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now? > ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); > > //duplicate > ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); > > //create preconditioning matrix > ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, > nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, > maxneig, NULL, maxneig, NULL, &petsc_A_pre); > CHKERRQ(ierr); > I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc. > *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and > vectors directly created in the device?* > > Below is how I assign values for the matrix: > > nnz=0; > for(jv=0; jv { > for(iv=0; iv { > values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the > left hand side is [I/dt + (-J)] > nnz++; > } > } > > idxm[0] = ig_mat[iql]; > idxn[0] = ig_mat[iqr]; > ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, > ADD_VALUES); > CHKERRQ(ierr); > } > > *Does petsc first set the value in the host and copy it to the device or > the value is directly assigned in the device. in the 2nd case, I would need > change my code a bit, since I need to make sure the data is in the device > in the first place.* > Yes, you would need to set the values on device for maximum efficiency (although I would try it out with CPU construction first). You can do this best on the GPU using MatSetValuesCOO(). Thanks, Matt > Thanks, > Feng > > > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 11 February 2026 13:42 > *To:* feng wang > *Cc:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > On Wed, Feb 11, 2026 at 5:55?AM feng wang wrote: > > Hi Junchao, > > Thanks for your reply. Probably I did not phrase it in a clear way. > > I am using openACC to port the CFD code to the GPU, so the CPU and the GPU > version essentially share the same source code. For the original CPU > version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve > the sparse linear system. > > The current GPU version of the code only port the Jacobi solver to the > GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to > make to the existing CPU version of GMRES+ILU(0) to achieve this goal? > > > I think what Junchao is saying, is that if you use the GPU vec and mat > types, this should be running on the GPU already. Does that not work? > > Thanks, > > Matt > > > BTW: For performance the GPU version of the CFD code has minimum > communication between the CPU and GPU, so for Ax=b, A, x and b are created > in the GPU directly > > Thanks, > Feng > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 11 February 2026 3:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov ; Barry Smith < > bsmith at petsc.dev> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Sorry, I don't understand your question. What blocks you from running > your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about > the algorithms. 
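[For reference, a minimal sketch of the generic-creation pattern recommended above. This is not code from the thread; A_COMM_WORLD, blocksize, nlocal and petsc_A_pre are reused as placeholders from the snippet quoted earlier, and the error-checking style follows the existing code. Created this way, the concrete matrix type is chosen at run time, e.g. by -mat_type baij or -mat_type aijcusparse.]

    Mat petsc_A_pre;
    ierr = MatCreate(*A_COMM_WORLD, &petsc_A_pre); CHKERRQ(ierr);
    ierr = MatSetSizes(petsc_A_pre, nlocal*blocksize, nlocal*blocksize,
                       PETSC_DETERMINE, PETSC_DETERMINE); CHKERRQ(ierr);
    ierr = MatSetBlockSize(petsc_A_pre, blocksize); CHKERRQ(ierr);
    ierr = MatSetFromOptions(petsc_A_pre); CHKERRQ(ierr);   /* picks up -mat_type ... */
    ierr = MatSetUp(petsc_A_pre); CHKERRQ(ierr);             /* or call a preallocation routine */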
> > --Junchao Zhang > > > On Tue, Feb 10, 2026 at 3:57?PM feng wang wrote: > > Hi Junchao, > > I have managed to configure Petsc for GPU, also managed to run ksp/ex15 > using -mat_type aijcusparse -vec_type cuda. It seems runs much faster > compared to the scenario if I don't use " -mat_type aijcusparse -vec_type > cuda". so I believe it runs okay for GPUs. > > I have an existing CFD code that runs natively on GPUs. so all the data is > offloaded to GPU at the beginning and some data are copied back to the cpu > at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for > the implicit solver. *My question is: my code also has a GMRES+ILU(0) > implemented with Petsc but it only runs on cpus (which I implemented a few > years ago). How can I replace the existing Newton-Jacobi (which runs in > GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give > some advice?* > > Thanks, > Feng > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 23:18 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hi Feng, > At the first step, you don't need to change your CPU implementation. > Then do profiling to see where it is worth putting your effort. Maybe you > need to assemble your matrices and vectors on GPUs too, but decide that at > a later stage. > > Thanks! > --Junchao Zhang > > > On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5kFvTZ2y$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!d-KREIqaQAY300svdh3ajsScQ_ugv65710AClzWVTz0yXoUEtJbgNcpLxH1j3QQwkdKAAp5LBC2O5kFvTZ2y$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Feb 11 12:04:09 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 11 Feb 2026 12:04:09 -0600 Subject: [petsc-users] =?utf-8?q?PETSc_Online_BoF_=E2=80=94_February_11?= =?utf-8?q?=2C_2026_=28Free_Registration=29?= In-Reply-To: References: Message-ID: Thanks to all for participating in the PETSc BoF. Sorry, we were running out of time and did have a good time for open discussion. If you have questions about the talks, and feature requests, you can reach us by this mailing list, or on the PETSc Discord server, https://urldefense.us/v3/__https://discord.gg/Fqm8r6Gcyb__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg3J7YYF6$ . The #general and help-forum channels are all good places to ask questions. The slides of the talks (if available) will be uploaded to the agenda page soon, https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysgxjBj9Of$ Thank you! --Junchao Zhang On Mon, Feb 2, 2026 at 12:04?PM Junchao Zhang wrote: > Dear PETSc community, > > PETSc will host a free online Birds-of-a-Feather (BoF) session on *February > 11, 2026*, from *10:00?11:30 am (Central Time, US and Canada)*. The BoF > will not be recorded. > > The agenda is available at > https://urldefense.us/v3/__https://petsc.org/release/community/bofs/2026_Feb_CASS/*feb-cass-petsc-bof__;Iw!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysgxjBj9Of$ > > Please register in advance at > https://urldefense.us/v3/__https://argonne.zoomgov.com/meeting/register/ay4bMcRgSZaZ-l7u9AzAzQ__;!!G_uCfscf7eWS!cQ6K7l63oIttr7TZ6t-OhsW-E0bdVyYIuttT1fQDAKGokhzdqT3GyKa9OLNRFkQWEXhnvGWxd5hsyjBZw4ysg57bpbRf$ > To receive a Zoom link, the organizer requires all participants to > register individually. Registration is quick and requires only your name > and email address. > > We look forward to your participation and to a productive and engaging > discussion. > > Thank you, > Junchao Zhang > On behalf of the PETSc team > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.scotto at irt-saintexupery.com Thu Feb 12 05:47:41 2026 From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre) Date: Thu, 12 Feb 2026 11:47:41 +0000 Subject: [petsc-users] Scalability Message-ID: Dear PETSc community, I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ..., 32 and 64 processes and the results I obtained were not so great. I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations? Best regards, Alexandre. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Feb 12 08:00:30 2026 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Feb 2026 09:00:30 -0500 Subject: [petsc-users] Scalability In-Reply-To: References: Message-ID: On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc community, > > > > I have conducted a quick strong scalability-like test on direct and > adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, > distributed over 2, 4, ?, 32 and 64 processes and the results I obtained > were not so great. > > I am not very confident in my setup, so a as a matter of reference, is > there any available results on weak and strong scalability of PETSc.Mat > mult() and multTranspose() operations? > 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations? 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do. 3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NIIoVXnb$ Thanks, Matt > > > Best regards, > > Alexandre. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zr5jUpk1srGDF2h9mXmw_GIn1OFZ2g3APzC0JHZREcxRzzy-2Oz2yyBWtzSI6F21kV4W_ubmc7A0NPzkrito$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Thu Feb 12 08:37:07 2026 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Thu, 12 Feb 2026 15:37:07 +0100 Subject: [petsc-users] DMplex periodicity type Message-ID: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it> Dear all, ? ? regarding the two possible ways to represent periodic meshes in DMPlex (https://urldefense.us/v3/__https://petsc.org/release/manual/dmplex/*__;Iw!!G_uCfscf7eWS!aZty3mcKlmFws6oDH6dzJajWmHL38hq4105g81qWT-9vNW5RcQ7t39SXetxoJEpVVF_VJno1viNSyiomGVPsLrvdvU75tWZbkMCJaQ$ ), I have noticed that when reading in a GMSH mesh, the periodic topology is created. Is there a way to read in a mesh and create instead the non-periodic topology&local-to-global-map kind of mesh? Or, is it possible to convert between the two approaces (i.e. generate a DMPlex that is a quasi-clone of a given one, but ready for the other kind of periodicity handling)? Bets regards ? ? Matteo -- Prof. Matteo Semplice Universit? degli Studi dell?Insubria Dipartimento di Scienza e Alta Tecnologia ? DiSAT Professore Associato Via Valleggio, 11 ? 22100 Como (CO) ? 
Italia tel.: +39 031 2386316 From knepley at gmail.com Thu Feb 12 08:41:41 2026 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Feb 2026 09:41:41 -0500 Subject: [petsc-users] DMplex periodicity type In-Reply-To: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it> References: <0daa1308-59df-4921-85b3-2e1a18be0b86@uninsubria.it> Message-ID: On Thu, Feb 12, 2026 at 9:37?AM Matteo Semplice via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear all, > > regarding the two possible ways to represent periodic meshes in > DMPlex ( > https://urldefense.us/v3/__https://petsc.org/release/manual/dmplex/*__;Iw!!G_uCfscf7eWS!aZty3mcKlmFws6oDH6dzJajWmHL38hq4105g81qWT-9vNW5RcQ7t39SXetxoJEpVVF_VJno1viNSyiomGVPsLrvdvU75tWZbkMCJaQ$ > ), I have noticed that > when reading in a GMSH mesh, the periodic topology is created. > > Is there a way to read in a mesh and create instead the non-periodic > topology&local-to-global-map kind of mesh? That code is not in there. This seems easier to write. > Or, is it possible to convert > between the two approaces (i.e. generate a DMPlex that is a quasi-clone > of a given one, but ready for the other kind of periodicity handling)? > This code is not either. This seems harder because you have to choose some periodic boundary to tear along. I do not really understand the other periodicity, so it would be difficult for me to do before the Fall. The libCEED guys wrote the map version. Thanks, Matt > Bets regards > > Matteo > > -- > Prof. Matteo Semplice > Universit? degli Studi dell?Insubria > Dipartimento di Scienza e Alta Tecnologia ? DiSAT > Professore Associato > Via Valleggio, 11 ? 22100 Como (CO) ? Italia > tel.: +39 031 2386316 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!dBeL3LumXFsm8dLmfMgJBRhh1F7zXrHaGD1sf6lSy7pnnzjs2EvBJNzne0yCEx6vObTmNqz-Q0nJ2jo8JqHA$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 12 10:09:47 2026 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 12 Feb 2026 11:09:47 -0500 Subject: [petsc-users] Scalability In-Reply-To: References: Message-ID: The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times. > On Feb 12, 2026, at 9:00?AM, Matthew Knepley wrote: > > On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users > wrote: >> Dear PETSc community, >> >> >> >> I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great. >> >> I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations? >> > > 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations? > > 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do. > > 3. 
We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!YF3tHamDD69QbY3u6HDwK1Ud9HoLgM-UWJ__gqZwYCS7b4H7RrYYNe6xkgbg7udGqPEhw2n6a8hEURca5-7lmBU$ > > Thanks, > > Matt >> >> Best regards, >> >> Alexandre. >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!YF3tHamDD69QbY3u6HDwK1Ud9HoLgM-UWJ__gqZwYCS7b4H7RrYYNe6xkgbg7udGqPEhw2n6a8hEURca4quC9R0$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Thu Feb 12 17:14:16 2026 From: snailsoar at hotmail.com (feng wang) Date: Thu, 12 Feb 2026 23:14:16 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Mat, Thanks for your reply. For "VecCreateGhostBlock", The CPU version runs in parallel, if we are solving Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. or is there a way around this? Regarding to "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I prefer I can set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this? With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values. For MatSetValuesCOO, it requires an argument "coo_v", how does it work if I want to set the values in the GPU directly? say, coo_v has the type of PetscScalar, do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO? Thanks for your help in advance. Best regards, Feng ________________________________ From: Matthew Knepley Sent: 11 February 2026 16:32 To: feng wang Cc: Junchao Zhang ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 10:58?AM feng wang > wrote: Hi Mat, Thanks for your reply. Maybe I am overthinking it. ksp/ex15 works fine with GPUs. To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device. Below is a snippet of my current petsc implementation. Suppose I have: ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now? ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); //duplicate ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); //create preconditioning matrix ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). 
Then you can set the type from the command line, like baij or aijcusparse, etc. If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device? Below is how I assign values for the matrix: nnz=0; for(jv=0; jv> Sent: 11 February 2026 13:42 To: feng wang > Cc: Junchao Zhang >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 5:55?AM feng wang > wrote: Hi Junchao, Thanks for your reply. Probably I did not phrase it in a clear way. I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code. For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system. The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work? Thanks, Matt BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 11 February 2026 3:00 To: feng wang > Cc: petsc-users at mcs.anl.gov >; Barry Smith > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57?PM feng wang > wrote: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 23:18 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? 
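[A minimal sketch of the GMRES + ILU(0) solver setup being asked about, added for reference; it is not code from the thread. It assumes the matrix and the vectors petsc_rhs and petsc_dcsv were created with run-time-selectable types as discussed above, and it uses the preconditioning matrix for both operators purely for illustration. The same calls run on the GPU when the objects carry GPU types (e.g. -mat_type aijcusparse -vec_type cuda). In parallel, ILU(0) is usually requested per subdomain, e.g. -pc_type bjacobi -sub_pc_type ilu.]

    KSP ksp;
    ierr = KSPCreate(*A_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, petsc_A_pre, petsc_A_pre); CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPGMRES); CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);   /* e.g. -pc_type ilu (serial) or
                                                       -pc_type bjacobi -sub_pc_type ilu */
    ierr = KSPSolve(ksp, petsc_rhs, petsc_dcsv); CHKERRQ(ierr);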
Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_m2wZZbkQ$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!ePnxLVQy-xP1mJxzbezQDrvkZtZS5AvNLT229xh1bxbQNfkSdt2eVXK1-V29JwXphj2aVwaPhe8pWF4MJ_m2wZZbkQ$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Feb 12 17:36:48 2026 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Feb 2026 18:36:48 -0500 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: On Thu, Feb 12, 2026 at 6:14?PM feng wang wrote: > Hi Mat, > > Thanks for your reply. > > For "VecCreateGhostBlock", The CPU version runs in parallel, if we are > solving Ax=b, so it also stores the halos in x and b for each partition. > This is how my old implementation was done. If the current GPU > implementation does not support halos, I can stick to one GPU for the > moment. or is there a way around this? > There is a way around it. We have an open Issue. Someone needs to allow the vectors to be created with another type. It is not hard, it just takes time. I can do it starting the middle of March if you need it quickly. > Regarding to "Rather you create a generic Mat, set the blocksize, and then > MatSetFromOptions(). Then you can set the type from the command line, like > baij or aijcusparse, etc.", my current CFD code also takes arguments from > the command line, so I prefer I can set the types from the source code > directly, so it does not mess around with arguments of the CFD code. Is > there a way I can do this? > 1) You can do that using MatCreate() MatSetSizes() MatSetBlockSize() MatSetType() but, I still don't think you should do that. 2) You can provide PETSc options from any source you want using PetscOptionsSetValue() and PetscOptionsInsertString(), so you can manage them however you want. > With respect to "MatSetValuesCOO()", I am new to this, and was using the > old way to set the values. For MatSetValuesCOO, it requires an argument > "coo_v", how does it work if I want to set the values in the GPU directly? 
> say, coo_v has the type of PetscScalar, do I need to create coo_v and > assign its values directly in the GPU and then give it to MatSetValuesCOO? > Yes. COO is much more efficient on the GPU than calling SetValues() individually. GPUs have horrible latency and hate branching. This is about the only way to make them competitive with CPUs for building operators. Thanks, Matt > Thanks for your help in advance. > > Best regards, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 11 February 2026 16:32 > *To:* feng wang > *Cc:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > On Wed, Feb 11, 2026 at 10:58?AM feng wang wrote: > > Hi Mat, > > Thanks for your reply. Maybe I am overthinking it. > > ksp/ex15 works fine with GPUs. > > To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how > Petsc handle the memory in the host and the device. > > Below is a snippet of my current petsc implementation. Suppose I have: > > ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, > blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); > > > This is the problem. Right now VecGhost hardcodes the use of VECSEQ and > VECMPI. This is not necessary, and the local and global representations > could indeed be device types. Is ghost necessary right now? > > > ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); > > //duplicate > ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); > > //create preconditioning matrix > ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, > nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, > maxneig, NULL, maxneig, NULL, &petsc_A_pre); > CHKERRQ(ierr); > > > I would not create the specific type. Rather you create a generic Mat, set > the blocksize, and then MatSetFromOptions(). Then you can set the type from > the command line, like baij or aijcusparse, etc. > > > *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and > vectors directly created in the device?* > > Below is how I assign values for the matrix: > > nnz=0; > for(jv=0; jv { > for(iv=0; iv { > values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the > left hand side is [I/dt + (-J)] > nnz++; > } > } > > idxm[0] = ig_mat[iql]; > idxn[0] = ig_mat[iqr]; > ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, > ADD_VALUES); > CHKERRQ(ierr); > } > > *Does petsc first set the value in the host and copy it to the device or > the value is directly assigned in the device. in the 2nd case, I would need > change my code a bit, since I need to make sure the data is in the device > in the first place.* > > > Yes, you would need to set the values on device for maximum efficiency > (although I would try it out with CPU construction first). You can do this > best on the GPU using MatSetValuesCOO(). > > Thanks, > > Matt > > > Thanks, > Feng > > > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 11 February 2026 13:42 > *To:* feng wang > *Cc:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > On Wed, Feb 11, 2026 at 5:55?AM feng wang wrote: > > Hi Junchao, > > Thanks for your reply. Probably I did not phrase it in a clear way. > > I am using openACC to port the CFD code to the GPU, so the CPU and the GPU > version essentially share the same source code. 
For the original CPU > version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve > the sparse linear system. > > The current GPU version of the code only port the Jacobi solver to the > GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to > make to the existing CPU version of GMRES+ILU(0) to achieve this goal? > > > I think what Junchao is saying, is that if you use the GPU vec and mat > types, this should be running on the GPU already. Does that not work? > > Thanks, > > Matt > > > BTW: For performance the GPU version of the CFD code has minimum > communication between the CPU and GPU, so for Ax=b, A, x and b are created > in the GPU directly > > Thanks, > Feng > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 11 February 2026 3:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov ; Barry Smith < > bsmith at petsc.dev> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Sorry, I don't understand your question. What blocks you from running > your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about > the algorithms. > > --Junchao Zhang > > > On Tue, Feb 10, 2026 at 3:57?PM feng wang wrote: > > Hi Junchao, > > I have managed to configure Petsc for GPU, also managed to run ksp/ex15 > using -mat_type aijcusparse -vec_type cuda. It seems runs much faster > compared to the scenario if I don't use " -mat_type aijcusparse -vec_type > cuda". so I believe it runs okay for GPUs. > > I have an existing CFD code that runs natively on GPUs. so all the data is > offloaded to GPU at the beginning and some data are copied back to the cpu > at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for > the implicit solver. *My question is: my code also has a GMRES+ILU(0) > implemented with Petsc but it only runs on cpus (which I implemented a few > years ago). How can I replace the existing Newton-Jacobi (which runs in > GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give > some advice?* > > Thanks, > Feng > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 23:18 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hi Feng, > At the first step, you don't need to change your CPU implementation. > Then do profiling to see where it is worth putting your effort. Maybe you > need to assemble your matrices and vectors on GPUs too, but decide that at > a later stage. > > Thanks! > --Junchao Zhang > > > On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. 
But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!cXpmVvZxc-TxBksnlk_2BJ9ShOVFrvTVXFQ4MoNkrD3Ah0fPbqnx9Qw4ZAwScITqFUlFyXEwxtSUivJyFt7C$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Feb 12 17:44:57 2026 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 12 Feb 2026 17:44:57 -0600 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: On Thu, Feb 12, 2026 at 5:14?PM feng wang wrote: > Hi Mat, > > Thanks for your reply. > > For "VecCreateGhostBlock", The CPU version runs in parallel, if we are > solving Ax=b, so it also stores the halos in x and b for each partition. > This is how my old implementation was done. If the current GPU > implementation does not support halos, I can stick to one GPU for the > moment. or is there a way around this? > PETSc currently doesn't support ghost vectors on device, though we plan to support it. > > Regarding to "Rather you create a generic Mat, set the blocksize, and then > MatSetFromOptions(). Then you can set the type from the command line, like > baij or aijcusparse, etc.", my current CFD code also takes arguments from > the command line, so I prefer I can set the types from the source code > directly, so it does not mess around with arguments of the CFD code. Is > there a way I can do this? > petsc asscpets options from three sources: 1) command line; 2) the .petscrc file; 3) the PETSC_OPTIONS env var. You can use the latter two approaches. > > With respect to "MatSetValuesCOO()", I am new to this, and was using the > old way to set the values. For MatSetValuesCOO, it requires an argument > "coo_v", how does it work if I want to set the values in the GPU directly? > say, coo_v has the type of PetscScalar, do I need to create coo_v and > assign its values directly in the GPU and then give it to MatSetValuesCOO? > COO routines are used to assemble the matrix on device. If you compute matrix entries on host, you don't need COO, otherwise you need. 
In MatSetValuesCOO(A, coo_v, ..), coo_v can be a device pointer, however in MatSetValues/MatSetValuesBlocked(A, ..., v, ..), v must be a host pointer. > > Thanks for your help in advance. > > Best regards, > Feng > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 11 February 2026 16:32 > *To:* feng wang > *Cc:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > On Wed, Feb 11, 2026 at 10:58?AM feng wang wrote: > > Hi Mat, > > Thanks for your reply. Maybe I am overthinking it. > > ksp/ex15 works fine with GPUs. > > To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how > Petsc handle the memory in the host and the device. > > Below is a snippet of my current petsc implementation. Suppose I have: > > ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, > blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); > > > This is the problem. Right now VecGhost hardcodes the use of VECSEQ and > VECMPI. This is not necessary, and the local and global representations > could indeed be device types. Is ghost necessary right now? > > > ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); > > //duplicate > ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); > > //create preconditioning matrix > ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, > nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, > maxneig, NULL, maxneig, NULL, &petsc_A_pre); > CHKERRQ(ierr); > > > I would not create the specific type. Rather you create a generic Mat, set > the blocksize, and then MatSetFromOptions(). Then you can set the type from > the command line, like baij or aijcusparse, etc. > > > *If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and > vectors directly created in the device?* > > Below is how I assign values for the matrix: > > nnz=0; > for(jv=0; jv { > for(iv=0; iv { > values[nnz] = -1*sign*blk.jac[jv][iv]; //"-1" because the > left hand side is [I/dt + (-J)] > nnz++; > } > } > > idxm[0] = ig_mat[iql]; > idxn[0] = ig_mat[iqr]; > ierr = MatSetValuesBlocked(matrix, 1, idxm, 1, idxn, values, > ADD_VALUES); > CHKERRQ(ierr); > } > > *Does petsc first set the value in the host and copy it to the device or > the value is directly assigned in the device. in the 2nd case, I would need > change my code a bit, since I need to make sure the data is in the device > in the first place.* > > > Yes, you would need to set the values on device for maximum efficiency > (although I would try it out with CPU construction first). You can do this > best on the GPU using MatSetValuesCOO(). > > Thanks, > > Matt > > > Thanks, > Feng > > > > ------------------------------ > *From:* Matthew Knepley > *Sent:* 11 February 2026 13:42 > *To:* feng wang > *Cc:* Junchao Zhang ; petsc-users at mcs.anl.gov < > petsc-users at mcs.anl.gov> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > On Wed, Feb 11, 2026 at 5:55?AM feng wang wrote: > > Hi Junchao, > > Thanks for your reply. Probably I did not phrase it in a clear way. > > I am using openACC to port the CFD code to the GPU, so the CPU and the GPU > version essentially share the same source code. For the original CPU > version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve > the sparse linear system. > > The current GPU version of the code only port the Jacobi solver to the > GPU, now I want to port GMRES+ILU(0) to the GPU. 
What changes do I need to > make to the existing CPU version of GMRES+ILU(0) to achieve this goal? > > > I think what Junchao is saying, is that if you use the GPU vec and mat > types, this should be running on the GPU already. Does that not work? > > Thanks, > > Matt > > > BTW: For performance the GPU version of the CFD code has minimum > communication between the CPU and GPU, so for Ax=b, A, x and b are created > in the GPU directly > > Thanks, > Feng > > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 11 February 2026 3:00 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov ; Barry Smith < > bsmith at petsc.dev> > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Sorry, I don't understand your question. What blocks you from running > your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about > the algorithms. > > --Junchao Zhang > > > On Tue, Feb 10, 2026 at 3:57?PM feng wang wrote: > > Hi Junchao, > > I have managed to configure Petsc for GPU, also managed to run ksp/ex15 > using -mat_type aijcusparse -vec_type cuda. It seems runs much faster > compared to the scenario if I don't use " -mat_type aijcusparse -vec_type > cuda". so I believe it runs okay for GPUs. > > I have an existing CFD code that runs natively on GPUs. so all the data is > offloaded to GPU at the beginning and some data are copied back to the cpu > at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for > the implicit solver. *My question is: my code also has a GMRES+ILU(0) > implemented with Petsc but it only runs on cpus (which I implemented a few > years ago). How can I replace the existing Newton-Jacobi (which runs in > GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give > some advice?* > > Thanks, > Feng > > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 23:18 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hi Feng, > At the first step, you don't need to change your CPU implementation. > Then do profiling to see where it is worth putting your effort. Maybe you > need to assemble your matrices and vectors on GPUs too, but decide that at > a later stage. > > Thanks! > --Junchao Zhang > > > On Mon, Feb 9, 2026 at 4:31?PM feng wang wrote: > > Hi Junchao, > > Many thanks for your reply. > > This is great! Do I need to change anything for my current CPU > implementation? or I just link to a version of Petsc that is configured > with cuda and make sure the necessary data are copied to the "device", > then Petsc will do the rest magic for me? > > Thanks, > Feng > ------------------------------ > *From:* Junchao Zhang > *Sent:* 09 February 2026 1:55 > *To:* feng wang > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Port existing GMRES+ILU(0) implementation to > GPU > > Hello Feng, > It is possible to run GMRES with ILU(0) on GPUs. You may need to > configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with > extra --download-kokkos --download-kokkos-kernels). Then run with > -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. > But triangular solve is not GPU friendly and the performance might be > poor. But you should try it, I think. > > Thanks! > --Junchao Zhang > > On Sun, Feb 8, 2026 at 5:46?PM feng wang wrote: > > Dear All, > > I have an existing implementation of GMRES with ILU(0), it works well for > cpu now. 
I went through the Petsc documentation, it seems Petsc has some > support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? > > Many thanks for your help in advance, > Feng > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhX6QVNE-$ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!aQna-PwsGB52rzYYunZ3ZnWtNpIxYxLs1D_gJ0FyjNNa3JmUvsBRCQE2iu1aLTjOhrj46zUC6L43C5q1cowPhX6QVNE-$ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.scotto at irt-saintexupery.com Fri Feb 13 01:14:47 2026 From: alexandre.scotto at irt-saintexupery.com (SCOTTO Alexandre) Date: Fri, 13 Feb 2026 07:14:47 +0000 Subject: [petsc-users] Scalability In-Reply-To: References: Message-ID: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com> Dear Matthew, Barry, Thank you for your answers. The question of the problem size was part of my concern regarding the relevance of the quick test setup, I am going to increase the size in the suggested way to see the difference. Regarding the sparsity pattern, I assume that the more ?diagonal? the matrix is the better the speedup, is this is a correct rule of thumb? Best regards, Alexandre. De : Barry Smith Envoy? : jeudi 12 f?vrier 2026 17:10 ? : Matthew Knepley Cc : SCOTTO Alexandre ; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Scalability The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times. On Feb 12, 2026, at 9:00?AM, Matthew Knepley > wrote: On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users > wrote: Dear PETSc community, I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great. I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations? 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations? 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do. 3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!eV_IiuO3vH2ZpW_dBJVlJMaQrV4cgKIZY9zbKqT1jv_KxjWYt5QxPk9CZ6YQsDkA3nBdSIOKvGAf0E55rPv1G1x0j_G4MnIEMljD3zuFSA$ Thanks, Matt Best regards, Alexandre. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!eV_IiuO3vH2ZpW_dBJVlJMaQrV4cgKIZY9zbKqT1jv_KxjWYt5QxPk9CZ6YQsDkA3nBdSIOKvGAf0E55rPv1G1x0j_G4MnIEMlgIc0eMgg$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Fri Feb 13 04:42:42 2026 From: snailsoar at hotmail.com (feng wang) Date: Fri, 13 Feb 2026 10:42:42 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Mat, Thanks for your reply. I roughly know what to do now. I will give it a try. If I have some issues, I will come back to this thread. Thanks, Feng ________________________________ From: Matthew Knepley Sent: 12 February 2026 23:36 To: feng wang Cc: Junchao Zhang ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Thu, Feb 12, 2026 at 6:14?PM feng wang > wrote: Hi Mat, Thanks for your reply. For "VecCreateGhostBlock", The CPU version runs in parallel, if we are solving Ax=b, so it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. or is there a way around this? There is a way around it. We have an open Issue. Someone needs to allow the vectors to be created with another type. It is not hard, it just takes time. I can do it starting the middle of March if you need it quickly. Regarding to "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I prefer I can set the types from the source code directly, so it does not mess around with arguments of the CFD code. Is there a way I can do this? 1) You can do that using MatCreate() MatSetSizes() MatSetBlockSize() MatSetType() but, I still don't think you should do that. 2) You can provide PETSc options from any source you want using PetscOptionsSetValue() and PetscOptionsInsertString(), so you can manage them however you want. With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values. For MatSetValuesCOO, it requires an argument "coo_v", how does it work if I want to set the values in the GPU directly? say, coo_v has the type of PetscScalar, do I need to create coo_v and assign its values directly in the GPU and then give it to MatSetValuesCOO? Yes. COO is much more efficient on the GPU than calling SetValues() individually. GPUs have horrible latency and hate branching. This is about the only way to make them competitive with CPUs for building operators. Thanks, Matt Thanks for your help in advance. Best regards, Feng ________________________________ From: Matthew Knepley > Sent: 11 February 2026 16:32 To: feng wang > Cc: Junchao Zhang >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 10:58?AM feng wang > wrote: Hi Mat, Thanks for your reply. Maybe I am overthinking it. ksp/ex15 works fine with GPUs. To port my existing GMRES+ILU(0) to GPU, What i am not very clear is how Petsc handle the memory in the host and the device. Below is a snippet of my current petsc implementation. 
Suppose I have: ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now? ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); //duplicate ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); //create preconditioning matrix ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc. If I use "-mat_type aijcusparse -vec_type cuda". Are these matrices and vectors directly created in the device? Below is how I assign values for the matrix: nnz=0; for(jv=0; jv> Sent: 11 February 2026 13:42 To: feng wang > Cc: Junchao Zhang >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 5:55?AM feng wang > wrote: Hi Junchao, Thanks for your reply. Probably I did not phrase it in a clear way. I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code. For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system. The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work? Thanks, Matt BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 11 February 2026 3:00 To: feng wang > Cc: petsc-users at mcs.anl.gov >; Barry Smith > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57?PM feng wang > wrote: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? 
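[To make the MatSetValuesCOO() path mentioned above concrete, a minimal sketch, not code from the thread. The (i, j) pattern is laid out once on the host, one entry per scalar inside each blocksize x blocksize block, in the same order the values will later be produced; duplicate (i, j) pairs are accumulated, which plays the role of the ADD_VALUES accumulation in the loop above. For aijcusparse/aijkokkos matrices coo_v may be a device pointer, so the Jacobian entries can stay in GPU memory. nnz_total, coo_i, coo_j and coo_v are illustrative names.]

    PetscCount nnz_total;   /* total number of scalar nonzero entries, computed by the application */
    PetscInt  *coo_i, *coo_j;
    ierr = PetscMalloc2(nnz_total, &coo_i, nnz_total, &coo_j); CHKERRQ(ierr);
    /* fill coo_i/coo_j once, using the global block indices from ig_mat[]
       expanded by blocksize */
    ierr = MatSetPreallocationCOO(matrix, nnz_total, coo_i, coo_j); CHKERRQ(ierr);
    ierr = PetscFree2(coo_i, coo_j); CHKERRQ(ierr);

    /* at every (re)assembly: coo_v holds the entries in that same order and,
       for device matrix types, may point to GPU memory */
    ierr = MatSetValuesCOO(matrix, coo_v, INSERT_VALUES); CHKERRQ(ierr);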
Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 23:18 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!b2llbQDqs4ecGK2tBcTFeOgmdFfE1qYGcreiV-BROsWCYjew0pnZo4AIUuIuguF36WWhfh1rsSV51VlrPnLsP7ievw$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From snailsoar at hotmail.com Fri Feb 13 04:49:03 2026 From: snailsoar at hotmail.com (feng wang) Date: Fri, 13 Feb 2026 10:49:03 +0000 Subject: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU In-Reply-To: References: Message-ID: Hi Junchao, Thanks for your reply. It is very helpful. I roughly know what to do now. One more question, Does this mean that for the moment I can only use a GPU? If so, I can live with it for the moment. 
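[One possible interim arrangement while device-side ghost vectors are not yet supported; this is an editor's sketch rather than the VecGhost change referred to above, and it assumes the ghost entries of x and b are only needed by the application itself. The KSP solve does not require ghosted vectors, since the parallel MatMult performs its own off-process scatter, so plain MPI vectors, which do accept -vec_type cuda, keep the solver usable on more than one GPU.]

    Vec petsc_dcsv, petsc_rhs;
    ierr = VecCreate(*A_COMM_WORLD, &petsc_dcsv); CHKERRQ(ierr);
    ierr = VecSetSizes(petsc_dcsv, blocksize*nlocal, PETSC_DECIDE); CHKERRQ(ierr);
    ierr = VecSetBlockSize(petsc_dcsv, blocksize); CHKERRQ(ierr);
    ierr = VecSetFromOptions(petsc_dcsv); CHKERRQ(ierr);   /* picks up -vec_type cuda */
    ierr = VecDuplicate(petsc_dcsv, &petsc_rhs); CHKERRQ(ierr);
    /* any halo exchange the CFD code itself needs would then be done separately,
       e.g. with a VecScatter built from the ighost indices */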
Thanks, Feng ________________________________ From: Junchao Zhang Sent: 12 February 2026 23:44 To: feng wang Cc: Matthew Knepley ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Thu, Feb 12, 2026 at 5:14 PM feng wang > wrote: Hi Mat, Thanks for your reply. For "VecCreateGhostBlock", the CPU version runs in parallel, so if we are solving Ax=b, it also stores the halos in x and b for each partition. This is how my old implementation was done. If the current GPU implementation does not support halos, I can stick to one GPU for the moment. Or is there a way around this? PETSc currently doesn't support ghost vectors on device, though we plan to support it. Regarding "Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc.", my current CFD code also takes arguments from the command line, so I would prefer to set the types from the source code directly, so that it does not interfere with the arguments of the CFD code. Is there a way I can do this? PETSc accepts options from three sources: 1) the command line; 2) the .petscrc file; 3) the PETSC_OPTIONS env var. You can use the latter two approaches. With respect to "MatSetValuesCOO()", I am new to this, and was using the old way to set the values. MatSetValuesCOO requires an argument "coo_v"; how does it work if I want to set the values on the GPU directly? Say coo_v has type PetscScalar; do I need to create coo_v and assign its values directly on the GPU and then give it to MatSetValuesCOO? COO routines are used to assemble the matrix on device. If you compute matrix entries on the host, you don't need COO; otherwise you do. In MatSetValuesCOO(A, coo_v, ..), coo_v can be a device pointer, whereas in MatSetValues/MatSetValuesBlocked(A, ..., v, ..), v must be a host pointer. Thanks for your help in advance. Best regards, Feng
________________________________ From: Matthew Knepley > Sent: 11 February 2026 16:32 To: feng wang > Cc: Junchao Zhang >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 10:58 AM feng wang > wrote: Hi Mat, Thanks for your reply. Maybe I am overthinking it. ksp/ex15 works fine with GPUs. To port my existing GMRES+ILU(0) to GPU, what I am not clear about is how PETSc handles memory on the host and the device. Below is a snippet of my current PETSc implementation. Suppose I have: ierr = VecCreateGhostBlock(*A_COMM_WORLD, blocksize, blocksize*nlocal, PETSC_DECIDE ,nghost, ighost, &petsc_dcsv); CHKERRQ(ierr); This is the problem. Right now VecGhost hardcodes the use of VECSEQ and VECMPI. This is not necessary, and the local and global representations could indeed be device types. Is ghost necessary right now? ierr = VecSetFromOptions(petsc_dcsv);CHKERRQ(ierr); //duplicate ierr = VecDuplicate(petsc_dcsv, &petsc_rhs);CHKERRQ(ierr); //create preconditioning matrix ierr = MatCreateBAIJ(*A_COMM_WORLD, blocksize, nlocal*blocksize, nlocal*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, maxneig, NULL, maxneig, NULL, &petsc_A_pre); CHKERRQ(ierr); I would not create the specific type. Rather you create a generic Mat, set the blocksize, and then MatSetFromOptions(). Then you can set the type from the command line, like baij or aijcusparse, etc. If I use "-mat_type aijcusparse -vec_type cuda", are these matrices and vectors directly created on the device?
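For reference, the COO path mentioned above is a two-step interface; a minimal sketch (ncoo, coo_i and coo_j are illustrative names for the number of entries and the host index arrays that describe the nonzero pattern; coo_v is the value array from the discussion above):

  /* set the nonzero pattern once, with host index arrays */
  ierr = MatSetPreallocationCOO(A, ncoo, coo_i, coo_j);CHKERRQ(ierr);
  /* supply (or re-supply) the values; coo_v may be a device pointer
     when A is a device type such as aijcusparse or aijkokkos */
  ierr = MatSetValuesCOO(A, coo_v, INSERT_VALUES);CHKERRQ(ierr);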
Below is how I assign values for the matrix: nnz=0; for(jv=0; jv> Sent: 11 February 2026 13:42 To: feng wang > Cc: Junchao Zhang >; petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU On Wed, Feb 11, 2026 at 5:55?AM feng wang > wrote: Hi Junchao, Thanks for your reply. Probably I did not phrase it in a clear way. I am using openACC to port the CFD code to the GPU, so the CPU and the GPU version essentially share the same source code. For the original CPU version, it uses Jacobi (hand-coded) or GMRES+ILU(0) (with pestc) to solve the sparse linear system. The current GPU version of the code only port the Jacobi solver to the GPU, now I want to port GMRES+ILU(0) to the GPU. What changes do I need to make to the existing CPU version of GMRES+ILU(0) to achieve this goal? I think what Junchao is saying, is that if you use the GPU vec and mat types, this should be running on the GPU already. Does that not work? Thanks, Matt BTW: For performance the GPU version of the CFD code has minimum communication between the CPU and GPU, so for Ax=b, A, x and b are created in the GPU directly Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 11 February 2026 3:00 To: feng wang > Cc: petsc-users at mcs.anl.gov >; Barry Smith > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Sorry, I don't understand your question. What blocks you from running your GMRES+ILU(0) on GPUs? I Cc'ed Barry, who knows better about the algorithms. --Junchao Zhang On Tue, Feb 10, 2026 at 3:57?PM feng wang > wrote: Hi Junchao, I have managed to configure Petsc for GPU, also managed to run ksp/ex15 using -mat_type aijcusparse -vec_type cuda. It seems runs much faster compared to the scenario if I don't use " -mat_type aijcusparse -vec_type cuda". so I believe it runs okay for GPUs. I have an existing CFD code that runs natively on GPUs. so all the data is offloaded to GPU at the beginning and some data are copied back to the cpu at the very end. It got a hand-coded Newton-Jacobi that runs in GPUs for the implicit solver. My question is: my code also has a GMRES+ILU(0) implemented with Petsc but it only runs on cpus (which I implemented a few years ago). How can I replace the existing Newton-Jacobi (which runs in GPUs) with GMRES+ILU(0) which should run in GPUs. Could you please give some advice? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 23:18 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hi Feng, At the first step, you don't need to change your CPU implementation. Then do profiling to see where it is worth putting your effort. Maybe you need to assemble your matrices and vectors on GPUs too, but decide that at a later stage. Thanks! --Junchao Zhang On Mon, Feb 9, 2026 at 4:31?PM feng wang > wrote: Hi Junchao, Many thanks for your reply. This is great! Do I need to change anything for my current CPU implementation? or I just link to a version of Petsc that is configured with cuda and make sure the necessary data are copied to the "device", then Petsc will do the rest magic for me? Thanks, Feng ________________________________ From: Junchao Zhang > Sent: 09 February 2026 1:55 To: feng wang > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Port existing GMRES+ILU(0) implementation to GPU Hello Feng, It is possible to run GMRES with ILU(0) on GPUs. 
You may need to configure PETSc with CUDA (--with-cuda --with-cudac=nvcc) or Kokkos (with extra --download-kokkos --download-kokkos-kernels). Then run with -mat_type {aijcusparse or aijkokkos} -vec_type {cuda or kokkos}. But triangular solve is not GPU friendly and the performance might be poor. But you should try it, I think. Thanks! --Junchao Zhang On Sun, Feb 8, 2026 at 5:46?PM feng wang > wrote: Dear All, I have an existing implementation of GMRES with ILU(0), it works well for cpu now. I went through the Petsc documentation, it seems Petsc has some support for GPUs. is it possible for me to run GMRES with ILU(0) in GPUs? Many thanks for your help in advance, Feng -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981pvXINSw$ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fSHu6CilF6ZFnmEGCN9QPpP8ryHsYCNhI3ZnggfkMJpGA59zhN9K_-HF3K5Hu_SCj_Yh321ySgw1ltFU981pvXINSw$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Feb 13 07:43:23 2026 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 13 Feb 2026 08:43:23 -0500 Subject: [petsc-users] Scalability In-Reply-To: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com> References: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com> Message-ID: On Fri, Feb 13, 2026 at 2:14?AM SCOTTO Alexandre < alexandre.scotto at irt-saintexupery.com> wrote: > Dear Matthew, Barry, > > > > Thank you for your answers. The question of the problem size was part of > my concern regarding the relevance of the quick test setup, I am going to > increase the size in the suggested way to see the difference. > > > > Regarding the sparsity pattern, I assume that the more ?diagonal? the > matrix is the better the speedup, is this is a correct rule of thumb? > What I was referring to was the density. The pattern has implications for the cache efficiency. Here is a good paper explaining what is going on: https://urldefense.us/v3/__https://d1wqtxts1xzle7.cloudfront.net/40652293/Toward_Realistic_Performance_Bounds_for_20151205-5192-8jxqcg-libre.pdf?1449311168=&response-content-disposition=inline*3B*filename*3DToward_realistic_performance_bounds_for.pdf&Expires=1770993790&Signature=ITtMQ-YNb5x*ZZnYof32wXbghpN9y5Bf50*ioozZi6O7GXATT4e4wApHuDX0qsrED1Pv--bv*rXFkMz9BpeGHP491X-qcDdKbRNxp7tg2zhKMwTeGpzzUCDV6UGjWcof39UCWzBSgNDhC35BVObFeDelhewIvn0dNI9O-Msr3wOjO51yDYzh1KJO-oTZ6mIDIYDL8S8ioLhnL0z6ec-3dQOmdDJfV6Vty3gkMJAjAhkhUNst2JEqIuRuygYGizCuVhYksH3p-51et7FWtu043MTmBO6lRCbKodbWMGBXvKe8Kox03NDQ2fs5-ClAWTwjd6VTiGpPq6PxP0a9UPvWZQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA__;JSslfn5-!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo1vSYcsq$ Thanks, Matt > Best regards, > > Alexandre. > > > > *De :* Barry Smith > *Envoy? :* jeudi 12 f?vrier 2026 17:10 > *? :* Matthew Knepley > *Cc :* SCOTTO Alexandre ; > petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Scalability > > > > > > The problem size is also very small. 
Typically one cannot get speedup > when the number of variables per MPI rank is below on the order of 10,000. > In your 64 process case you only have 390 variables. I would be stunned > with any kind of speedup for such sizes. Run a problem at least 10 times > bigger, better yet 20 times. > > > > > > On Feb 12, 2026, at 9:00?AM, Matthew Knepley wrote: > > > > On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Dear PETSc community, > > > > I have conducted a quick strong scalability-like test on direct and > adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, > distributed over 2, 4, ?, 32 and 64 processes and the results I obtained > were not so great. > > I am not very confident in my setup, so a as a matter of reference, is > there any available results on weak and strong scalability of PETSc.Mat > mult() and multTranspose() operations? > > > > 1. This behavior depends on available bandwidth, not on cores. Do you know > the bandwidth for your configurations? > > > > 2. Strong scaling depends heavily on matrix sparsity. If inevitably > declines, but slower with more work to do. > > > > 3. We published a paper on performance recently: > https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo5KkdLTl$ > > > > > Thanks, > > > > Matt > > > > Best regards, > > Alexandre. > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo_B6tlSa$ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!Zyta7xbfKs3GEHTK5_8CF8c_Fz0cB2U8ek1mg8Vb0LsqwXKq9nGutXOd3rCZw5mI5nmdiPAXXPUWo_B6tlSa$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Feb 13 09:11:13 2026 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 13 Feb 2026 10:11:13 -0500 Subject: [petsc-users] Scalability In-Reply-To: References: <3086a5372d334aadb98092a518da3328@irt-saintexupery.com> Message-ID: <24761395-691B-429F-BB3A-FE9E1CDC4760@petsc.dev> Yes the more the values are near the diagonal likely the better scaling. Also the number of nonzeros per row, the higher that number the better the scaling. Barry > On Feb 13, 2026, at 8:43?AM, Matthew Knepley wrote: > > On Fri, Feb 13, 2026 at 2:14?AM SCOTTO Alexandre > wrote: >> Dear Matthew, Barry, >> >> >> >> Thank you for your answers. The question of the problem size was part of my concern regarding the relevance of the quick test setup, I am going to increase the size in the suggested way to see the difference. >> >> >> >> Regarding the sparsity pattern, I assume that the more ?diagonal? the matrix is the better the speedup, is this is a correct rule of thumb? >> > > What I was referring to was the density. The pattern has implications for the cache efficiency. 
Here is a good paper explaining what is going on: > > https://urldefense.us/v3/__https://d1wqtxts1xzle7.cloudfront.net/40652293/Toward_Realistic_Performance_Bounds_for_20151205-5192-8jxqcg-libre.pdf?1449311168=&response-content-disposition=inline*3B*filename*3DToward_realistic_performance_bounds_for.pdf&Expires=1770993790&Signature=ITtMQ-YNb5x*ZZnYof32wXbghpN9y5Bf50*ioozZi6O7GXATT4e4wApHuDX0qsrED1Pv--bv*rXFkMz9BpeGHP491X-qcDdKbRNxp7tg2zhKMwTeGpzzUCDV6UGjWcof39UCWzBSgNDhC35BVObFeDelhewIvn0dNI9O-Msr3wOjO51yDYzh1KJO-oTZ6mIDIYDL8S8ioLhnL0z6ec-3dQOmdDJfV6Vty3gkMJAjAhkhUNst2JEqIuRuygYGizCuVhYksH3p-51et7FWtu043MTmBO6lRCbKodbWMGBXvKe8Kox03NDQ2fs5-ClAWTwjd6VTiGpPq6PxP0a9UPvWZQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA__;JSslfn5-!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWm896D_g$ > > Thanks, > > Matt > >> Best regards, >> >> Alexandre. >> >> >> >> De : Barry Smith > >> Envoy? : jeudi 12 f?vrier 2026 17:10 >> ? : Matthew Knepley > >> Cc : SCOTTO Alexandre >; petsc-users at mcs.anl.gov >> Objet : Re: [petsc-users] Scalability >> >> >> >> >> >> The problem size is also very small. Typically one cannot get speedup when the number of variables per MPI rank is below on the order of 10,000. In your 64 process case you only have 390 variables. I would be stunned with any kind of speedup for such sizes. Run a problem at least 10 times bigger, better yet 20 times. >> >> >> >> >> >> >> On Feb 12, 2026, at 9:00?AM, Matthew Knepley > wrote: >> >> >> >> On Thu, Feb 12, 2026 at 6:48?AM SCOTTO Alexandre via petsc-users > wrote: >> >> Dear PETSc community, >> >> >> >> I have conducted a quick strong scalability-like test on direct and adjoint matrix-vector product with a 25,000 x 25,000 sparse matrix, distributed over 2, 4, ?, 32 and 64 processes and the results I obtained were not so great. >> >> I am not very confident in my setup, so a as a matter of reference, is there any available results on weak and strong scalability of PETSc.Mat mult() and multTranspose() operations? >> >> >> >> 1. This behavior depends on available bandwidth, not on cores. Do you know the bandwidth for your configurations? >> >> >> >> 2. Strong scaling depends heavily on matrix sparsity. If inevitably declines, but slower with more work to do. >> >> >> >> 3. We published a paper on performance recently: https://urldefense.us/v3/__https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X__;!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWSwoa0nA$ >> >> >> Thanks, >> >> >> >> Matt >> >> >> >> Best regards, >> >> Alexandre. >> >> >> >> >> >> -- >> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> >> >> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWiVlDisA$ >> >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!axZVqLk0h37e_aRGUtvMl_doja_7Vw4wdRhxvWhzyvFMozVXizBarj_RQS-_qxWkNTGjz50ihtujnoqWiVlDisA$ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Sat Feb 14 00:34:24 2026 From: jed at jedbrown.org (Jed Brown) Date: Fri, 13 Feb 2026 23:34:24 -0700 Subject: [petsc-users] 2026 Colorado Conference on Iterative and Multigrid Methods, June 21-26 in Boulder, CO References: <0100019c340dc317-b70c38b1-4985-4c7b-8b12-7d120c13b6ca-000000@email.amazonses.com> Message-ID: <87y0kvq1e7.fsf@jedbrown.org> As Scott announces below, the conference formerly known as Copper will be held in Boulder this June. The student paper competition abstract deadline is Feb 18, with paper submission a week later. We'll have an affordable on-campus housing option. We look forward to your student paper submissions. Registration and lodging will be available soon. https://urldefense.us/v3/__https://coloradoconference.github.io/2026/__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T406EDeiok$ -------------------- Start of forwarded message -------------------- From: Scott Maclachlan via SIAM Date: Fri, 6 Feb 2026 17:44:03 +0000 Subject: SIAG on Computational Science and Engineering Community: 2026 Colorado Conference on Iterative and Multigrid Methods To: jed at jedbrown.org -------------- next part -------------- The Copper Mountain Conference on Multigrid Methods was founded in 1983 and held every two years (odd years) since then. In 1990, the Copper Mountain Conference on Iterative Methods was formed as a companion conference to be held in even years. Together, they are widely regarded as premier international conferences on iterative and multigrid methods. In 2026, we continue this conference series with the Colorado Conference on Iterative and Multigrid Methods to be held on the CU Boulder campus. Abstract submission for the conference will soon open, including for the student paper competition (a tradition at the meeting with a cash award). For more information, see our website at https://urldefense.us/v3/__https://coloradoconference.github.io/2026/__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T406EDeiok$ Updates will be provided on the web site and in follow-up announcements in the coming weeks. Don't forget to mark your calendars! Important Deadlines: Student Paper Competition (no extensions): Abstract: Wednesday, February 18, 2026 Paper (reserved for students who submit an abstract): Wednesday, February 25, 2026 Presentation Abstracts: Friday, April 3, 2026 Early Registration: Friday, April 17, 2026 ------------------------------ Scott MacLachlan Memorial University of Newfoundland, Canada ------------------------------ Reply to Sender : https://urldefense.us/v3/__https://engage.siam.org/eGroups/PostReply/?GroupId=79&MID=12597&SenderKey=7385094f-4ce3-4897-a820-b2a77145ce90__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40WNcMrmI$ Reply to Discussion : https://urldefense.us/v3/__https://engage.siam.org/eGroups/PostReply/?GroupId=79&MID=12597__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40HtZlATY$ You are subscribed to "SIAG on Computational Science and Engineering Community" as jed at jedbrown.org. To change your subscriptions, go to https://urldefense.us/v3/__http://siam.connectedcommunity.org/preferences?section=Subscriptions__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40BbSxG_w$ . 
To unsubscribe from this community discussion, go to https://urldefense.us/v3/__https://siam.connectedcommunity.org/HigherLogic/eGroups/Unsubscribe.aspx?UserKey=70e74b35-bc03-4037-9b99-eabb1cd118f9&sKey=6061e99933064319987e&GroupKey=d8049ff6-9924-44fb-bbe6-c2f867948e95__;!!G_uCfscf7eWS!fcsx6Rcys1KBcih7OuHWzVu8dKLV5I0XFZLg6D8fr0nbtfXYGiYZdqeqMNKmMfR0fi56Ml3n0T40dO2c5Dk$ . -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- -------------------- End of forwarded message -------------------- From drwells at email.unc.edu Tue Feb 17 10:51:58 2026 From: drwells at email.unc.edu (Wells, David) Date: Tue, 17 Feb 2026 16:51:58 +0000 Subject: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. In-Reply-To: <6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev> References: <6F3A41A6-90B5-475D-88A3-C2BF6AE53547@petsc.dev> Message-ID: Hi Barry, Sorry for the slow response - yes, that makes sense, and it is a sensible default to have. I appreciate the writeup. Best, David ________________________________ From: Barry Smith Sent: Tuesday, February 10, 2026 9:32 PM To: Wells, David Cc: Matthew Knepley ; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. You don't often get email from bsmith at petsc.dev. Learn why this is important 1) For a fixed restart (say of 30) FGMRES needs 60 vectors, while GMRES only needs 30. This is a big disadvantage of FGMRES over GMRES. 2) By default PETSc GMRES uses a restart of 30 meaning it keeps 30 previous Krylov vectors (and FGMRES needs 60 vectors). You can use a smaller restart with KSPGMRESSetRestart or -ksp_gmres_restart to need less memory (of course the convergence may get far worse or not depending on the problem. 3) When GMRES (or FGMRES) starts up it does not immediately allocate all 30 (or whatever) restart vectors because it may be that GMRES only takes 15 steps so why allocate all of them? Instead it allocates a chunk at a time GMRES_DELTA_DIRECTIONS which is 10 when it uses up the 10 it allocates another 10 (if needed) etc until it gets to the restart You can force GMRES to allocate all 30 (or whatever) initially instead of the chunk of a time approach by using ?KSPGMRESSetPreAllocateVectors() or -ksp_gmres_preallocate This will prevent confusion about why more vectors are allocated later and why they are not all allocated when the solve starts. 4) PETSc?s GMRES tries to use BLAS 2 operations for MDot() and MAXPY (the orthogonalization in GMRES). It can only use the BLAS 2 on vector chunks that are allocated together. By preallocating all the vectors at the beginning one gets a single chunk and hence more efficient orthogonalization; this is more important on GPUs. For CPUs whether you have 10 or 30 vectors together doesn?t matter much at all. I hope this clarifies why you are seeing the memory allocations. Note that these are NOT ?reallocations? in the sense of KSPGMRES allocating more memory and then copying something into the new memory and freeing the old. They are just allocations of new memory which will then be used. Barry On Feb 10, 2026, at 9:04?PM, Wells, David via petsc-users wrote: Hi Matt, Thanks for the quick response! > I cannot understand precisely what is happening here. You specify a restart > size when you setup the KSP. It allocates that many vecs (roughly). Why are > there reallocations? Do you increase the restart size during the iteration? 
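A minimal sketch of the restart and preallocation controls described in points 2) and 3) above, assuming a KSP named ksp that has already been created:

  PetscCall(KSPSetType(ksp, KSPFGMRES));
  PetscCall(KSPGMRESSetRestart(ksp, 30));        /* same as -ksp_gmres_restart 30 */
  PetscCall(KSPGMRESSetPreAllocateVectors(ksp)); /* same as -ksp_gmres_preallocate */

The same two calls apply to plain GMRES as well.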
I don't believe there are any reallocations (I didn't write this solver, but I don't see any calls which set the restart size or any other relevant parameter [1]): as far as I can tell, the solver just allocates a lot of vectors. I'm working off of traces computed by heaptrack, which is my only insight into how this works. The allocations come from KSPCreateVecs(), which is called by 1. KSPFGMRESGetNewVectors() (for about 1.7 GB [2] of memory) 2. KSPSetUp_GMRES() (for about 300 MB of memory) 3. KSPSetUp_FGMRES() (for about 264 MB of memory) 4. KSPSetWorkVecs() (for about 236 MB of memory) Is there some relevant set of monitoring flags I can set which will show me how many vectors I allocate or use? That would also help. Best, David [1] This is IBAMR's PETScKrylovLinearSolver. [2] This is half the total memory we use for side-centered data vectors. ________________________________ From: Matthew Knepley Sent: Tuesday, February 10, 2026 6:28 PM To: Wells, David Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Limiting the number of vectors allocated at a time by fgmres etc. On Tue, Feb 10, 2026 at 5:32?PM Wells, David via petsc-users > wrote: Hello, I've been profiling the memory usage of my solver and it looks like a huge number (roughly half) of allocations are from KSPFGMRESGetNewVectors(). I read through the source code and it looks like these vectors are allocated ten at a time (FGMRES_DELTA_DIRECTIONS) in a couple of places inside that KSP. Is there a way to change this value? We could add an option to change this delta. Actually theory suggests that a constant is not optimal, but rather we should double the number each time. I would also be willing to code that. If not - how hard would it be to add an API to set a different initial value for that? These vectors take up a lot of memory and I would rather just one at a time. I cannot understand precisely what is happening here. You specify a restart size when you setup the KSP. It allocates that many vecs (roughly). Why are there reallocations? Do you increase the restart size during the iteration? Thanks, Matt Best, David Wells -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!fGPdLXU1J2z90EOLzcOjKmEhpgVOTMjLIZFkcTqlZ1InQNY5B79RxHFXzmqiSb7O09i1rWuXObNqZRj-hZmQV7v23JY$ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Feb 19 11:11:21 2026 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 19 Feb 2026 12:11:21 -0500 Subject: [petsc-users] DG methods in PETSc In-Reply-To: References: Message-ID: please keep this on the list. On Thu, Feb 19, 2026 at 9:36?AM Matteo Leone wrote: > Hello, I now have tested quite a bit the code, and I have some doubts. > Note I am quite new to PETSc. > > I have modified a bit your code to test it and I tried to print from the > functions to check if they were actually called with negative results. > The Riemann solver is never called and also the solution is just static in > time. > This was a work in progress, but it does run: ./ex9 -dm_plex_box_faces 8,8 -dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor -velocity 1.0,0.5 I see that the Riemann solver is set: PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection)); I would not recommend using DG because there are no tests for it. 
There are tests for PetscDSSetRiemannSolver that you could clone and replace the FE/FV construction in those with DG: src/ts/tutorials/ex11.c: PetscCall(PetscDSSetRiemannSolver(prob, 0, user->model->physics->riemann)); src/ts/tutorials/ex18.c: if (user->velocityDist == VEL_ZERO) PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection)); src/ts/tutorials/ex18.c: else PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_coupled_advection)); If you get this far, you might want to look at internal examples (not tests) and clone these DG constructors: src/dm/dt/fe/interface/fe.c: PetscFECreateBrokenElement - Create a discontinuous version of the input `PetscFE` src/dm/dt/fe/interface/fe.c:PetscErrorCode PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe) src/dm/impls/plex/plexcreate.c: PetscCall(PetscFECreateBrokenElement(fe, &dgfe)); src/dm/interface/dmcoordinates.c: PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe)); Sorry we do not have DG fully deployed yet. Good luck, Mark > I share the code. It's a bit modified and no more c but c++ (I do not know > to write the csv file fom c, in case is a dealbreaker I'll try to give a > look on how to do it). > (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the > shared docs if you are used to it lmk). > > There is also a small .py to handle the visualization (It was made by > Claude code, but it just plots the results) > Hopefully we can go deep into this PETSc DG stuff. > > > Thanks in advace. > Matteo > > > ------------------------------ > *Da:* Mark Adams > *Inviato:* mercoled? 11 febbraio 2026 17:25 > *A:* Matteo Leone ; PETSc users list < > petsc-users at mcs.anl.gov> > *Oggetto:* Re: [petsc-users] DG methods in PETSc > > Great, and keep it on the list. Lots of people here to help! > > On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone > wrote: > > Wow thank you so much! I was almost hopeless, I'll deep dive into it and > I'll give you a feedback. > > Matteo > > ------------------------------ > *From:* Mark Adams > *Sent:* Wednesday, February 11, 2026 3:38:30 PM > *To:* Matteo Leone > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] DG methods in PETSc > > DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see > a test. > Here is a test but it is not well verified. > Mark > > On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, I already posted on Reddit but just to be sure I write even here. > > First thanks for the job you do for PETSc, I have used it for several > projects and is always nice. > > I am writing cause I am getting mad trying to implement DG solver in > PETSc, the target is the Euler equations, however I am failing even with > just the simplest transport equation (u/t + u/x = 0). I was wondering if I > am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, > and more, but I keep failing, I tried also with LLMs, but seems like there > is no DG code with PETSc on the web, however I see many papers that do it. > > I was wondering if I am maybe missing something out or what. > > For reference I use PETSc 3.24.3 by means of nix. > > Thanks in advance, cheers. > > Matteo > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matteo4.leone at mail.polimi.it Thu Feb 19 11:13:54 2026 From: matteo4.leone at mail.polimi.it (Matteo Leone) Date: Thu, 19 Feb 2026 17:13:54 +0000 Subject: [petsc-users] Fw: DG methods in PETSc In-Reply-To: References: Message-ID: Inviato da Outlook per Android ________________________________ From: Matteo Leone Sent: Thursday, February 19, 2026 3:36:44 PM To: Mark Adams Subject: R: [petsc-users] DG methods in PETSc Hello, I now have tested quite a bit the code, and I have some doubts. Note I am quite new to PETSc. I have modified a bit your code to test it and I tried to print from the functions to check if they were actually called with negative results. The Riemann solver is never called and also the solution is just static in time. I share the code. It's a bit modified and no more c but c++ (I do not know to write the csv file fom c, in case is a dealbreaker I'll try to give a look on how to do it). (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the shared docs if you are used to it lmk). There is also a small .py to handle the visualization (It was made by Claude code, but it just plots the results) Hopefully we can go deep into this PETSc DG stuff. Thanks in advace. Matteo ________________________________ Da: Mark Adams Inviato: mercoled? 11 febbraio 2026 17:25 A: Matteo Leone ; PETSc users list Oggetto: Re: [petsc-users] DG methods in PETSc Great, and keep it on the list. Lots of people here to help! On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone > wrote: Wow thank you so much! I was almost hopeless, I'll deep dive into it and I'll give you a feedback. Matteo ________________________________ From: Mark Adams > Sent: Wednesday, February 11, 2026 3:38:30 PM To: Matteo Leone > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] DG methods in PETSc DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see a test. Here is a test but it is not well verified. Mark On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users > wrote: Hello, I already posted on Reddit but just to be sure I write even here. First thanks for the job you do for PETSc, I have used it for several projects and is always nice. I am writing cause I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it. I was wondering if I am maybe missing something out or what. For reference I use PETSc 3.24.3 by means of nix. Thanks in advance, cheers. Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dg_transp_eq.cpp Type: text/x-c++src Size: 16452 bytes Desc: dg_transp_eq.cpp URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: plot.py Type: text/x-python Size: 7225 bytes Desc: plot.py URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: flake.nix Type: application/x-mix-transfer Size: 2748 bytes Desc: flake.nix URL: From mfadams at lbl.gov Thu Feb 19 11:15:58 2026 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 19 Feb 2026 12:15:58 -0500 Subject: [petsc-users] Fw: DG methods in PETSc In-Reply-To: References: Message-ID: Just to avoid confusion with a convoluted thread, here is my response reposed: This was a work in progress, but it does run: ./ex9 -dm_plex_box_faces 8,8 -dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor -velocity 1.0,0.5 I see that the Riemann solver is set: PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection)); I would not recommend using DG because there are no tests for it. There are tests for PetscDSSetRiemannSolver that you could clone and replace the FE/FV construction in those with DG: src/ts/tutorials/ex11.c: PetscCall(PetscDSSetRiemannSolver(prob, 0, user->model->physics->riemann)); src/ts/tutorials/ex18.c: if (user->velocityDist == VEL_ZERO) PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection)); src/ts/tutorials/ex18.c: else PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_coupled_advection)); If you get this far, you might want to look at internal examples (not tests) and clone these DG constructors: src/dm/dt/fe/interface/fe.c: PetscFECreateBrokenElement - Create a discontinuous version of the input `PetscFE` src/dm/dt/fe/interface/fe.c:PetscErrorCode PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe) src/dm/impls/plex/plexcreate.c: PetscCall(PetscFECreateBrokenElement(fe, &dgfe)); src/dm/interface/dmcoordinates.c: PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe)); Sorry we do not have DG fully deployed yet. Good luck, Mark On Thu, Feb 19, 2026 at 12:14?PM Matteo Leone via petsc-users < petsc-users at mcs.anl.gov> wrote: > > > Inviato da Outlook per Android > > ------------------------------ > *From:* Matteo Leone > *Sent:* Thursday, February 19, 2026 3:36:44 PM > *To:* Mark Adams > *Subject:* R: [petsc-users] DG methods in PETSc > > Hello, I now have tested quite a bit the code, and I have some doubts. > Note I am quite new to PETSc. > > I have modified a bit your code to test it and I tried to print from the > functions to check if they were actually called with negative results. > The Riemann solver is never called and also the solution is just static in > time. > I share the code. It's a bit modified and no more c but c++ (I do not know > to write the csv file fom c, in case is a dealbreaker I'll try to give a > look on how to do it). > (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the > shared docs if you are used to it lmk). > > There is also a small .py to handle the visualization (It was made by > Claude code, but it just plots the results) > Hopefully we can go deep into this PETSc DG stuff. > > > Thanks in advace. > Matteo > > > ------------------------------ > *Da:* Mark Adams > *Inviato:* mercoled? 11 febbraio 2026 17:25 > *A:* Matteo Leone ; PETSc users list < > petsc-users at mcs.anl.gov> > *Oggetto:* Re: [petsc-users] DG methods in PETSc > > Great, and keep it on the list. Lots of people here to help! > > On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone > wrote: > > Wow thank you so much! I was almost hopeless, I'll deep dive into it and > I'll give you a feedback. 
> > Matteo > > ------------------------------ > *From:* Mark Adams > *Sent:* Wednesday, February 11, 2026 3:38:30 PM > *To:* Matteo Leone > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] DG methods in PETSc > > DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see > a test. > Here is a test but it is not well verified. > Mark > > On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, I already posted on Reddit but just to be sure I write even here. > > First thanks for the job you do for PETSc, I have used it for several > projects and is always nice. > > I am writing cause I am getting mad trying to implement DG solver in > PETSc, the target is the Euler equations, however I am failing even with > just the simplest transport equation (u/t + u/x = 0). I was wondering if I > am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, > and more, but I keep failing, I tried also with LLMs, but seems like there > is no DG code with PETSc on the web, however I see many papers that do it. > > I was wondering if I am maybe missing something out or what. > > For reference I use PETSc 3.24.3 by means of nix. > > Thanks in advance, cheers. > > Matteo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.j.macneice at nasa.gov Thu Feb 19 20:08:51 2026 From: peter.j.macneice at nasa.gov (Macneice, Peter J. (GSFC-6740)) Date: Fri, 20 Feb 2026 02:08:51 +0000 Subject: [petsc-users] Is use of Mirror Boundary with Box Stencil supported for 2D? Message-ID: >From my searching of the documentation, it looks to me as though this combination should work. However for my modified version of the ex66.c tutorial code, I get the error message below. Is this really not yet supported? Regards Peter MacNeice [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Mirror boundary and box stencil [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! [0]PETSC ERROR: Option left: name:-dm_view (no value) source: command line [0]PETSC ERROR: Option left: name:-ksp_monitor (no value) source: command line [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!boqIOhIgncdTr1z60slIrCjusZEY0_bk-qkGbrlrmiepTiBAvCwW8s9c1IQ6vyT4fOeshVO3Bsu1F5HuERiBGRsq1r2LJ1zBtrM$ for trouble shooting. [0]PETSC ERROR: PETSc Release Version 3.24.3, Jan 01, 2026 [0]PETSC ERROR: ex66_9pt with 1 MPI process(es) and PETSC_ARCH arch-darwin-c-debug on gs67-5186361 by pmacneic Thu Feb 19 21:03:16 2026 [0]PETSC ERROR: Configure options: --with-mpi-dir=/Users/pmacneic/mpich-install-3.3-gcc15 --force [0]PETSC ERROR: #1 DMSetUp_DA_2D() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/da2.c:212 [0]PETSC ERROR: #2 DMSetUp_DA() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/dareg.c:17 [0]PETSC ERROR: #3 DMSetUp() at /Users/pmacneic/petsc-3.24.3/src/dm/interface/dm.c:807 [0]PETSC ERROR: #4 main() at ex66_9pt.c:77 [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -dm_view (source: command line) [0]PETSC ERROR: -ksp_monitor (source: command line) [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_SELF, 56) - process 0 Macneice, Peter J. 
(GSFC-6740) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Feb 19 21:37:32 2026 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 19 Feb 2026 22:37:32 -0500 Subject: Re: [petsc-users] Is use of Mirror Boundary with Box Stencil supported for 2D? In-Reply-To: References: Message-ID: I believe the error checking is in place because of the question of how to manage the extreme ghost corners of a rectangular (or box) region. Consider the grid with values as indicated

   1 2 3 4 5
   6 7 8 9 10

that we mirror in both directions with

   x 6 7 8 9 10 y
   2 1 2 3 4  5 4
   7 6 7 8 9 10 9
   z 1 2 3 4  5 w

I think it is likely I did not want to think about this case when I wrote the code, hence the error checking. Quickly looking now, it seems the mirroring is well defined, so it is possible the error checking is not needed so long as the code properly handles those points. Sadly the code that sets up all the communication patterns is complicated and my short-term memory was too small to think through the box case. The code is in src/dm/impls/da/da2.c. I hope you have more stamina than I do and can take a look at it and see if it needs changes etc. Note that if you install PETSc without a --prefix configure option, you can change the code and just run make libs in the PETSc directory with PETSC_DIR (and PETSC_ARCH) set and it will update the library, so you can first turn off the error check and see what happens. If you can get it to work please let us know, it would be nice to support this case. Good luck Barry > On Feb 19, 2026, at 9:08 PM, Macneice, Peter J. (GSFC-6740) via petsc-users wrote: > > > From my searching of the documentation, it looks to me as though this combination should work. > However for my modified version of the ex66.c tutorial code, I get the error message below. > > Is this really not yet supported? > > > Regards > > Peter MacNeice > > > > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Mirror boundary and box stencil > [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc! > [0]PETSC ERROR: Option left: name:-dm_view (no value) source: command line > [0]PETSC ERROR: Option left: name:-ksp_monitor (no value) source: command line > [0]PETSC ERROR: See https://urldefense.us/v3/__https://petsc.org/release/faq/__;!!G_uCfscf7eWS!c4lEIB-12hqgDjSzBQ8Q3sro45es-1d6wdQ2tLBRtAUg6mudiRWjcIFMcTu1RZYjSTK1WPXqCKVkPyoOIdVy568$ for trouble shooting.
> [0]PETSC ERROR: PETSc Release Version 3.24.3, Jan 01, 2026 > [0]PETSC ERROR: ex66_9pt with 1 MPI process(es) and PETSC_ARCH arch-darwin-c-debug on gs67-5186361 by pmacneic Thu Feb 19 21:03:16 2026 > [0]PETSC ERROR: Configure options: --with-mpi-dir=/Users/pmacneic/mpich-install-3.3-gcc15 --force > [0]PETSC ERROR: #1 DMSetUp_DA_2D() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/da2.c:212 > [0]PETSC ERROR: #2 DMSetUp_DA() at /Users/pmacneic/petsc-3.24.3/src/dm/impls/da/dareg.c:17 > [0]PETSC ERROR: #3 DMSetUp() at /Users/pmacneic/petsc-3.24.3/src/dm/interface/dm.c:807 > [0]PETSC ERROR: #4 main() at ex66_9pt.c:77 > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view (source: command line) > [0]PETSC ERROR: -ksp_monitor (source: command line) > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_SELF, 56) - process 0 > > > > > > > > Macneice, Peter J. (GSFC-6740) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo4.leone at mail.polimi.it Fri Feb 20 07:00:11 2026 From: matteo4.leone at mail.polimi.it (Matteo Leone) Date: Fri, 20 Feb 2026 13:00:11 +0000 Subject: [petsc-users] R: Fw: DG methods in PETSc In-Reply-To: References: Message-ID: Yes it works, but does not provide a correct solution and is stationary in time, I was just pointing this out. I'll try to get as much as I can from the PetscFE object, PETSc provides already a lot of stuff, it cannot do everything. Thanks for the support and time dedicated, Matteo ________________________________ Da: Mark Adams Inviato: gioved? 19 febbraio 2026 18:15 A: Matteo Leone Cc: PETSc users list Oggetto: Re: [petsc-users] Fw: DG methods in PETSc Just to avoid confusion with a convoluted thread, here is my response reposed: This was a work in progress, but it does run: ./ex9 -dm_plex_box_faces 8,8 -dm_plex_dim 2 -dm_plex_simplex 0 -order 1 -ts_max_steps 10 -ts_monitor -velocity 1.0,0.5 I see that the Riemann solver is set: PetscCall(PetscDSSetRiemannSolver(ds, 0, RiemannSolver_Advection)); I would not recommend using DG because there are no tests for it. There are tests for PetscDSSetRiemannSolver that you could clone and replace the FE/FV construction in those with DG: src/ts/tutorials/ex11.c: PetscCall(PetscDSSetRiemannSolver(prob, 0, user->model->physics->riemann)); src/ts/tutorials/ex18.c: if (user->velocityDist == VEL_ZERO) PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_advection)); src/ts/tutorials/ex18.c: else PetscCall(PetscDSSetRiemannSolver(prob, 1, riemann_coupled_advection)); If you get this far, you might want to look at internal examples (not tests) and clone these DG constructors: src/dm/dt/fe/interface/fe.c: PetscFECreateBrokenElement - Create a discontinuous version of the input `PetscFE` src/dm/dt/fe/interface/fe.c:PetscErrorCode PetscFECreateBrokenElement(PetscFE cgfe, PetscFE *dgfe) src/dm/impls/plex/plexcreate.c: PetscCall(PetscFECreateBrokenElement(fe, &dgfe)); src/dm/interface/dmcoordinates.c: PetscCall(PetscFECreateBrokenElement(feLinear, &dgfe)); Sorry we do not have DG fully deployed yet. Good luck, Mark On Thu, Feb 19, 2026 at 12:14?PM Matteo Leone via petsc-users > wrote: Inviato da Outlook per Android ________________________________ From: Matteo Leone > Sent: Thursday, February 19, 2026 3:36:44 PM To: Mark Adams > Subject: R: [petsc-users] DG methods in PETSc Hello, I now have tested quite a bit the code, and I have some doubts. 
Note I am quite new to PETSc. I have modified a bit your code to test it and I tried to print from the functions to check if they were actually called with negative results. The Riemann solver is never called and also the solution is just static in time. I share the code. It's a bit modified and no more c but c++ (I do not know to write the csv file fom c, in case is a dealbreaker I'll try to give a look on how to do it). (For reference I use PETSc 3.24.4 by means of nix, the nix flake is in the shared docs if you are used to it lmk). There is also a small .py to handle the visualization (It was made by Claude code, but it just plots the results) Hopefully we can go deep into this PETSc DG stuff. Thanks in advace. Matteo ________________________________ Da: Mark Adams > Inviato: mercoled? 11 febbraio 2026 17:25 A: Matteo Leone >; PETSc users list > Oggetto: Re: [petsc-users] DG methods in PETSc Great, and keep it on the list. Lots of people here to help! On Wed, Feb 11, 2026 at 9:46?AM Matteo Leone > wrote: Wow thank you so much! I was almost hopeless, I'll deep dive into it and I'll give you a feedback. Matteo ________________________________ From: Mark Adams > Sent: Wednesday, February 11, 2026 3:38:30 PM To: Matteo Leone > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] DG methods in PETSc DG (discontinuous Galerkin) is done with a "Broken" FE. Yikes I do not see a test. Here is a test but it is not well verified. Mark On Tue, Feb 10, 2026 at 10:25?AM Matteo Leone via petsc-users > wrote: Hello, I already posted on Reddit but just to be sure I write even here. First thanks for the job you do for PETSc, I have used it for several projects and is always nice. I am writing cause I am getting mad trying to implement DG solver in PETSc, the target is the Euler equations, however I am failing even with just the simplest transport equation (u/t + u/x = 0). I was wondering if I am missing somenthing. I tried with the DSSetReimannSolver and DualSpaces, and more, but I keep failing, I tried also with LLMs, but seems like there is no DG code with PETSc on the web, however I see many papers that do it. I was wondering if I am maybe missing something out or what. For reference I use PETSc 3.24.3 by means of nix. Thanks in advance, cheers. Matteo -------------- next part -------------- An HTML attachment was scrubbed... URL:
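A minimal, untested sketch of the broken-element constructors Mark points to above (dim, Nc, order and dm are placeholders; as noted in the thread, DG support is not fully deployed, so treat this only as a starting point):

  PetscFE cgfe, dgfe;
  PetscCall(PetscFECreateLagrange(PETSC_COMM_SELF, dim, Nc, PETSC_FALSE, order, PETSC_DETERMINE, &cgfe));
  PetscCall(PetscFECreateBrokenElement(cgfe, &dgfe));    /* discontinuous version of the Lagrange element */
  PetscCall(DMSetField(dm, 0, NULL, (PetscObject)dgfe)); /* attach the DG field to the Plex */
  PetscCall(DMCreateDS(dm));
  PetscCall(PetscFEDestroy(&cgfe));
  PetscCall(PetscFEDestroy(&dgfe));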