[petsc-dev] Additive Schwarz Method + ILU on GPU platforms

Matthew Knepley knepley at gmail.com
Tue Nov 4 13:26:51 CST 2025


On Tue, Nov 4, 2025 at 1:25 PM Angus, Justin Ray via petsc-dev <
petsc-dev at mcs.anl.gov> wrote:

> Hi Junchao,
>
> We have recently been using ASM + LU for 2D problems on both CPU and GPU.
> However, I found that this method has very bad weak scaling. I find that
> the cost of PCApply increases by about a factor of 4 each time I increase
> the problem size in 1 dimension by a factor of 2 while keeping the load per
> core/gpu the same. The total number of GMRES iterations does not increase,
> just the cost of PCApply (and PCSetup). Is this scaling behavior expected?
> Any ideas of how to optimize the preconditioner?
>

The cost of PCApply for ASM is dominated by the cost of the process-local
block solves, and you are using LU for those. Applying a (sparse) LU
factorization costs roughly O(N^2), where N is the size of the local block
(the exact exponent depends on the structure of the matrix). So, if you
double the size of a local block, your runtime should increase by about 4x.
Thus LU is not a scalable method.
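
As a rough check of the argument above, here is a minimal sketch of the
cost model t ~ c * N^p. It estimates the exponent p from two timings and
predicts the growth factor when the block size changes. The function names
and the sample timings are hypothetical, chosen only to illustrate the
arithmetic; they are not measurements from this thread.

```python
import math


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Estimate p in the cost model t ~ c * n**p from two (size, time) samples."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


def predicted_growth(size_ratio, exponent):
    """Factor by which the cost grows when the block size grows by size_ratio."""
    return size_ratio ** exponent


# Hypothetical timings: doubling the block size makes the apply 4x slower.
p = scaling_exponent(n_small=10_000, t_small=1.0, n_large=20_000, t_large=4.0)
print(f"estimated exponent: {p:.2f}")
print(f"predicted growth for a 2x larger block: {predicted_growth(2.0, p):.1f}x")
```

Feeding in the PCApply times reported by -log_view for two block sizes
would show whether the measured exponent is really close to 2.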

  Thanks,

     Matt


> Thank you.
>
> -Justin
>
> *From: *Junchao Zhang <junchao.zhang at gmail.com>
> *Date: *Monday, April 14, 2025 at 7:35 PM
> *To: *Angus, Justin Ray <angus1 at llnl.gov>
> *Cc: *petsc-dev at mcs.anl.gov <petsc-dev at mcs.anl.gov>, Ghosh, Debojyoti <
> ghosh5 at llnl.gov>
> *Subject: *Re: [petsc-dev] Additive Schwarz Method + ILU on GPU platforms
>
> PETSc supports ILU0/ICC0 numeric factorization (without reordering) and
> then triangular solve on GPUs. It is done by calling vendor libraries (e.g.,
> cuSPARSE).
> We have options -pc_factor_mat_factor_on_host <bool>
> -pc_factor_mat_solve_on_host <bool> to force doing the factorization and
> MatSolve on the host for device matrix types.
>
> You can try to see if it works for your case.
>
> --Junchao Zhang
>
>
> On Mon, Apr 14, 2025 at 4:39 PM Angus, Justin Ray via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
> Hello,
>
>
>
> A project I work on uses GMRES via PETSc. In particular, we have had good
> success using the Additive Schwarz Method + ILU preconditioner setup
> in a CPU-based code. I found a statement online that “Parts of
> most preconditioners run directly on the GPU”
> (https://petsc.org/release/faq/).
> Is ASM + ILU also available for GPU platforms?
>
>
>
> -Justin
>
>
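
For the options Junchao lists in the quoted message, here is a minimal
sketch of how an ASM + ILU(0) run on a GPU might be assembled as a PETSc
option list. Several things here are assumptions rather than anything
stated in the thread: the -sub_ prefix (the usual prefix for ASM block
solvers), the aijcusparse/cuda types (NVIDIA only), and the helper name
itself. Whether -mat_type and -vec_type take effect also depends on how the
application creates its matrices and vectors.

```python
def asm_ilu_gpu_options(factor_on_host=False, solve_on_host=False):
    """Return PETSc command-line options as a flat list of strings.

    Hypothetical helper: it only builds the argument list, it does not
    call PETSc.
    """
    def flag(value):
        return "true" if value else "false"

    return [
        "-ksp_type", "gmres",
        "-pc_type", "asm",
        "-sub_pc_type", "ilu",          # block solver on each subdomain
        "-sub_pc_factor_levels", "0",   # ILU(0), the GPU-supported case
        "-mat_type", "aijcusparse",     # assumed NVIDIA device matrix type
        "-vec_type", "cuda",
        # Options from the quoted message, with the assumed -sub_ prefix.
        "-sub_pc_factor_mat_factor_on_host", flag(factor_on_host),
        "-sub_pc_factor_mat_solve_on_host", flag(solve_on_host),
    ]


args = asm_ilu_gpu_options(factor_on_host=True)
print(" ".join(args))
```

The printed string can be appended to the application's command line;
running with -ksp_view is one way to confirm which options were picked up.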

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>