[petsc-dev] Additive Schwarz Method + ILU on GPU platforms
Mark Adams
mfadams at lbl.gov
Wed Nov 5 08:13:35 CST 2025
We do not have sparse LU on GPUs, so that part is done on the CPU.
I don't know why it would not weak scale well, though.
Your results are consistent with using just one process with one domain
(re Matt's point) while you double the problem size.
On Tue, Nov 4, 2025 at 2:27 PM Matthew Knepley <knepley at gmail.com> wrote:
> On Tue, Nov 4, 2025 at 1:25 PM Angus, Justin Ray via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
>> Hi Junchao,
>>
>> We have recently been using ASM + LU for 2D problems on both CPU and GPU.
>> However, I found that this method has very bad weak scaling. I find that
>> the cost of PCApply increases by about a factor of 4 each time I increase
>> the problem size in 1 dimension by a factor of 2 while keeping the load per
>> core/GPU the same. The total number of GMRES iterations does not increase,
>> just the cost of PCApply (and PCSetUp). Is this scaling behavior expected?
>> Any ideas of how to optimize the preconditioner?
>>
>
> The cost of PCApply for ASM is dominated by the cost of process-local
> block solves. You are using LU for the block solve. (Sparse) LU has cost
> roughly O(N^2) for the apply (depending on the structure of the matrix).
> So, if you double the size of a local block, your runtime should increase
> by about 4x. Thus LU is not a scalable method.
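
The scaling argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the block-solve cost follows a simple power law, cost ~ N^p, in the local block size N. The function names are hypothetical helpers for illustration and are not part of PETSc.

```python
import math


def apply_cost_ratio(size_ratio, exponent):
    """Predicted growth in block-solve cost when the local block grows by size_ratio.

    Assumes a power-law model cost ~ N**exponent (an assumption, not a measurement).
    """
    return size_ratio ** exponent


def fitted_exponent(size_ratio, cost_ratio):
    """Exponent p implied by an observed cost_ratio for a given size_ratio."""
    return math.log(cost_ratio) / math.log(size_ratio)


# Doubling a local block under the O(N^2) model from the thread.
print(apply_cost_ratio(2, 2))

# Exponent implied by the reported "factor of 4 per doubling" in PCApply.
print(fitted_exponent(2, 4))
```

Under this model, a 4x jump in PCApply per doubling is what one would see if the local block itself doubled in size. If the blocks stayed the same size as the problem grew, the model would predict no growth in the per-block cost.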
>
> Thanks,
>
> Matt
>
>
>> Thank you.
>>
>> -Justin
>>
>> *From: *Junchao Zhang <junchao.zhang at gmail.com>
>> *Date: *Monday, April 14, 2025 at 7:35 PM
>> *To: *Angus, Justin Ray <angus1 at llnl.gov>
>> *Cc: *petsc-dev at mcs.anl.gov <petsc-dev at mcs.anl.gov>, Ghosh, Debojyoti <
>> ghosh5 at llnl.gov>
>> *Subject: *Re: [petsc-dev] Additive Schwarz Method + ILU on GPU platforms
>>
>> PETSc supports ILU0/ICC0 numeric factorization (without reordering) and
>> then triangular solve on GPUs. It is done by calling vendor libraries (e.g.,
>> cuSPARSE).
>> The options -pc_factor_mat_factor_on_host <bool> and
>> -pc_factor_mat_solve_on_host <bool> force the factorization and
>> MatSolve to run on the host for device matrix types.
>>
>> You can try to see if it works for your case.
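
As a rough illustration of how the options mentioned above might be combined for GMRES + ASM with ILU sub-solves, here is a minimal sketch that only assembles a command-line option list. Several things are assumptions rather than statements from this thread: the `sub_` prefix on the factor options (the usual prefix for ASM block solvers), the `aijcusparse` matrix type and `cuda` vector type (NVIDIA builds), and the helper name `asm_ilu_gpu_options`, which is hypothetical.

```python
def asm_ilu_gpu_options(factor_on_host=False, solve_on_host=False):
    """Build a PETSc option list for GMRES + ASM with ILU block solves.

    Assumptions: the ASM sub-solver options take the "sub_" prefix, and the
    application honors -mat_type / -vec_type when creating its objects.
    """
    def flag(value):
        # PETSc boolean options accept "true" / "false".
        return "true" if value else "false"

    return [
        "-ksp_type", "gmres",
        "-pc_type", "asm",
        "-sub_pc_type", "ilu",
        "-mat_type", "aijcusparse",
        "-vec_type", "cuda",
        "-sub_pc_factor_mat_factor_on_host", flag(factor_on_host),
        "-sub_pc_factor_mat_solve_on_host", flag(solve_on_host),
    ]


# Example: factor on the host, keep the triangular solves on the device.
print(" ".join(asm_ilu_gpu_options(factor_on_host=True)))
```

The resulting string would be appended to the application's command line. Adding -ksp_view is one way to confirm which options the sub-solvers actually picked up.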
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Apr 14, 2025 at 4:39 PM Angus, Justin Ray via petsc-dev <
>> petsc-dev at mcs.anl.gov> wrote:
>>
>> Hello,
>>
>>
>>
>> A project I work on uses GMRES via PETSc. In particular, we have had good
>> success with the Additive Schwarz Method + ILU preconditioner in a
>> CPU-based code. The PETSc FAQ states that “Parts of
>> most preconditioners run directly on the GPU” (
>> https://petsc.org/release/faq/).
>> Is ASM + ILU also available for GPU platforms?
>>
>>
>>
>> -Justin
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>