[petsc-users] Reasons for breakdown in preconditioned LSQR

Pierre Jolivet pierre at joliv.et
Tue May 7 04:12:18 CDT 2024



> On 7 May 2024, at 9:10 AM, Marco Seiz <marco at kit.ac.jp> wrote:
> 
> Thanks for the quick response!
> 
> On 07.05.24 14:24, Pierre Jolivet wrote:
>> 
>> 
>>> On 7 May 2024, at 7:04 AM, Marco Seiz <marco at kit.ac.jp> wrote:
>>> 
>>> Hello,
>>> 
>>> something a bit different from my last question, since that didn't
>>> progress so well:
>>> I have a related model which generally produces a rectangular matrix A,
>>> so I am using LSQR to solve the system.
>>> The matrix A has two nonzeros (1, -1) per row, and A^T A resembles a
>>> finite-difference Poisson matrix whose rows have been randomly permuted.
>>> The problem is singular: the matrix determines the solution only up to
>>> an additive constant. My target solution is the one with weighted zero
>>> average, which I can handle by attaching a nullspace to my matrix.
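>>>
>>> (For reference, the nullspace variant boils down to roughly the
>>> following; a minimal petsc4py sketch with a made-up 4x3 matrix,
>>> uniform weights, and no preconditioner, so not my actual setup:)
>>>
>>> import sys
>>> import petsc4py
>>> petsc4py.init(sys.argv)
>>> from petsc4py import PETSc
>>>
>>> m, n = 4, 3  # rectangular system, two nonzeros (1, -1) per row
>>> A = PETSc.Mat().createAIJ([m, n], nnz=2)
>>> for row, (i, j) in enumerate([(0, 1), (1, 2), (0, 2), (1, 0)]):
>>>     A.setValue(row, i, 1.0)
>>>     A.setValue(row, j, -1.0)
>>> A.assemble()
>>>
>>> # The constant vector spans the nullspace of A; for the weighted zero
>>> # average one would pass vectors=[w] with a normalized weight vector.
>>> ns = PETSc.NullSpace().create(constant=True, comm=A.getComm())
>>> A.setNullSpace(ns)
>>>
>>> ksp = PETSc.KSP().create()
>>> ksp.setType(PETSc.KSP.Type.LSQR)
>>> ksp.getPC().setType(PETSc.PC.Type.NONE)  # unpreconditioned, for simplicity
>>> ksp.setOperators(A)
>>> ksp.setFromOptions()
>>> x, b = A.createVecs()  # x has n entries, b has m
>>> b.set(1.0)
>>> ksp.solve(b, x)
>>>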
>>> However, I'd also like to pin (potentially many) DOFs in the future, so
>>> I also tried pinning a single value and afterwards subtracting the
>>> average from the KSP solution.
>>> This leads to the KSP *sometimes* diverging when I use a preconditioner;
>>> the target size of the matrix will be something like ([1,20] N) x N,
>>> with N ~ [2, 1e6], so at the higher end I will need a preconditioner
>>> for reasonable execution time.
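>>>
>>> (Continuing the sketch above, pinning just appends one (dof_i = val)
>>> row to the system; illustrative only, my application assembles this
>>> differently:)
>>>
>>> Ap = PETSc.Mat().createAIJ([m + 1, n], nnz=2)
>>> for row, (i, j) in enumerate([(0, 1), (1, 2), (0, 2), (1, 0)]):
>>>     Ap.setValue(row, i, 1.0)
>>>     Ap.setValue(row, j, -1.0)
>>> Ap.setValue(m, 0, 1.0)  # extra row pinning DOF 0; pinned value goes in b[m]
>>> Ap.assemble()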
>>> 
>>> For a smaller example system, I set up my application to dump the KSP
>>> input when it breaks down, and I've attached a simple Python script +
>>> data using petsc4py to demonstrate the divergence for those specific
>>> systems.
>>> With `python3 lsdiv.py -pc_type lu -ksp_converged_reason` that
>>> particular system shows breakdown, but if I remove the pinned DOF and
>>> add the nullspace (pass -usens) it converges. I did try different PCs,
>>> but they tend to break down at different steps; e.g. `python3 lsdiv.py
>>> -usenormal -qrdiv -pc_type qr -ksp_converged_reason` shows the breakdown
>>> for PCQR when I use MatCreateNormal to create the PC mat, but
>>> interestingly it doesn't break down when I explicitly form A^T A (don't
>>> pass -usenormal).
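>>>
>>> (In petsc4py terms the two variants look roughly like this; a sketch,
>>> with use_normal standing in for the -usenormal flag and A, ksp as in
>>> the snippets above:)
>>>
>>> use_normal = True  # mirrors -usenormal
>>> if use_normal:
>>>     B = PETSc.Mat().createNormal(A)  # implicit A^T A (MATNORMAL)
>>> else:
>>>     B = A.transposeMatMult(A)  # explicit A^T A as an ordinary AIJ matrix
>>> ksp.setOperators(A, B)  # A drives LSQR, B is used to build the PC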
>> 
>> What version are you using? All those commands are returning
>>  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>> So I cannot reproduce any breakdown, but there have been recent changes to KSPLSQR.
> For those tests I've been using PETSc 3.20.5 (last githash was
> 4b82c11ab5d).
> I pulled the latest version from GitLab (6b3135e3cbe) and compiled it,
> but I had to drop --download-suitesparse=1 from my earlier config due to
> errors.
> Should I write a separate mail about this?
> 
> The LU example still behaves the same for me (`python3 lsdiv.py -pc_type
> lu -ksp_converged_reason` gives DIVERGED_BREAKDOWN, `python3 lsdiv.py
> -usens -pc_type lu -ksp_converged_reason` gives CONVERGED_RTOL_NORMAL),
> but the QR example fails since I had to remove suitesparse.
> petsc4py.__version__ reports 3.21.1 and if I rebuild my application,
> then `ldd app` gives me `libpetsc.so.3.21 =>
> /opt/petsc/linux-c-opt/lib/libpetsc.so.3.21`, so it should be using the
> newly built one.
> The application then still eventually yields a DIVERGED_BREAKDOWN.
> I don't have a ~/.petscrc and PETSC_OPTIONS is unset, so if we are on
> the same version and there's still a discrepancy it is quite weird.

Quite weird indeed…
$ python3 lsdiv.py -pc_type lu -ksp_converged_reason
  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
$ python3 lsdiv.py -usens -pc_type lu -ksp_converged_reason
  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
$ python3 lsdiv.py -pc_type qr -ksp_converged_reason
  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
$ python3 lsdiv.py -usens -pc_type qr -ksp_converged_reason
  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1

>>> For the moment I can get by with adding the nullspace, but eventually
>>> the need for pinning DOFs will resurface, so I'd like to ask where the
>>> breakdown is coming from. What causes the breakdowns? Is this a generic
>>> problem that occurs when adding (dof_i = val) rows to least-squares
>>> systems and prevents these preconditioners from being robust? If so,
>>> which preconditioners could be robust?
>>> I did a minimal sweep of the available PCs by going over the possible
>>> inputs of -pc_type for my application while pinning one DOF. Excluding
>>> PCs that were unavailable (not compiled in, other setup missing, ...)
>>> and those that did break down, I am left with (hmg jacobi mat none
>>> pbjacobi sor svd).
>> It’s unlikely any of these preconditioners will scale (or even converge) for problems with up to 1E6 unknowns.
>> I could help you set up https://epubs.siam.org/doi/abs/10.1137/21M1434891 if you are willing to share a larger example (the current Mat are extremely tiny).
> Yes, that would be great. About how large a matrix do you need? I can
> probably quickly get something non-artificial up to O(N) ~ 1e3,

That’s big enough.
If you’re in luck, AMG on the normal equations won’t behave too badly, but I’ll try some more robust (in theory) methods nonetheless.
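
In the meantime, trying AMG yourself only needs the explicit normal
equations as the PC matrix, e.g. (a sketch, untested on your data;
A, b, x as in your script):

B = A.transposeMatMult(A)  # explicit A^T A so the AMG setup sees the entries
ksp = PETSc.KSP().create()
ksp.setType(PETSc.KSP.Type.LSQR)
ksp.setOperators(A, B)  # B is only used to build the preconditioner
ksp.setFromOptions()  # then run with -pc_type gamg -ksp_converged_reason
ksp.solve(b, x)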

Thanks,
Pierre

> bigger
> matrices will take some time since I purposefully ignored MPI previously.
> The matrix basically describes the contacts between particles, which are
> resolved on a uniform grid, so the main memory hog isn't the matrix but
> rather resolving the particles.
> I should mention that the matrix changes over the course of the
> simulation but stays constant for many solves, i.e. hundreds to
> thousands of solves with variable RHS between periods of contact
> formation/loss.
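>
> (That pattern maps onto reusing one KSP: set the operators once per
> contact period and amortize the PC setup over the many solves. A sketch,
> where num_solves and update_rhs are placeholders for my application
> code, and A, B, b, x are as before:)
>
> ksp.setOperators(A, B)  # once per period of constant contacts
> ksp.setUp()  # PC setup happens here, not in every solve
> for step in range(num_solves):
>     update_rhs(b, step)  # placeholder: fill the RHS for this step
>     ksp.solve(b, x)  # reuses the already-built preconditioner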
> 
>> 
>> Thanks,
>> Pierre
>>> 
>>> 
>>> Best regards,
>>> Marco
>>> 
>>> <lsdiv.zip>
>> 
>> 
> Best regards,
> Marco
