[petsc-users] Reasons for breakdown in preconditioned LSQR

Pierre Jolivet pierre at joliv.et
Tue May 7 09:29:23 CDT 2024


OK, this system is trivial to solve with algebraic solvers.
If you are willing to share larger test cases, maybe some issue will arise then. Please do not attach them to your mail; send a URL instead. If you cannot, and must attach them to your mail, switch to the following mailing list: petsc-maint at mcs.anl.gov

Thanks,
Pierre

$ mpirun -n 8 ./sparse_ls -mat_name /tmp/system/matdump_step00010000_dir0.bin -options_file gmres.rc -pc_type hypre -options_left 0
  0 KSP Residual norm 3.453633143461e+01
  1 KSP Residual norm 1.454005637026e+00
  2 KSP Residual norm 7.254564223178e-02
  3 KSP Residual norm 2.595967767025e-03
  4 KSP Residual norm 8.616844003887e-05
  5 KSP Residual norm 2.519788767995e-06
  6 KSP Residual norm 6.830984907384e-08
  Linear solve converged due to CONVERGED_RTOL iterations 6
||A^T(Ax-b)|| / ||Ax-b|| = 0.000000 / 17.017700 = 0.000000
$ mpirun -n 8 ./sparse_ls -mat_name /tmp/system/matdump_step00010000_dir0.bin -options_file gmres.rc -pc_type gamg -options_left 0
  0 KSP Residual norm 3.453633143461e+01
  1 KSP Residual norm 9.410331701775e+00
  2 KSP Residual norm 7.249670993944e-01
  3 KSP Residual norm 8.415995591379e-02
  4 KSP Residual norm 8.230048116783e-03
  5 KSP Residual norm 8.200299281980e-04
  6 KSP Residual norm 8.332302028270e-05
  7 KSP Residual norm 9.103576883087e-06
  8 KSP Residual norm 8.837914581158e-07
  9 KSP Residual norm 8.295880033739e-08
  Linear solve converged due to CONVERGED_RTOL iterations 9
||A^T(Ax-b)|| / ||Ax-b|| = 0.000000 / 17.017700 = 0.000000
$ mpirun -n 8 ./sparse_ls -mat_name /tmp/system/matdump_step00010000_dir0.bin -options_file gmres.rc -pc_type hpddm -options_left 0
  0 KSP Residual norm 3.453633143461e+01
  1 KSP Residual norm 4.455259710220e+00
  2 KSP Residual norm 1.190904699181e-01
  3 KSP Residual norm 5.220970280301e-03
  4 KSP Residual norm 6.214389182723e-05
  5 KSP Residual norm 1.693927074022e-06
  6 KSP Residual norm 3.700055659502e-08
  Linear solve converged due to CONVERGED_RTOL iterations 6
||A^T(Ax-b)|| / ||Ax-b|| = 0.000000 / 17.017700 = 0.000000
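In each run above, the last line reports ||A^T(Ax-b)|| / ||Ax-b||. For an overdetermined, inconsistent system, a least-squares minimizer leaves a nonzero residual Ax-b (about 17.02 here) but makes the normal-equations residual A^T(Ax-b) vanish, so a printed ratio of 0.000000 indicates that the least-squares problem was solved to the printed precision. A minimal NumPy sketch of the same check on a hypothetical dense system; it is not the code of sparse_ls, whose source is not shown, and numpy.linalg.lstsq stands in for the PETSc solve:

```python
import numpy as np

def ls_optimality(A, x, b):
    """Return (||A^T(Ax-b)||, ||Ax-b||, ratio), the quantities printed above."""
    r = A @ x - b                 # least-squares residual
    g = A.T @ r                   # normal-equations residual (half the gradient of ||Ax-b||^2)
    nr, ng = np.linalg.norm(r), np.linalg.norm(g)
    return ng, nr, (ng / nr if nr > 0 else 0.0)

# Hypothetical overdetermined system with 3x more rows than columns.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
x = np.linalg.lstsq(A, b, rcond=None)[0]
ng, nr, ratio = ls_optimality(A, x, b)
print(f"||A^T(Ax-b)|| / ||Ax-b|| = {ng:f} / {nr:f} = {ratio:f}")
```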


> On 7 May 2024, at 2:50 PM, Seiz,Marco <marco at kit.ac.jp> wrote:
> 
> Pierre,
> 
> I've attached the dumps of the matrix + RHS for a system of about 3k x 1k.
> 
> 
> Regarding the weird divergence behaviour, I tried again at home but I still get the same results.
> 
> I am running a rolling-release distribution on both machines, but I would not expect that to matter for the divergence behavior.
> 
> Is there some kind of option in PETSc to get more information about the breakdown from my side?
> 
> 
> Best regards,
> 
> Marco
> 
> ----- Original Message -----
> >> From: Pierre Jolivet <pierre at joliv.et>
> >> To: Marco Seiz <marco at kit.ac.jp>
> >> Cc: petsc-users at mcs.anl.gov
> >> Date: 2024-05-07 18:12:18
> >> Subject: Re: [petsc-users] Reasons for breakdown in preconditioned LSQR
> >> 
> 
> >> 
> >> > On 7 May 2024, at 9:10 AM, Marco Seiz <marco at kit.ac.jp> wrote:
> >> > 
> >> > Thanks for the quick response!
> >> > 
> >> > On 07.05.24 14:24, Pierre Jolivet wrote:
> >> >> 
> >> >> 
> >> >>> On 7 May 2024, at 7:04 AM, Marco Seiz <marco at kit.ac.jp> wrote:
> >> >>> 
> >> >>> Hello,
> >> >>> 
> >> >>> something a bit different from my last question, since that didn't
> >> >>> progress so well:
> >> >>> I have a related model which generally produces a rectangular matrix A,
> >> >>> so I am using LSQR to solve the system.
> >> >>> The matrix A has two nonzeros (1, -1) per row, with A^T A being similar
> >> >>> to a finite difference Poisson matrix if the rows were permuted randomly.
> >> >>> The problem is singular: the matrix determines the solution only up
> >> >>> to a constant, and my target solution is the one with zero weighted
> >> >>> average, which I can handle by adding a nullspace to my matrix.
> >> >>> However, I'd also like to pin (potentially many) DOFs in the future so I
> >> >>> also tried pinning a single value, and afterwards subtracting the
> >> >>> average from the KSP solution.
> >> >>> This leads to the KSP *sometimes* diverging when I use a preconditioner;
> >> >>> the target size of the matrix will be something like ([1,20] N) x N,
> >> >>> with N ~ [2, 1e6] so for the higher end I will require a preconditioner
> >> >>> for reasonable execution time.
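Because every row of A has the form e_i^T - e_j^T, A times the constant vector is zero, and least-squares solutions are only determined up to an additive constant. With a single pinned DOF, the unique solution of the pinned problem is one of those least-squares solutions, shifted so that the pinned value holds exactly; subtracting the weighted average afterwards therefore recovers the same solution as the null-space approach. This equivalence relies on there being a single pin; with several pinned DOFs the solution generally changes by more than a constant. A dense NumPy sketch of the comparison, with a hypothetical graph, right-hand side, weights and pin value; numpy.linalg.lstsq stands in for KSPLSQR, and the helper names are illustrative only:

```python
import numpy as np

def incidence_matrix(edges, n):
    """One row per contact (i, j): +1 in column i, -1 in column j."""
    A = np.zeros((len(edges), n))
    for k, (i, j) in enumerate(edges):
        A[k, i] = 1.0
        A[k, j] = -1.0
    return A

def weighted_center(x, w):
    """Shift x so that its w-weighted average is zero."""
    return x - np.dot(w, x) / np.sum(w)

# Small connected contact graph (hypothetical data, not from the thread).
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]
A = incidence_matrix(edges, n)
rng = np.random.default_rng(0)
b = rng.standard_normal(len(edges))
w = rng.uniform(1.0, 2.0, n)  # hypothetical positive weights

# Constants lie in the null space of A, so A^T A is singular.
assert np.allclose(A @ np.ones(n), 0.0)

# Null-space route: lstsq returns the minimum-norm least-squares solution,
# which is orthogonal to the constant vector.
x_ns = np.linalg.lstsq(A, b, rcond=None)[0]

# Pinning route: append the row e_0^T with an arbitrary pin value.
pin_row = np.zeros((1, n))
pin_row[0, 0] = 1.0
A_pin = np.vstack([A, pin_row])
b_pin = np.append(b, 3.0)
x_pin = np.linalg.lstsq(A_pin, b_pin, rcond=None)[0]

# After weighted centering, both routes give the same solution.
print(np.allclose(weighted_center(x_ns, w), weighted_center(x_pin, w)))  # prints True
```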
> >> >>> 
> >> >>> For a smaller example system, I set up my application to dump the input
> >> >>> to the KSP when it breaks down and I've attached a simple python script
> >> >>> + data using petsc4py to demonstrate the divergence for those specific
> >> >>> systems.
> >> >>> With `python3 lsdiv.py -pc_type lu -ksp_converged_reason` that
> >> >>> particular system shows breakdown, but if I remove the pinned DOF and
> >> >>> add the nullspace (pass -usens) it converges. I did try different PCs
> >> >>> but they tend to break down at different steps, e.g. `python3 lsdiv.py
> >> >>> -usenormal -qrdiv -pc_type qr -ksp_converged_reason` shows the breakdown
> >> >>> for PCQR when I use MatCreateNormal for creating the PC mat, but
> >> >>> interestingly it doesn't break down when I explicitly form A^T A (don't
> >> >>> pass -usenormal).
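When A^T A is formed explicitly (the runs without -usenormal), each contact row contributes the 2x2 block [[1, -1], [-1, 1]] on its two DOFs, which yields a graph Laplacian, and each pinned row e_p^T adds 1 to diagonal entry p. Without pins the matrix is singular because constants lie in its kernel; with one pin on a connected contact graph it becomes symmetric positive definite. A NumPy sketch of that algebra with a hypothetical edge list; it only illustrates the structure of the normal matrix and does not reproduce or explain the breakdown discussed in this thread:

```python
import numpy as np

def normal_matrix(edges, n, pinned=()):
    """Explicit A^T A for contact rows (+1 at i, -1 at j) plus pin rows e_p^T."""
    N = np.zeros((n, n))
    for i, j in edges:
        # contact row contributes [[1, -1], [-1, 1]] on DOFs (i, j)
        N[i, i] += 1.0
        N[j, j] += 1.0
        N[i, j] -= 1.0
        N[j, i] -= 1.0
    for p in pinned:
        N[p, p] += 1.0  # pin row e_p^T contributes e_p e_p^T
    return N

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]  # hypothetical, connected
N_free = normal_matrix(edges, n)
N_pin = normal_matrix(edges, n, pinned=[0])

print(np.linalg.eigvalsh(N_free).min())  # about 0: singular, constants in the kernel
print(np.linalg.eigvalsh(N_pin).min())   # positive: symmetric positive definite
np.linalg.cholesky(N_pin)                # succeeds because N_pin is SPD
```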
> >> >> 
> >> >> What version are you using? All those commands are returning
> >> >>  Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
> >> >> So I cannot reproduce any breakdown, but there have been recent changes to KSPLSQR.
> >> > For those tests I've been using PETSc 3.20.5 (last githash was
> >> > 4b82c11ab5d ).
> >> > I pulled the latest version from gitlab ( 6b3135e3cbe ) and compiled it,
> >> > but I had to drop --download-suitesparse=1 from my earlier config due to
> >> > errors.
> >> > Should I write a separate mail about this?
> >> > 
> >> > The LU example still behaves the same for me (`python3 lsdiv.py -pc_type
> >> > lu -ksp_converged_reason` gives DIVERGED_BREAKDOWN, `python3 lsdiv.py
> >> > -usens -pc_type lu -ksp_converged_reason` gives CONVERGED_RTOL_NORMAL)
> >> > but the QR example fails since I had to remove suitesparse.
> >> > petsc4py.__version__ reports 3.21.1 and if I rebuild my application,
> >> > then `ldd app` gives me `libpetsc.so.3.21 =>
> >> > /opt/petsc/linux-c-opt/lib/libpetsc.so.3.21` so it should be using the
> >> > newly built one.
> >> > The application then still eventually yields a DIVERGED_BREAKDOWN.
> >> > I don't have a ~/.petscrc and PETSC_OPTIONS is unset, so if we are on
> >> > the same version and there's still a discrepancy it is quite weird.
> >> 
> >> Quite weird indeed…
> >> $ python3 lsdiv.py -pc_type lu -ksp_converged_reason
> >>   Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
> >> $ python3 lsdiv.py -usens -pc_type lu -ksp_converged_reason
> >>   Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
> >> $ python3 lsdiv.py -pc_type qr -ksp_converged_reason
> >>   Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
> >> $ python3 lsdiv.py -usens -pc_type qr -ksp_converged_reason
> >>   Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
> >> 
> >> >>> For the moment I can work by adding the nullspace but eventually the
> >> >>> need for pinning DOFs will resurface, so I'd like to ask where the
> >> >>> breakdown is coming from. What causes the breakdowns? Is that a generic
> >> >>> problem occurring when adding (dof_i = val) rows to least-squares
> >> >>> systems which prevents these preconditioners from being robust? If so,
> >> >>> what preconditioners could be robust?
> >> >>> I did a minimal sweep of the available PCs by going over the possible
> >> >>> inputs of -pc_type for my application while pinning one DOF. Excepting
> >> >>> unavailable PCs (not compiled for, other setup missing, ...) and those
> >> >>> that did break down, I am left with ( hmg jacobi mat none pbjacobi sor
> >> >>> svd ).
> >> >> It’s unlikely any of these preconditioners will scale (or even converge) for problems with up to 1E6 unknowns.
> >> >> I could help you set up https://epubs.siam.org/doi/abs/10.1137/21M1434891 if you are willing to share a larger example (the current Mats are extremely tiny).
> >> > Yes, that would be great. About how large of a matrix do you need? I can
> >> > probably quickly get something non-artificial up to O(N) ~ 1e3,
> >> 
> >> That’s big enough.
> >> If you’re in luck, AMG on the normal equations won’t behave too badly, but I’ll try some more robust (in theory) methods nonetheless.
> >> 
> >> Thanks,
> >> Pierre
> >> 
> >> > bigger
> >> > matrices will take some time since I purposefully ignored MPI previously.
> >> > The matrix basically describes the contacts between particles which are
> >> > resolved on a uniform grid, so the main memory hog isn't the matrix but
> >> > rather resolving the particles.
> >> > I should mention that the matrix changes over the course of the
> >> > simulation but stays constant for many solves, i.e. hundreds to
> >> > thousands of solves with variable RHS between periods of contact
> >> > formation/loss.
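Since the matrix stays fixed for hundreds to thousands of solves with varying right-hand sides, the expensive part (a factorization here, or a preconditioner setup in general) can be computed once and reused across solves. A small dense NumPy sketch of that pattern for the pinned, full-column-rank system, using a reduced QR factorization A = QR; the class name and test graph are hypothetical, and for N ~ 1e6 a sparse or iterative method would be needed instead:

```python
import numpy as np

class FactoredLeastSquares:
    """Factor a full-column-rank A = QR once; each solve of min ||Ax - b||
    then costs one product Q^T b and one n x n solve with R."""

    def __init__(self, A):
        self.Q, self.R = np.linalg.qr(A)  # reduced QR: Q is m x n, R is n x n

    def solve(self, b):
        # R is upper triangular; a dedicated triangular solver such as
        # scipy.linalg.solve_triangular would exploit that, np.linalg.solve
        # is used here to keep the sketch NumPy-only.
        return np.linalg.solve(self.R, self.Q.T @ b)

# Hypothetical contact graph with DOF 0 pinned, so A has full column rank.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]
A = np.zeros((len(edges) + 1, n))
for k, (i, j) in enumerate(edges):
    A[k, i], A[k, j] = 1.0, -1.0
A[-1, 0] = 1.0  # pin row e_0^T

solver = FactoredLeastSquares(A)  # set up once
rng = np.random.default_rng(2)
for _ in range(3):                # reuse for many right-hand sides
    b = rng.standard_normal(A.shape[0])
    x = solver.solve(b)
    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```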
> >> > 
> >> >> 
> >> >> Thanks,
> >> >> Pierre
> >> >>> 
> >> >>> 
> >> >>> Best regards,
> >> >>> Marco
> >> >>> 
> >> >>> <lsdiv.zip>
> >> >> 
> >> >> 
> >> > Best regards,
> >> > Marco
> >> 
> >> 
> >> 
> <system.zip>


