[petsc-users] Debugging failed solve (what's an acceptable upper bound to the condition number?)

Fri Nov 20 14:24:29 CST 2015

   Do you really only have 851 variables?

 SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero

If so, you can use -snes_fd and -ksp_view_pmat binary:filename to save the small matrix and then load it into 
MATLAB or a similar tool to fully analyze its eigenstructure and see the distribution from the tiny values to the large values: is it just a small number of tiny ones, etc.
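
As a rough illustration of that question, the sketch below uses only the five smallest and five largest singular values printed in the quoted log for nonlinear iteration 0. It recomputes the reported condition number, looks at the ratios between consecutive small singular values, and asks what the condition number would be if the two tiniest values were hypothetically absent. The helper names are made up for this example, and whether those two modes can actually be removed or rescaled is not something the log tells us; the full distribution still needs the saved matrix.

```python
# Singular values copied from the "0 Nonlinear" block of the quoted log.
smallest = [5.637144317564e-09, 9.345415388433e-08, 4.106132915572e-05,
            1.017339655185e-04, 1.147649477723e-04]
largest = [1.498505466947e+03, 1.577560767570e+03, 1.719172328193e+03,
           2.344218235296e+03, 8.213813311188e+03]


def condition_number(sigma_min, sigma_max):
    """2-norm condition number: largest over smallest singular value."""
    return sigma_max / sigma_min


def gap_ratios(values):
    """Ratio between each pair of consecutive singular values (ascending order)."""
    return [b / a for a, b in zip(values, values[1:])]


# Condition number from the extreme singular values; should match the SVD line.
kappa_full = condition_number(smallest[0], largest[-1])

# Hypothetical: condition number if the two tiniest singular values were not there.
kappa_without_two = condition_number(smallest[2], largest[-1])

print("condition number from log values:  %.6e" % kappa_full)
print("without the two smallest (hypoth.): %.6e" % kappa_without_two)
print("gaps between smallest values:", ["%.1f" % g for g in gap_ratios(smallest)])
```

A large jump between the second and third smallest values would suggest that a couple of isolated modes, rather than a smooth spread of singular values, account for most of the conditioning.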

  Note that with such a large condition number the fact that the linear system "converges" quickly may be meaningless, since a small residual doesn't always mean a small error. The error could still be huge.
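
  A minimal sketch of that point, using a made-up 2x2 nearly singular matrix (not the Jacobian from this thread): a clearly wrong solution can still give a tiny relative residual. The last lines apply the standard bound, relative error <= cond(A) * relative residual, to two numbers taken from the quoted log (the condition number and the final ||r(i)||/||b|| at nonlinear iteration 1). All function and variable names are assumptions for the example.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def relative_residual_and_error(A, b, x_true, x_approx):
    """Return (||b - A x_approx|| / ||b||, ||x_approx - x_true|| / ||x_true||)."""
    r = [bi - yi for bi, yi in zip(b, matvec(A, x_approx))]
    e = [xa - xt for xa, xt in zip(x_approx, x_true)]
    return norm2(r) / norm2(b), norm2(e) / norm2(x_true)


# Toy nearly singular matrix; eps controls how ill-conditioned it is.
eps = 1e-10
A = [[1.0, 1.0], [1.0, 1.0 + eps]]
x_true = [1.0, 1.0]
b = matvec(A, x_true)

# A candidate "solution" that is far from x_true.
x_wrong = [2.0, 0.0]

rel_res, rel_err = relative_residual_and_error(A, b, x_true, x_wrong)
print("relative residual: %.3e" % rel_res)
print("relative error:    %.3e" % rel_err)

# Bound on the relative error using values copied from the quoted log.
kappa = 1.457473763066e+12        # SVD condition number at "1 Nonlinear"
rel_res_log = 2.272731819648e-06  # ||r(i)||/||b|| after KSP iteration 1 there
error_bound = kappa * rel_res_log
print("upper bound on relative error: %.3e" % error_bound)
```

  A bound far above 1 does not mean the computed step is wrong; it only means the residual alone cannot certify that the step is accurate.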

  Barry

> On Nov 20, 2015, at 12:40 PM, Alex Lindsay <adlinds3 at ncsu.edu> wrote:
> 
> Hello,
> 
> I have an application built on top of the Moose framework, and I'm trying to debug a solve that is not converging. My linear solve converges very nicely. However, my non-linear solve does not, and the problem appears to be in the line search. Reading the PETSc FAQ, I see that the most common cause of a poor line search is a bad Jacobian. However, I'm using a finite-differenced Jacobian; if I run -snes_type=test, I get "norm of matrix ratios" < 1e-15. Thus in this case the Jacobian should be accurate. I'm wondering then if my problem might be one of these (taken from the FAQ page):
> 
> 	• The matrix is very ill-conditioned. Check the condition number.
> 		• Try to improve it by choosing the relative scaling of components/boundary conditions.
> 		• Try -ksp_diagonal_scale -ksp_diagonal_scale_fix.
> 		• Perhaps change the formulation of the problem to produce more friendly algebraic equations.
> 	• The matrix is nonlinear (e.g. evaluated using finite differencing of a nonlinear function). Try different differencing parameters, ./configure --with-precision=__float128 --download-f2cblaslapack, check if it converges in "easier" parameter regimes.
> 
> I'm almost ashamed to share my condition number because I'm sure it must be absurdly high. Without applying -ksp_diagonal_scale and -ksp_diagonal_scale_fix, the condition number is around 1e25. When I do apply those two parameters, the condition number is reduced to 1e17. Even after scaling all my variable residuals so that they were all on the order of unity (a suggestion on the Moose list), I still have a condition number of 1e12. I have no experience with condition numbers, but knowing that the perfect condition number is unity, 1e12 seems unacceptable. What's an acceptable upper limit on the condition number? Is it problem dependent? Having already tried scaling the individual variable residuals, I'm not sure what my next step would be for trying to reduce the condition number.
> 
> I definitely have a nonlinear problem. Could I be having problems because I'm finite differencing non-linear residuals to form my Jacobian? I can see about using a different differencing parameter. I'm also going to consider trying quad precision. However, my hypothesis is that my condition number is the fundamental problem. Is that a reasonable hypothesis?
> 
> If it's useful, below is console output with -pc_type=svd
> 
> Time Step  1, time = 1e-10
>                 dt = 1e-10
>     |residual|_2 of individual variables:
>                potential:    8.12402e+07
>                potentialliq: 0.000819748
>                em:           49.206
>                emliq:        3.08187e-11
>                Arp:          2375.94
> 
>  0 Nonlinear |R| = 8.124020e+07
>       SVD: condition number 1.457087640207e+12, 0 of 851 singular values are (nearly) zero
>       SVD: smallest singular values: 5.637144317564e-09 9.345415388433e-08 4.106132915572e-05 1.017339655185e-04 1.147649477723e-04
>       SVD: largest singular values : 1.498505466947e+03 1.577560767570e+03 1.719172328193e+03 2.344218235296e+03 8.213813311188e+03
>     0 KSP unpreconditioned resid norm 3.185019606208e+05 true resid norm 3.185019606208e+05 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP unpreconditioned resid norm 6.382886902896e-07 true resid norm 6.382761808414e-07 ||r(i)||/||b|| 2.003994511046e-12
>   Linear solve converged due to CONVERGED_RTOL iterations 1
>       Line search: Using full step: fnorm 8.124020470169e+07 gnorm 1.097605946684e+01
>     |residual|_2 of individual variables:
>                potential:    8.60047
>                potentialliq: 0.335436
>                em:           2.26472
>                emliq:        0.642578
>                Arp:          6.39151
> 
>  1 Nonlinear |R| = 1.097606e+01
>       SVD: condition number 1.457473763066e+12, 0 of 851 singular values are (nearly) zero
>       SVD: smallest singular values: 5.637185516434e-09 9.347128557672e-08 1.017339655587e-04 1.146760266781e-04 4.064422034774e-04
>       SVD: largest singular values : 1.498505466944e+03 1.577544976882e+03 1.718956369043e+03 2.343692402876e+03 8.216049987736e+03
>     0 KSP unpreconditioned resid norm 2.653715381459e+01 true resid norm 2.653715381459e+01 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP unpreconditioned resid norm 6.031179341420e-05 true resid norm 6.031183387732e-05 ||r(i)||/||b|| 2.272731819648e-06
>   Linear solve converged due to CONVERGED_RTOL iterations 1
>       Line search: gnorm after quadratic fit 2.485190757827e+11
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.632996340352e+10 lambda=5.0000000000000003e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.290675557416e+09 lambda=2.5000000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.332980055153e+08 lambda=1.2500000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.677118626669e+07 lambda=6.2500000000000003e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.024469780306e+05 lambda=3.1250000000000002e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.011543252988e+03 lambda=1.5625000000000001e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.750171277470e+03 lambda=7.8125000000000004e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 3.486970625406e+02 lambda=3.4794637057251714e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.830624839582e+01 lambda=1.5977866967992950e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.147529381328e+01 lambda=6.8049915671999093e-05
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.138950943123e+01 lambda=1.7575203052774536e-05
>       Line search: Cubically determined step, current gnorm 1.095195976135e+01 lambda=1.7575203052774537e-06
>     |residual|_2 of individual variables:
>                potential:    8.59984
>                potentialliq: 0.395753
>                em:           2.26492
>                emliq:        0.642578
>                Arp:          6.34735
> 
>  2 Nonlinear |R| = 1.095196e+01
>       SVD: condition number 1.457459214030e+12, 0 of 851 singular values are (nearly) zero
>       SVD: smallest singular values: 5.637295371943e-09 9.347057884198e-08 1.017339655949e-04 1.146738253493e-04 4.064421554132e-04
>       SVD: largest singular values : 1.498505466946e+03 1.577543742603e+03 1.718948052797e+03 2.343672206864e+03 8.216128082047e+03
>     0 KSP unpreconditioned resid norm 2.653244141805e+01 true resid norm 2.653244141805e+01 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP unpreconditioned resid norm 4.480869560737e-05 true resid norm 4.480686665183e-05 ||r(i)||/||b|| 1.688757771886e-06
>   Linear solve converged due to CONVERGED_RTOL iterations 1
>       Line search: gnorm after quadratic fit 2.481752147885e+11
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.631959989642e+10 lambda=5.0000000000000003e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.289110800463e+09 lambda=2.5000000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.332043942482e+08 lambda=1.2500000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.677933337886e+07 lambda=6.2500000000000003e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.027980597206e+05 lambda=3.1250000000000002e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.054113639063e+03 lambda=1.5625000000000001e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.771258630210e+03 lambda=7.8125000000000004e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 3.517070127496e+02 lambda=3.4519087020105563e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.844350966118e+01 lambda=1.5664532891249369e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.114833995101e+01 lambda=6.5367917100814859e-05
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.144636844292e+01 lambda=1.6044984646715980e-05
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.095640770627e+01 lambda=1.6044984646715980e-06
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.095196729511e+01 lambda=1.6044984646715980e-07
>       Line search: Cubically determined step, current gnorm 1.095195451041e+01 lambda=2.3994454223607641e-08
>     |residual|_2 of individual variables:
>                potential:    8.59983
>                potentialliq: 0.396107
>                em:           2.26492
>                emliq:        0.642578
>                Arp:          6.34733
> 
>  3 Nonlinear |R| = 1.095195e+01
>       SVD: condition number 1.457474387942e+12, 0 of 851 singular values are (nearly) zero
>       SVD: smallest singular values: 5.637237413167e-09 9.347057670885e-08 1.017339654798e-04 1.146737961973e-04 4.064420550524e-04
>       SVD: largest singular values : 1.498505466946e+03 1.577543716995e+03 1.718947893048e+03 2.343671853830e+03 8.216129148438e+03
>     0 KSP unpreconditioned resid norm 2.653237816527e+01 true resid norm 2.653237816527e+01 ||r(i)||/||b|| 1.000000000000e+00
>     1 KSP unpreconditioned resid norm 8.525213442515e-05 true resid norm 8.527696332776e-05 ||r(i)||/||b|| 3.214071607022e-06
>   Linear solve converged due to CONVERGED_RTOL iterations 1
>       Line search: gnorm after quadratic fit 2.481576195523e+11
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.632005412624e+10 lambda=5.0000000000000003e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.289212002697e+09 lambda=2.5000000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 4.332196637845e+08 lambda=1.2500000000000001e-02
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.678040222943e+07 lambda=6.2500000000000003e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.027868984884e+05 lambda=3.1250000000000002e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.010733464460e+03 lambda=1.5625000000000001e-03
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.751519860441e+03 lambda=7.8125000000000004e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 3.497889916171e+02 lambda=3.4753778542938795e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 7.932631084466e+01 lambda=1.5879606741873878e-04
>       Line search: Cubic step no good, shrinking lambda, current gnorm 2.194608479634e+01 lambda=6.5716583192912669e-05
>       Line search: Cubic step no good, shrinking lambda, current gnorm 1.117190149691e+01 lambda=1.1541218569257328e-05
>       Line search: Cubically determined step, current gnorm 1.093879875464e+01 lambda=1.1541218569257329e-06
>     |residual|_2 of individual variables:
>                potential:    8.59942
>                potentialliq: 0.403326
>                em:           2.26505
>                emliq:        0.714844
>                Arp:          6.3169
> 
>  4 Nonlinear |R| = 1.093880e+01
>