[petsc-users] TAO STCG initial perturbation

Dener, Alp adener at anl.gov
Wed Jun 10 10:29:26 CDT 2020


Hello,

STCG is being used to compute a search direction by inverting the Hessian of the objective onto the gradient. The Hessian has to be positive definitive for this search direction to be a valid descent direction. To enforce this, STCG terminates the KSP solution when it encounters negative curvature (i.e. “uphill” directions). The resulting search direction from this early termination is still a descent direction but it’s a pretty bad one, meaning that the line search is forced to do extra work to find a valid (and sometimes pretty small) step length. This is normal.

Once a negative curvature is detected, the NLS algorithm imposes a shift to the diagonal of the Hessian in order to guarantee that it will be positive-definite on the next optimization iteration. This diagonal shift is increased in magnitude if you keep hitting the negative curvature KSP termination, and it’s gradually decreased on iterations where the KSP solution succeeds. However this does not prevent the line search from doing all that extra work on iterations where the search direction solution never converged fully. It simply helps generate better search directions in subsequent iterations and helps the overall optimization solution eventually converge to a valid minimum.

As a side note, if your function, gradient and/or Hessian evaluations are expensive, it might actually make more sense to use the quasi-Newton method (BQNLS) instead of Newton line search. The optimization solution is likely to take more overall iterations with BQNLS(-tao_type bqnls), but the BFGS approximation used by the algorithm is inherently guaranteed to be positive-definite and tends to generate very well scaled search directions so it might help the line search do less work. Combined with the elimination of Hessian evaluations, it may very well solve your problem faster in CPU time even if it takes more nonlinear iterations overall. This is technically a bound constrained method but it solves unconstrained problems too when you don’t define any bounds.

About Levenberg-Marquardt: a user started the branch to eventually contribute an LM solver, but I have not heard any updates on it since end of April. For least-squares type problems, you can try using the regularized Gauss-Newton solver (-tao_type brgn). The problem definition interface is a bit different. BRGN requires the problem to be defined as a residual and its Jacobian, and it will assemble the gradient and the Hessian on its own and feed it into the standard Newton line search solver underneath. There are also a few different regularization options available, like proximal point Tikhonov (l2prox) and a sparsity term (l1dict) that can be set with the (-tao_brgn_regularization_type) option. These get automatically cooked into the gradient and Hessian. Combined with the automatic diagonal shifts for the Hessian, BRGN really is almost equivalent to an LM solver so it might do what you need.

Hope this helps!
—
Alp

On Jun 9, 2020, at 10:55 PM, zakaryah . <zakaryah at gmail.com<mailto:zakaryah at gmail.com>> wrote:

I am using TAO to minimize the elastic strain energy for somewhat complicated applied forces. With Newton linesearch (NLS), the STCG KSP, and several preconditioners (none, Jacobi, lmvm), the solver finds a minimum within an acceptable number of iterations (<50). Still, in the interest of performance, I am wondering about what happens after the KSP encounters an indefinite matrix (KSP_CONVERGED_CG_NEG_CURVE). After this happens, the line search performs extremely poorly, with function values at step length 1 reaching absurdly large values. As the line search tries smaller step lengths, the function values fall, then typically reach values representing a decline from the previous iteration, but only with tiny step lengths (~1e-11). The line search takes many iterations to arrive at such an improvement. Here is an example, run with  -tao_type nls -tao_nls_ksp_type stcg -tao_nls_pc_type none -tao_nls_ksp_norm_type unpreconditioned:

  0 TAO,  Function value: 0.160612,  Residual: 0.0736074
  0 TAO,  Function value: 0.121568,  Residual: 0.0424117
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 4.241174778983e-02
    1 KSP Residual norm 5.806891169509e-02
    2 KSP Residual norm 6.492953448014e-02
    3 KSP Residual norm 5.856559430984e-02
    4 KSP Residual norm 5.262548559710e-02
    5 KSP Residual norm 4.863633473400e-02
    6 KSP Residual norm 4.725156347026e-02
    7 KSP Residual norm 4.748458009319e-02
    8 KSP Residual norm 4.885339641711e-02
    9 KSP Residual norm 5.065071226354e-02
   10 KSP Residual norm 5.085544070851e-02
   11 KSP Residual norm 5.127093547976e-02
   12 KSP Residual norm 5.155225884843e-02
   13 KSP Residual norm 5.219895021408e-02
   14 KSP Residual norm 6.480520610077e-02
   15 KSP Residual norm 1.433515456621e-01
  Linear tao_nls_ solve converged due to CONVERGED_CG_NEG_CURVE iterations 16
    0 LS    Function value: 0.121568,    Step length: 0.
    1 LS    Function value: 5.99839e+59,    Step length: 1.
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0., fy: 0.121568, dgy: -1.32893e+08
    2 LS    Function value: 1.46445e+56,    Step length: 0.25
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1., fy: 5.99839e+59, dgy: 3.59903e+60
    3 LS    Function value: 3.57532e+52,    Step length: 0.0625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.25, fy: 1.46445e+56, dgy: 3.51468e+57
    4 LS    Function value: 8.7288e+48,    Step length: 0.015625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.0625, fy: 3.57532e+52, dgy: 3.4323e+54
    5 LS    Function value: 2.13105e+45,    Step length: 0.00390625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.015625, fy: 8.7288e+48, dgy: 3.35186e+51
    6 LS    Function value: 5.20277e+41,    Step length: 0.000976562
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.00390625, fy: 2.13105e+45, dgy: 3.2733e+48
    7 LS    Function value: 1.27021e+38,    Step length: 0.000244141
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000976562, fy: 5.20277e+41, dgy: 3.19658e+45
    8 LS    Function value: 3.1011e+34,    Step length: 6.10352e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000244141, fy: 1.27021e+38, dgy: 3.12166e+42
    9 LS    Function value: 7.57106e+30,    Step length: 1.52588e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 6.10352e-05, fy: 3.1011e+34, dgy: 3.0485e+39
   10 LS    Function value: 1.84842e+27,    Step length: 3.8147e-06
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.52588e-05, fy: 7.57106e+30, dgy: 2.97706e+36
   11 LS    Function value: 4.51295e+23,    Step length: 9.53672e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.8147e-06, fy: 1.84842e+27, dgy: 2.90731e+33
   12 LS    Function value: 1.10199e+20,    Step length: 2.38417e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.53672e-07, fy: 4.51295e+23, dgy: 2.83927e+30
   13 LS    Function value: 2.69231e+16,    Step length: 5.96028e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.38417e-07, fy: 1.10199e+20, dgy: 2.77314e+27
   14 LS    Function value: 6.59192e+12,    Step length: 1.48993e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.96028e-08, fy: 2.69231e+16, dgy: 2.70974e+24
   15 LS    Function value: 1.62897e+09,    Step length: 3.72338e-09
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.48993e-08, fy: 6.59192e+12, dgy: 2.65254e+21
   16 LS    Function value: 421305.,    Step length: 9.29291e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.72338e-09, fy: 1.62897e+09, dgy: 2.61626e+18
   17 LS    Function value: 141.939,    Step length: 2.30343e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.29291e-10, fy: 421305., dgy: 2.67496e+15
   18 LS    Function value: 0.213621,    Step length: 5.45712e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.30343e-10, fy: 141.939, dgy: 3.35874e+12
   19 LS    Function value: 0.120058,    Step length: 1.32286e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.45712e-11, fy: 0.213621, dgy: 8.60977e+09
  1 TAO,  Function value: 0.120058,  Residual: 0.0484495

When the KSP does not encounter negative curvature, the linesearch performs well:
  3 TAO,  Function value: 0.118376,  Residual: 0.0545446
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 5.454463134947e-02
    1 KSP Residual norm 2.394277461960e-02
    2 KSP Residual norm 6.674182971627e-03
    3 KSP Residual norm 1.235116278289e-03
    4 KSP Residual norm 1.714792759324e-04
    5 KSP Residual norm 3.928769927518e-05
    6 KSP Residual norm 8.464174666739e-06
    7 KSP Residual norm 1.583763581407e-06
    8 KSP Residual norm 3.251685842746e-07
  Linear tao_nls_ solve converged due to CONVERGED_RTOL iterations 8
    0 LS    Function value: 0.118376,    Step length: 0.
    1 LS    Function value: 0.117884,    Step length: 1.
        stx: 0., fx: 0.118376, dgx: -0.000530697
        sty: 0., fy: 0.118376, dgy: -0.000530697

I have attached the full log for this particular solve. The points of negative curvature do not surprise me - this is a difficult problem, with many singularities in the Jacobian/Hessian. In fact, I am very happy with how well STCG is performing, globally. My question is what to make of the poor performance of the linesearch after encountering these points. As I understand it, most of the flags for adjusting the solver after encountering an indefinite matrix involve altering the magnitude of the perturbation by which the Hessian is offset for calculating the update direction. I am not clear on how to adjust the routine for determining the initial step size. More generally, should I expect these points of negative curvature to cause difficulties for the solver, and be satisfied with a large number of iterations before finding an improvement at tiny step size?

I have a loosely related question about the status of the Levenberg Marquardt solver within Tao? I remember hearing on the list that this was being worked on, but I am not sure from the branch description whether it is working or intended for general use. I would love to hear any updates on that, because I think it could be useful for my problem.

Thanks!
<Tao_log.txt>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200610/3285597d/attachment-0001.html>


More information about the petsc-users mailing list