[petsc-users] TAO STCG initial perturbation

Dener, Alp adener at anl.gov
Thu Jun 11 14:01:42 CDT 2020


Hi Zak,

Gauss-Newton finds the least-squares solution of overdetermined systems, e.g. nonlinear regression. It minimizes the squared L2-norm of a nonlinear residual ||r(x)||_2^2 where the Jacobian J = dr/dx is rectangular with full column rank. Since this J is not invertible, Gauss-Newton uses the left pseudo-inverse (J^T J)^{-1} J^T to compute an approximate Newton direction. However, when J is invertible, the pseudo-inverse exactly reduces to the classic inverse J^{-1} and you recover the full Newton method.
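In symbols, this is the standard Gauss-Newton derivation (nothing TAO-specific): for f(x) = \tfrac{1}{2}\|r(x)\|_2^2,

\nabla f = J^\top r, \qquad \nabla^2 f \approx J^\top J, \qquad (J^\top J)\, p = -J^\top r \;\Rightarrow\; p = -(J^\top J)^{-1} J^\top r .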

If you have a problem with an invertible J, then you should use a Newton method directly (BNLS or BNTR) instead of Gauss-Newton (BRGN). To apply your preconditioner for J, you can get the PC object via TaoGetKSP -> KSPGetPC, set its type to PCSHELL, and then define its operator via PCShellSetApply. If it's a good preconditioner, it should significantly improve the KSP solution and, by extension, likely reduce the work done in the line search. A minimal sketch of that wiring follows below.
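A minimal sketch, with MyPCApply, AttachShellPC, and the user context as hypothetical placeholders for your own code (error checking omitted for brevity):

#include <petsctao.h>

/* Hypothetical shell-PC apply: y = M^{-1} x, where M is your
   preconditioner for J. */
static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
{
  void *ctx;
  PCShellGetContext(pc, &ctx);
  /* ... apply your preconditioner here using ctx ... */
  return 0;
}

/* Call after TaoSetFromOptions()/TaoSetUp() so the KSP exists. */
static PetscErrorCode AttachShellPC(Tao tao, void *user_ctx)
{
  KSP ksp;
  PC  pc;
  TaoGetKSP(tao, &ksp);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCSHELL);
  PCShellSetContext(pc, user_ctx);  /* your own data, if any */
  PCShellSetApply(pc, MyPCApply);
  return 0;
}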

Alp

On Jun 11, 2020, at 12:10 PM, zakaryah . <zakaryah at gmail.com> wrote:

Hi Alp,

Thanks for the help. Quasi-Newton seems promising - the Tao solver eventually converges, sometimes after hundreds or even thousands of iterations, with each iteration proceeding very quickly since the Hessian is never evaluated. I have only tried this with the problem set up as a general optimization, i.e., not as a least-squares problem per se.

Your help got me started with testing BRGN, which I'd like to explore more. Can you give me some advice on preconditioners for the subsolver? I guess that the subsolver uses a shell matrix to compute the action of J^T J from the Jacobian J. I have good preconditioners for J, but I'm not sure how to apply them to the shell matrix.

Thanks, Zak

On Wed, Jun 10, 2020 at 3:10 PM Dener, Alp <adener at anl.gov> wrote:
Hi Zak,

You got it right with the TaoBRGNGetSubsolver -> TaoGetKSP workflow. This will get you the KSP object correctly.

BRGN is not a stand-alone solver. It's a wrapper that combines the user-provided residual and Jacobian callbacks to assemble the gradient and Hessian under the Gauss-Newton formulation, and feeds that information into a standard TAO solver like NLS. It also incorporates some regularization into the problem. Technically any user could do this themselves and use NLS directly to solve a least-squares problem, but BRGN reduces the workload a little bit.

To use it for your least-squares problem, you would need to set the solver type to TAOBRGN via TaoSetType() and then provide the Tao solver two callbacks: one for the residual using TaoSetResidualRoutine() and another for the Jacobian using TaoSetJacobianResidualRoutine(). The regularization can be changed with the -tao_brgn_regularization_type <none,l2pure,l2prox,l1dict> command line option. A rough sketch of the setup is below.
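A rough sketch, assuming a residual vector r and Jacobian matrix J created beforehand; FormResidual, FormJacobian, and the user struct are hypothetical names (error checking omitted):

static PetscErrorCode FormResidual(Tao tao, Vec x, Vec r, void *ctx)
{
  /* fill r = r(x) */
  return 0;
}

static PetscErrorCode FormJacobian(Tao tao, Vec x, Mat J, Mat Jpre, void *ctx)
{
  /* fill J = dr/dx (and Jpre, if it differs from J) */
  return 0;
}

/* in the setup code: */
TaoSetType(tao, TAOBRGN);
TaoSetResidualRoutine(tao, r, FormResidual, &user);
TaoSetJacobianResidualRoutine(tao, J, J, FormJacobian, &user);
TaoSetFromOptions(tao);  /* picks up -tao_brgn_regularization_type etc. */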

Note that the BRGN subsolver can also be changed. It's set to BNLS by default — this is a bound-constrained Newton line search, but it also solves unconstrained problems when no bounds are defined. You could try BNTR for a trust-region method instead of a line search, or switch to BQNLS for a quasi-Gauss-Newton solution. The subsolver's settings can be accessed via the -tao_brgn_subsolver_ prefix. For instance, to change the type you could pass "-tao_brgn_subsolver_tao_type bntr" for Newton trust-region.

Alp

On Jun 10, 2020, at 1:55 PM, zakaryah . <zakaryah at gmail.com> wrote:

Oh, maybe I can answer my own question - for BRGN, is it the subsolver that has a KSP? Then I would call

TaoBRGNGetSubsolver(tao, &subsolver);  /* subsolver is itself a Tao */
TaoGetKSP(subsolver, &ksp);            /* the KSP lives on the subsolver */

?

On Wed, Jun 10, 2020 at 2:52 PM zakaryah . <zakaryah at gmail.com> wrote:
Hi Alp,

Thanks very much for this thorough answer. I understand the situation much better now.

I will experiment with quasi-Newton methods - this should be straightforward to test and evaluate. I agree that the regularized Gauss-Newton may be useful in place of Levenberg-Marquardt per se. I am having some trouble understanding how it is set up and how to use it for my purposes.

Using BRGN, does the Tao solver itself not have a KSP? After setting up the solver, calling TaoGetKSP returns a NULL KSP. It seems I am doing something wrong, but with other Tao types, like nls, everything works fine.

Thanks for your help!

Cheers, Zak


On Wed, Jun 10, 2020 at 11:29 AM Dener, Alp <adener at anl.gov> wrote:
Hello,

STCG is used to compute a search direction by applying the inverse of the Hessian of the objective to the gradient. The Hessian has to be positive definite for this search direction to be a valid descent direction. To enforce this, STCG terminates the KSP solution when it encounters negative curvature (i.e. "uphill" directions). The search direction resulting from this early termination is still a descent direction, but a pretty bad one, meaning that the line search is forced to do extra work to find a valid (and sometimes pretty small) step length. This is normal.

Once negative curvature is detected, the NLS algorithm imposes a shift on the diagonal of the Hessian to guarantee that it will be positive definite on the next optimization iteration. This diagonal shift is increased in magnitude if you keep hitting the negative-curvature KSP termination, and gradually decreased on iterations where the KSP solution succeeds. However, this does not prevent the line search from doing all that extra work on iterations where the search direction solution never fully converged. It simply helps generate better search directions in subsequent iterations and helps the overall optimization eventually converge to a valid minimum.
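Schematically, the shifted system is the standard Levenberg-type modification (the exact update rule for the shift \mu_k is internal to NLS):

(\nabla^2 f(x_k) + \mu_k I)\, p_k = -\nabla f(x_k), \qquad \mu_k \ge 0 ,

with \mu_k grown after negative-curvature terminations and shrunk after successful KSP solves.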

As a side note, if your function, gradient, and/or Hessian evaluations are expensive, it might actually make more sense to use the quasi-Newton method (BQNLS, -tao_type bqnls) instead of Newton line search. The optimization is likely to take more overall iterations with BQNLS, but the BFGS approximation used by the algorithm is inherently guaranteed to be positive definite and tends to generate well-scaled search directions, so it might help the line search do less work. Combined with the elimination of Hessian evaluations, it may very well solve your problem faster in CPU time even if it takes more nonlinear iterations overall. This is technically a bound-constrained method, but it solves unconstrained problems too when you don't define any bounds.
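For reference, a minimal sketch of switching an existing Tao to BQNLS (assuming tao is already created with its callbacks set):

TaoSetType(tao, TAOBQNLS);
TaoSetFromOptions(tao);  /* equivalently, pass -tao_type bqnls at run time */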

About Levenberg-Marquardt: a user started the branch to eventually contribute an LM solver, but I have not heard any updates on it since the end of April. For least-squares type problems, you can try the regularized Gauss-Newton solver (-tao_type brgn). The problem definition interface is a bit different: BRGN requires the problem to be defined as a residual and its Jacobian, and it assembles the gradient and the Hessian on its own and feeds them into the standard Newton line search solver underneath. There are also a few regularization options available, like proximal-point Tikhonov (l2prox) and a sparsity term (l1dict), which can be set with the -tao_brgn_regularization_type option. These get automatically cooked into the gradient and Hessian. Combined with the automatic diagonal shifts for the Hessian, BRGN really is almost equivalent to an LM solver, so it might do what you need.
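For example, the options above combined on one command line (the executable name is hypothetical; the flags are the ones mentioned in this thread):

./myapp -tao_type brgn -tao_brgn_regularization_type l2prox -tao_brgn_subsolver_tao_type bntr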

Hope this helps!
—
Alp

On Jun 9, 2020, at 10:55 PM, zakaryah . <zakaryah at gmail.com> wrote:

I am using TAO to minimize the elastic strain energy for somewhat complicated applied forces. With Newton line search (NLS), the STCG KSP, and several preconditioners (none, Jacobi, lmvm), the solver finds a minimum within an acceptable number of iterations (<50). Still, in the interest of performance, I am wondering about what happens after the KSP encounters an indefinite matrix (KSP_CONVERGED_CG_NEG_CURVE). After this happens, the line search performs extremely poorly, with function values at step length 1 reaching absurdly large values. As the line search tries smaller step lengths, the function values fall, then typically reach values representing a decline from the previous iteration, but only at tiny step lengths (~1e-11). The line search takes many iterations to arrive at such an improvement. Here is an example, run with -tao_type nls -tao_nls_ksp_type stcg -tao_nls_pc_type none -tao_nls_ksp_norm_type unpreconditioned:

  0 TAO,  Function value: 0.160612,  Residual: 0.0736074
  0 TAO,  Function value: 0.121568,  Residual: 0.0424117
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 4.241174778983e-02
    1 KSP Residual norm 5.806891169509e-02
    2 KSP Residual norm 6.492953448014e-02
    3 KSP Residual norm 5.856559430984e-02
    4 KSP Residual norm 5.262548559710e-02
    5 KSP Residual norm 4.863633473400e-02
    6 KSP Residual norm 4.725156347026e-02
    7 KSP Residual norm 4.748458009319e-02
    8 KSP Residual norm 4.885339641711e-02
    9 KSP Residual norm 5.065071226354e-02
   10 KSP Residual norm 5.085544070851e-02
   11 KSP Residual norm 5.127093547976e-02
   12 KSP Residual norm 5.155225884843e-02
   13 KSP Residual norm 5.219895021408e-02
   14 KSP Residual norm 6.480520610077e-02
   15 KSP Residual norm 1.433515456621e-01
  Linear tao_nls_ solve converged due to CONVERGED_CG_NEG_CURVE iterations 16
    0 LS    Function value: 0.121568,    Step length: 0.
    1 LS    Function value: 5.99839e+59,    Step length: 1.
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0., fy: 0.121568, dgy: -1.32893e+08
    2 LS    Function value: 1.46445e+56,    Step length: 0.25
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1., fy: 5.99839e+59, dgy: 3.59903e+60
    3 LS    Function value: 3.57532e+52,    Step length: 0.0625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.25, fy: 1.46445e+56, dgy: 3.51468e+57
    4 LS    Function value: 8.7288e+48,    Step length: 0.015625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.0625, fy: 3.57532e+52, dgy: 3.4323e+54
    5 LS    Function value: 2.13105e+45,    Step length: 0.00390625
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.015625, fy: 8.7288e+48, dgy: 3.35186e+51
    6 LS    Function value: 5.20277e+41,    Step length: 0.000976562
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.00390625, fy: 2.13105e+45, dgy: 3.2733e+48
    7 LS    Function value: 1.27021e+38,    Step length: 0.000244141
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000976562, fy: 5.20277e+41, dgy: 3.19658e+45
    8 LS    Function value: 3.1011e+34,    Step length: 6.10352e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 0.000244141, fy: 1.27021e+38, dgy: 3.12166e+42
    9 LS    Function value: 7.57106e+30,    Step length: 1.52588e-05
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 6.10352e-05, fy: 3.1011e+34, dgy: 3.0485e+39
   10 LS    Function value: 1.84842e+27,    Step length: 3.8147e-06
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.52588e-05, fy: 7.57106e+30, dgy: 2.97706e+36
   11 LS    Function value: 4.51295e+23,    Step length: 9.53672e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.8147e-06, fy: 1.84842e+27, dgy: 2.90731e+33
   12 LS    Function value: 1.10199e+20,    Step length: 2.38417e-07
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.53672e-07, fy: 4.51295e+23, dgy: 2.83927e+30
   13 LS    Function value: 2.69231e+16,    Step length: 5.96028e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.38417e-07, fy: 1.10199e+20, dgy: 2.77314e+27
   14 LS    Function value: 6.59192e+12,    Step length: 1.48993e-08
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.96028e-08, fy: 2.69231e+16, dgy: 2.70974e+24
   15 LS    Function value: 1.62897e+09,    Step length: 3.72338e-09
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 1.48993e-08, fy: 6.59192e+12, dgy: 2.65254e+21
   16 LS    Function value: 421305.,    Step length: 9.29291e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 3.72338e-09, fy: 1.62897e+09, dgy: 2.61626e+18
   17 LS    Function value: 141.939,    Step length: 2.30343e-10
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 9.29291e-10, fy: 421305., dgy: 2.67496e+15
   18 LS    Function value: 0.213621,    Step length: 5.45712e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 2.30343e-10, fy: 141.939, dgy: 3.35874e+12
   19 LS    Function value: 0.120058,    Step length: 1.32286e-11
        stx: 0., fx: 0.121568, dgx: -1.32893e+08
        sty: 5.45712e-11, fy: 0.213621, dgy: 8.60977e+09
  1 TAO,  Function value: 0.120058,  Residual: 0.0484495

When the KSP does not encounter negative curvature, the line search performs well:
  3 TAO,  Function value: 0.118376,  Residual: 0.0545446
    Residual norms for tao_nls_ solve.
    0 KSP Residual norm 5.454463134947e-02
    1 KSP Residual norm 2.394277461960e-02
    2 KSP Residual norm 6.674182971627e-03
    3 KSP Residual norm 1.235116278289e-03
    4 KSP Residual norm 1.714792759324e-04
    5 KSP Residual norm 3.928769927518e-05
    6 KSP Residual norm 8.464174666739e-06
    7 KSP Residual norm 1.583763581407e-06
    8 KSP Residual norm 3.251685842746e-07
  Linear tao_nls_ solve converged due to CONVERGED_RTOL iterations 8
    0 LS    Function value: 0.118376,    Step length: 0.
    1 LS    Function value: 0.117884,    Step length: 1.
        stx: 0., fx: 0.118376, dgx: -0.000530697
        sty: 0., fy: 0.118376, dgy: -0.000530697

I have attached the full log for this particular solve. The points of negative curvature do not surprise me - this is a difficult problem, with many singularities in the Jacobian/Hessian. In fact, I am very happy with how well STCG is performing globally. My question is what to make of the poor performance of the line search after encountering these points. As I understand it, most of the flags for adjusting the solver after it encounters an indefinite matrix alter the magnitude of the perturbation by which the Hessian is offset when calculating the update direction. I am not clear on how to adjust the routine that determines the initial step size. More generally, should I expect these points of negative curvature to cause difficulties for the solver, and be satisfied with a large number of iterations before finding an improvement at a tiny step size?

I also have a loosely related question about the status of the Levenberg-Marquardt solver within Tao. I remember hearing on the list that this was being worked on, but I am not sure from the branch description whether it is working or intended for general use. I would love to hear any updates on that, because I think it could be useful for my problem.

Thanks!
<Tao_log.txt>




