[petsc-users] TAO STCG initial perturbation
zakaryah .
zakaryah at gmail.com
Thu Jun 11 12:10:07 CDT 2020
Hi Alp,
Thanks for the help. Quasi-Newton seems promising - the Tao solver
eventually converges, sometimes after hundreds or even thousands of
iterations, with each iterate proceeding very quickly thanks to not
evaluating the Hessian. I have only tried this with the problem set up as a
general optimization, i.e., not a least-squares problem per se.
Your help got me started with testing BRGN, which I'd like to explore more.
Can you give me some advice on preconditioners for the subsolver? I guess
that the subsolver uses a shell matrix to calculate elements of J^T J from
the Jacobian J. I have good preconditioners for J, but I'm not sure what to
do in order to apply them to the shell matrix.
Thanks, Zak
On Wed, Jun 10, 2020 at 3:10 PM Dener, Alp <adener at anl.gov> wrote:
> Hi Zak,
>
> You got it right with the TaoBRGNGetSubsolver -> TaoGetKSP workflow. This
> will get you the KSP object correctly.
>
> BRGN is not a stand-alone solver. It’s a wrapper that combines the
> user-provided residual and Jacobian callbacks to assemble the gradient and
> Hessian under the Gauss-Newton formulation, and feed that information into
> a standard TAO solver like NLS. It also incorporates some regularizations
> to the problem too. Technically any user could also do this themselves and
> directly use NLS to solve a least squares problem, but BRGN reduces the
> workload a little bit.
>
> To use it for your least-squares problem, you would need to set TaoType()
> to TAOBRGN and then provide the Tao solver two callbacks: one for the
> residual using TaoSetResidualRoutine() and another for the Jacobian using
> TaoSetJacobianResidualRoutine(). The regularization can be changed using
> -tao_brgn_regularization_type <none,l2pure,l2prox,l1dict> command line
> options.
>
> Note that the BRGN sub solver can also be changed. It’s set to BNLS by
> default — this is bound constrained Newton line search but it also solves
> unconstrained problems when no bounds are defined. You could always try to
> use BNTR for a trust-region method instead of line search, or switch to
> BNQLS to do a quasi-Gauss-Newton solution. The sub solver’s settings can be
> accessed via the -tao_brgn_subsolver_ prefix. So for instance if you wanted
> to change type, you could do "-tao_brgn_subsolver_tao_type bntr" for Newton
> trust-region.
>
> Alp
>
> On Jun 10, 2020, at 1:55 PM, zakaryah . <zakaryah at gmail.com> wrote:
>
> Oh, maybe I can answer my own question - for BRGN, is it the subsolver
> that has a KSP? Then I would call
>
> TaoBRGNGetSubsolver(tao, &subsolver);
> TaoGetKSP(subsolver,&ksp);
>
> ?
>
> On Wed, Jun 10, 2020 at 2:52 PM zakaryah . <zakaryah at gmail.com> wrote:
>
>> Hi Alp,
>>
>> Thanks very much for this thorough answer. I understand the situation
>> much better now.
>>
>> I will experiment with quasi-Newton methods - this should be
>> straightforward to test and evaluate. I agree that the regularized
>> Gauss-Newton may be useful in place of the Levenberg Marquardt per se. I am
>> having some trouble understanding how it is set up and how to use it for my
>> purposes.
>>
>> Using BRGN, does the Tao solver itself not have a KSP? After setting up
>> the solver, running TaoGetKSP returns a NULL KSP. It seems I am doing
>> something wrong, but with other Tao types, like nls, everything seems to
>> work fine.
>>
>> Thanks for your help!
>>
>> Cheers, Zak
>>
>>
>> On Wed, Jun 10, 2020 at 11:29 AM Dener, Alp <adener at anl.gov> wrote:
>>
>>> Hello,
>>>
>>> STCG is being used to compute a search direction by inverting the
>>> Hessian of the objective onto the gradient. The Hessian has to be positive
>>> definitive for this search direction to be a valid descent direction. To
>>> enforce this, STCG terminates the KSP solution when it encounters negative
>>> curvature (i.e. “uphill” directions). The resulting search direction from
>>> this early termination is still a descent direction but it’s a pretty bad
>>> one, meaning that the line search is forced to do extra work to find a
>>> valid (and sometimes pretty small) step length. This is normal.
>>>
>>> Once a negative curvature is detected, the NLS algorithm imposes a shift
>>> to the diagonal of the Hessian in order to guarantee that it will be
>>> positive-definite on the next optimization iteration. This diagonal shift
>>> is increased in magnitude if you keep hitting the negative curvature KSP
>>> termination, and it’s gradually decreased on iterations where the KSP
>>> solution succeeds. However this does not prevent the line search from doing
>>> all that extra work on iterations where the search direction solution never
>>> converged fully. It simply helps generate better search directions in
>>> subsequent iterations and helps the overall optimization solution
>>> eventually converge to a valid minimum.
>>>
>>> As a side note, if your function, gradient and/or Hessian evaluations
>>> are expensive, it might actually make more sense to use the quasi-Newton
>>> method (BQNLS) instead of Newton line search. The optimization solution is
>>> likely to take more overall iterations with BQNLS(-tao_type bqnls), but the
>>> BFGS approximation used by the algorithm is inherently guaranteed to be
>>> positive-definite and tends to generate very well scaled search directions
>>> so it might help the line search do less work. Combined with the
>>> elimination of Hessian evaluations, it may very well solve your problem
>>> faster in CPU time even if it takes more nonlinear iterations overall. This
>>> is technically a bound constrained method but it solves unconstrained
>>> problems too when you don’t define any bounds.
>>>
>>> About Levenberg-Marquardt: a user started the branch to eventually
>>> contribute an LM solver, but I have not heard any updates on it since end
>>> of April. For least-squares type problems, you can try using the
>>> regularized Gauss-Newton solver (-tao_type brgn). The problem definition
>>> interface is a bit different. BRGN requires the problem to be defined as a
>>> residual and its Jacobian, and it will assemble the gradient and the
>>> Hessian on its own and feed it into the standard Newton line search solver
>>> underneath. There are also a few different regularization options
>>> available, like proximal point Tikhonov (l2prox) and a sparsity term
>>> (l1dict) that can be set with the (-tao_brgn_regularization_type) option.
>>> These get automatically cooked into the gradient and Hessian. Combined with
>>> the automatic diagonal shifts for the Hessian, BRGN really is almost
>>> equivalent to an LM solver so it might do what you need.
>>>
>>> Hope this helps!
>>> —
>>> Alp
>>>
>>> On Jun 9, 2020, at 10:55 PM, zakaryah . <zakaryah at gmail.com> wrote:
>>>
>>> I am using TAO to minimize the elastic strain energy for somewhat
>>> complicated applied forces. With Newton linesearch (NLS), the STCG KSP, and
>>> several preconditioners (none, Jacobi, lmvm), the solver finds a minimum
>>> within an acceptable number of iterations (<50). Still, in the interest of
>>> performance, I am wondering about what happens after the KSP encounters an
>>> indefinite matrix (KSP_CONVERGED_CG_NEG_CURVE). After this happens, the
>>> line search performs extremely poorly, with function values at step length
>>> 1 reaching absurdly large values. As the line search tries smaller step
>>> lengths, the function values fall, then typically reach values representing
>>> a decline from the previous iteration, but only with tiny step lengths
>>> (~1e-11). The line search takes many iterations to arrive at such an
>>> improvement. Here is an example, run with -tao_type nls -tao_nls_ksp_type
>>> stcg -tao_nls_pc_type none -tao_nls_ksp_norm_type unpreconditioned:
>>>
>>> 0 TAO, Function value: 0.160612, Residual: 0.0736074
>>>
>>>
>>>
>>> 0 TAO, Function value: 0.121568, Residual: 0.0424117
>>>
>>>
>>>
>>> Residual norms for tao_nls_ solve.
>>>
>>>
>>>
>>> 0 KSP Residual norm 4.241174778983e-02
>>>
>>>
>>>
>>> 1 KSP Residual norm 5.806891169509e-02
>>>
>>>
>>>
>>> 2 KSP Residual norm 6.492953448014e-02
>>>
>>>
>>>
>>> 3 KSP Residual norm 5.856559430984e-02
>>>
>>>
>>>
>>> 4 KSP Residual norm 5.262548559710e-02
>>>
>>>
>>>
>>> 5 KSP Residual norm 4.863633473400e-02
>>>
>>>
>>>
>>> 6 KSP Residual norm 4.725156347026e-02
>>>
>>>
>>>
>>> 7 KSP Residual norm 4.748458009319e-02
>>>
>>>
>>>
>>> 8 KSP Residual norm 4.885339641711e-02
>>>
>>>
>>>
>>> 9 KSP Residual norm 5.065071226354e-02
>>>
>>>
>>>
>>> 10 KSP Residual norm 5.085544070851e-02
>>>
>>>
>>>
>>> 11 KSP Residual norm 5.127093547976e-02
>>>
>>>
>>>
>>> 12 KSP Residual norm 5.155225884843e-02
>>>
>>>
>>>
>>> 13 KSP Residual norm 5.219895021408e-02
>>>
>>>
>>>
>>> 14 KSP Residual norm 6.480520610077e-02
>>>
>>>
>>>
>>> 15 KSP Residual norm 1.433515456621e-01
>>>
>>>
>>>
>>> Linear tao_nls_ solve converged due to CONVERGED_CG_NEG_CURVE
>>> iterations 16
>>>
>>>
>>> 0 LS Function value: 0.121568, Step length: 0.
>>>
>>>
>>>
>>> 1 LS Function value: 5.99839e+59, Step length: 1.
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0., fy: 0.121568, dgy: -1.32893e+08
>>> 2 LS Function value: 1.46445e+56, Step length: 0.25
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 1., fy: 5.99839e+59, dgy: 3.59903e+60
>>>
>>>
>>>
>>> 3 LS Function value: 3.57532e+52, Step length: 0.0625
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.25, fy: 1.46445e+56, dgy: 3.51468e+57
>>>
>>>
>>>
>>> 4 LS Function value: 8.7288e+48, Step length: 0.015625
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.0625, fy: 3.57532e+52, dgy: 3.4323e+54
>>>
>>>
>>>
>>> 5 LS Function value: 2.13105e+45, Step length: 0.00390625
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.015625, fy: 8.7288e+48, dgy: 3.35186e+51
>>>
>>>
>>>
>>> 6 LS Function value: 5.20277e+41, Step length: 0.000976562
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.00390625, fy: 2.13105e+45, dgy: 3.2733e+48
>>>
>>>
>>>
>>> 7 LS Function value: 1.27021e+38, Step length: 0.000244141
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.000976562, fy: 5.20277e+41, dgy: 3.19658e+45
>>>
>>>
>>>
>>> 8 LS Function value: 3.1011e+34, Step length: 6.10352e-05
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 0.000244141, fy: 1.27021e+38, dgy: 3.12166e+42
>>>
>>>
>>>
>>> 9 LS Function value: 7.57106e+30, Step length: 1.52588e-05
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 6.10352e-05, fy: 3.1011e+34, dgy: 3.0485e+39
>>>
>>>
>>>
>>> 10 LS Function value: 1.84842e+27, Step length: 3.8147e-06
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 1.52588e-05, fy: 7.57106e+30, dgy: 2.97706e+36
>>>
>>>
>>>
>>> 11 LS Function value: 4.51295e+23, Step length: 9.53672e-07
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 3.8147e-06, fy: 1.84842e+27, dgy: 2.90731e+33
>>>
>>>
>>>
>>> 12 LS Function value: 1.10199e+20, Step length: 2.38417e-07
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 9.53672e-07, fy: 4.51295e+23, dgy: 2.83927e+30
>>>
>>>
>>>
>>> 13 LS Function value: 2.69231e+16, Step length: 5.96028e-08
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 2.38417e-07, fy: 1.10199e+20, dgy: 2.77314e+27
>>>
>>>
>>>
>>> 14 LS Function value: 6.59192e+12, Step length: 1.48993e-08
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 5.96028e-08, fy: 2.69231e+16, dgy: 2.70974e+24
>>>
>>>
>>>
>>> 15 LS Function value: 1.62897e+09, Step length: 3.72338e-09
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 1.48993e-08, fy: 6.59192e+12, dgy: 2.65254e+21
>>>
>>>
>>>
>>> 16 LS Function value: 421305., Step length: 9.29291e-10
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 3.72338e-09, fy: 1.62897e+09, dgy: 2.61626e+18
>>>
>>>
>>>
>>> 17 LS Function value: 141.939, Step length: 2.30343e-10
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 9.29291e-10, fy: 421305., dgy: 2.67496e+15
>>>
>>>
>>>
>>> 18 LS Function value: 0.213621, Step length: 5.45712e-11
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>>
>>>
>>>
>>> sty: 2.30343e-10, fy: 141.939, dgy: 3.35874e+12
>>>
>>>
>>>
>>> 19 LS Function value: 0.120058, Step length: 1.32286e-11
>>>
>>>
>>>
>>> stx: 0., fx: 0.121568, dgx: -1.32893e+08
>>> sty: 5.45712e-11, fy: 0.213621, dgy: 8.60977e+09
>>> 1 TAO, Function value: 0.120058, Residual: 0.0484495
>>>
>>> When the KSP does not encounter negative curvature, the linesearch
>>> performs well:
>>> 3 TAO, Function value: 0.118376, Residual: 0.0545446
>>>
>>>
>>>
>>> Residual norms for tao_nls_ solve.
>>>
>>>
>>>
>>> 0 KSP Residual norm 5.454463134947e-02
>>>
>>>
>>>
>>> 1 KSP Residual norm 2.394277461960e-02
>>>
>>>
>>>
>>> 2 KSP Residual norm 6.674182971627e-03
>>>
>>>
>>>
>>> 3 KSP Residual norm 1.235116278289e-03
>>>
>>>
>>>
>>> 4 KSP Residual norm 1.714792759324e-04
>>>
>>>
>>>
>>> 5 KSP Residual norm 3.928769927518e-05
>>>
>>>
>>>
>>> 6 KSP Residual norm 8.464174666739e-06
>>>
>>>
>>>
>>> 7 KSP Residual norm 1.583763581407e-06
>>>
>>>
>>>
>>> 8 KSP Residual norm 3.251685842746e-07
>>>
>>>
>>>
>>> Linear tao_nls_ solve converged due to CONVERGED_RTOL iterations 8
>>>
>>>
>>>
>>> 0 LS Function value: 0.118376, Step length: 0.
>>>
>>>
>>>
>>> 1 LS Function value: 0.117884, Step length: 1.
>>>
>>>
>>>
>>> stx: 0., fx: 0.118376, dgx: -0.000530697
>>>
>>>
>>>
>>> sty: 0., fy: 0.118376, dgy: -0.000530697
>>>
>>> I have attached the full log for this particular solve. The points of
>>> negative curvature do not surprise me - this is a difficult problem, with
>>> many singularities in the Jacobian/Hessian. In fact, I am very happy with
>>> how well STCG is performing, globally. My question is what to make of the
>>> poor performance of the linesearch after encountering these points. As I
>>> understand it, most of the flags for adjusting the solver after
>>> encountering an indefinite matrix involve altering the magnitude of the
>>> perturbation by which the Hessian is offset for calculating the update
>>> direction. I am not clear on how to adjust the routine for determining the
>>> initial step size. More generally, should I expect these points of negative
>>> curvature to cause difficulties for the solver, and be satisfied with a
>>> large number of iterations before finding an improvement at tiny step size?
>>>
>>> I have a loosely related question about the status of the Levenberg
>>> Marquardt solver within Tao? I remember hearing on the list that this was
>>> being worked on, but I am not sure from the branch description whether it
>>> is working or intended for general use. I would love to hear any updates on
>>> that, because I think it could be useful for my problem.
>>>
>>> Thanks!
>>> <Tao_log.txt>
>>>
>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200611/be8b8cfe/attachment-0001.html>
More information about the petsc-users
mailing list