[petsc-users] TAO: Finite Difference vs Continuous Adjoint gradient issues

Thu Nov 23 04:16:35 CST 2017

It was indeed a mass-scaling issue: we have to project the CADJ-derived 
gradient onto the corresponding FE space again. With that projection in 
place, the hand-coded and finite difference gradients agree:

Testing hand-coded gradient (hc) against finite difference gradient 
(fd), if the ratio ||fd - hc|| / ||hc|| is
0 (1.e-8), the hand-coded gradient is probably correct.
Run with -tao_test_display to show difference
between hand-coded and finite difference gradient.
||fd|| 0.000150841, ||hc|| = 0.000150841, angle cosine = 
(fd'hc)/||fd||||hc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 4.48554e-06, difference ||fd-hc|| 
= 6.76604e-10
max-norm ||fd-hc||/max(||hc||,||fd||) = 4.99792e-06, difference 
||fd-hc|| = 1.88044e-10
||fd|| 0.000386312, ||hc|| = 0.000386312, angle cosine = 
(fd'hc)/||fd||||hc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 1.14682e-05, difference ||fd-hc|| 
= 4.4303e-09
max-norm ||fd-hc||/max(||hc||,||fd||) = 1.56645e-05, difference 
||fd-hc|| = 1.49275e-09
||fd|| 8.46797e-05, ||hc|| = 8.46797e-05, angle cosine = 
(fd'hc)/||fd||||hc|| = 1.
2-norm ||fd-hc||/max(||hc||,||fd||) = 2.63488e-06, difference ||fd-hc|| 
= 2.2312e-10
max-norm ||fd-hc||/max(||hc||,||fd||) = 2.7873e-06, difference ||fd-hc|| 
= 5.58718e-11
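
For readers who hit the same symptom (angle cosine close to 1, but a
relative difference close to 1), here is a minimal NumPy sketch of one
common way a mass scaling shows up. It is not the code from heat_adj.py.
It assumes a hypothetical 1D P1 discretization and a hypothetical
objective J(u) = 1/2 (u - d)^T M (u - d), where M is the mass matrix.
The L2 representative of the derivative is u - d, while a finite
difference check in the coefficients sees M (u - d). The names
p1_mass_matrix, objective, fd_gradient and compare are made up for this
illustration.

```python
import numpy as np


def p1_mass_matrix(n_cells, length=1.0):
    """Assemble the dense P1 mass matrix on a uniform 1D mesh."""
    h = length / n_cells
    local = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
    M = np.zeros((n_cells + 1, n_cells + 1))
    for e in range(n_cells):
        M[e:e + 2, e:e + 2] += local
    return M


def objective(u, d, M):
    """Hypothetical discrete L2 misfit J(u) = 1/2 (u - d)^T M (u - d)."""
    r = u - d
    return 0.5 * r @ (M @ r)


def fd_gradient(f, u, eps=1e-6):
    """Central finite differences with respect to each coefficient."""
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (f(u + e) - f(u - e)) / (2.0 * eps)
    return g


def compare(fd, hc):
    """Angle cosine and relative 2-norm difference, as printed by the test."""
    n_fd, n_hc = np.linalg.norm(fd), np.linalg.norm(hc)
    cosine = fd @ hc / (n_fd * n_hc)
    rel = np.linalg.norm(fd - hc) / max(n_fd, n_hc)
    return cosine, rel


n_cells = 64
x = np.linspace(0.0, 1.0, n_cells + 1)
M = p1_mass_matrix(n_cells)
u = np.sin(np.pi * x)  # hypothetical current iterate
d = x**2               # hypothetical target state

fd = fd_gradient(lambda v: objective(v, d, M), u)
riesz = u - d          # L2 representative (function-space gradient)
scaled = M @ riesz     # derivative with respect to the coefficients

cos_raw, rel_raw = compare(fd, riesz)
cos_fix, rel_fix = compare(fd, scaled)
print(f"unscaled:    cosine = {cos_raw:.6f}, rel. diff = {rel_raw:.3e}")
print(f"mass-scaled: cosine = {cos_fix:.6f}, rel. diff = {rel_fix:.3e}")
```

On a uniform mesh M acts roughly like h times the identity, so the
unscaled vector points in almost the same direction as the finite
difference gradient but has a very different norm. Which of the two
representations an optimizer should receive depends on the inner product
it uses; the sketch only shows that the two must not be mixed.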

Thank you all for the quick responses and input again!

On 2017-11-23 09:29, Julian Andrej wrote:
> On 2017-11-22 16:27, Emil Constantinescu wrote:
>> On 11/22/17 3:48 AM, Julian Andrej wrote:
>>> Hello,
>>> 
>>> we prepared a small example which computes the gradient via the 
>>> continuous adjoint method of a heating problem with a cost 
>>> functional.
>>> 
>>> We implemented the textbook example and tested the gradient with a 
>>> Taylor remainder test (which works fine). We then wanted to solve the
>>> optimization problem with TAO, checked the gradient against the finite 
>>> difference gradient, and ran into problems.
>>> 
>>> Testing hand-coded gradient (hc) against finite difference gradient 
>>> (fd), if the ratio ||fd - hc|| / ||hc|| is
>>> 0 (1.e-8), the hand-coded gradient is probably correct.
>>> Run with -tao_test_display to show difference
>>> between hand-coded and finite difference gradient.
>>> ||fd|| 0.000147076, ||hc|| = 0.00988136, angle cosine = 
>>> (fd'hc)/||fd||||hc|| = 0.99768
>>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| 
>>> = 0.00973464
>>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985149, difference 
>>> ||fd-hc|| = 0.00243363
>>> ||fd|| 0.000382547, ||hc|| = 0.0257001, angle cosine = 
>>> (fd'hc)/||fd||||hc|| = 0.997609
>>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985151, difference ||fd-hc|| 
>>> = 0.0253185
>>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985117, difference 
>>> ||fd-hc|| = 0.00624562
>>> ||fd|| 8.84429e-05, ||hc|| = 0.00594196, angle cosine = 
>>> (fd'hc)/||fd||||hc|| = 0.997338
>>> 2-norm ||fd-hc||/max(||hc||,||fd||) = 0.985156, difference ||fd-hc|| 
>>> = 0.00585376
>>> max-norm ||fd-hc||/max(||hc||,||fd||) = 0.985006, difference 
>>> ||fd-hc|| = 0.00137836
>>> 
>>> Despite these differences we achieve convergence with our hand-coded 
>>> gradient, but we have to use -tao_ls_type unit.
>> 
>> Both give similar (presumably descent) directions, but they seem to be
>> scaled differently. This could be a bad scaling by the mass matrix
>> somewhere in the continuous adjoint. Plotting them side by side would
>> be a quick diagnostic.
>> 
> 
> I visualized and attached the two gradients. The CADJ is hand-coded and
> the DADJ is from pyadjoint, which is the same as the finite difference
> gradient from TAO.
> 
> If the attachment gets lost on the mailing list, here is a direct 
> link [1].
> 
> [1] https://cloud.tf.uni-kiel.de/index.php/s/nmiNOoI213dx1L1
> 
>> Emil
>> 
>>> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor 
>>> -tao_gatol 1e-7 -tao_ls_type unit
>>> iter =   0, Function value: 0.000316722,  Residual: 0.00126285
>>> iter =   1, Function value: 3.82272e-05,  Residual: 0.000438094
>>> iter =   2, Function value: 1.26011e-07,  Residual: 8.4194e-08
>>> Tao Object: 1 MPI processes
>>>    type: blmvm
>>>        Gradient steps: 0
>>>    TaoLineSearch Object: 1 MPI processes
>>>      type: unit
>>>    Active Set subset type: subvec
>>>    convergence tolerances: gatol=1e-07,   steptol=0.,   gttol=0.
>>>    Residual in Function/Gradient:=8.4194e-08
>>>    Objective value=1.26011e-07
>>>    total number of iterations=2,                          (max: 2000)
>>>    total number of function/gradient evaluations=3,      (max: 4000)
>>>    Solution converged:    ||g(X)|| <= gatol
>>> 
>>> $ python heat_adj.py -tao_type blmvm -tao_view -tao_monitor 
>>> -tao_fd_gradient
>>> iter =   0, Function value: 0.000316722,  Residual: 4.87343e-06
>>> iter =   1, Function value: 0.000195676,  Residual: 3.83011e-06
>>> iter =   2, Function value: 1.26394e-07,  Residual: 1.60262e-09
>>> Tao Object: 1 MPI processes
>>>    type: blmvm
>>>        Gradient steps: 0
>>>    TaoLineSearch Object: 1 MPI processes
>>>      type: more-thuente
>>>    Active Set subset type: subvec
>>>    convergence tolerances: gatol=1e-08,   steptol=0.,   gttol=0.
>>>    Residual in Function/Gradient:=1.60262e-09
>>>    Objective value=1.26394e-07
>>>    total number of iterations=2,                          (max: 2000)
>>>    total number of function/gradient evaluations=3474,      (max: 
>>> 4000)
>>>    Solution converged:    ||g(X)|| <= gatol
>>> 
>>> 
>>> We think that the finite difference gradient should be in line with 
>>> our hand-coded gradient for such a simple example.
>>> 
>>> We appreciate any hints on debugging this issue. It is implemented in 
>>> Python (Firedrake), and I can provide the code if needed.
>>> 
>>> Regards
>>> Julian
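
The Taylor remainder test mentioned in the original message can be
sketched as follows. This is an illustration under stated assumptions,
not the test used in the thread: the objective, the lumped (trapezoidal)
mass weights w, and the helper names taylor_remainders and
convergence_rates are all hypothetical. The sketch shows one possible
reason why a Taylor test can pass while a coefficient-wise finite
difference check fails: the same vector g gives second-order remainders
when paired with the perturbation through the mass-weighted inner
product, but only first-order remainders with the plain Euclidean dot
product.

```python
import numpy as np


def taylor_remainders(f, slope, u, du, steps):
    """Return |f(u + h*du) - f(u) - h*slope| for every step size h."""
    f0 = f(u)
    return np.array([abs(f(u + h * du) - f0 - h * slope) for h in steps])


def convergence_rates(remainders, steps):
    """Observed order between consecutive step sizes."""
    return np.log(remainders[1:] / remainders[:-1]) / np.log(steps[1:] / steps[:-1])


n_cells = 64
x = np.linspace(0.0, 1.0, n_cells + 1)
h_mesh = 1.0 / n_cells

# Hypothetical lumped P1 mass matrix (trapezoidal weights), stored as a vector.
w = np.full(n_cells + 1, h_mesh)
w[0] = w[-1] = 0.5 * h_mesh

d = x**2  # hypothetical target state


def J(u):
    """Hypothetical objective: sum_i w_i * (1/2 (u_i - d_i)^2 + 1/4 u_i^4)."""
    return np.sum(w * (0.5 * (u - d) ** 2 + 0.25 * u**4))


u = np.sin(np.pi * x)         # hypothetical evaluation point
du = np.cos(2.0 * np.pi * x)  # hypothetical perturbation direction
g = (u - d) + u**3            # gradient w.r.t. the mass-weighted inner product

steps = 1e-2 * 0.5 ** np.arange(4)

# Correct pairing: <g, du>_M = sum_i w_i g_i du_i.
rates_l2 = convergence_rates(
    taylor_remainders(J, np.sum(w * g * du), u, du, steps), steps)

# Mismatched pairing: Euclidean dot product without the mass weights.
rates_euclid = convergence_rates(
    taylor_remainders(J, g @ du, u, du, steps), steps)

print("rates with mass-weighted pairing:", rates_l2)
print("rates with Euclidean pairing:    ", rates_euclid)
```

A rate near 2 indicates that the gradient and the inner product are
consistent; a rate near 1 indicates that the slope term is wrong, which
for a mismatched pairing can happen even when g itself is correct in its
own inner product.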