[petsc-users] interpreting data from SNESSolve profiling

Wed Feb 8 06:56:23 CST 2023

Dear all,

     I am trying to optimize the nonlinear solvers in a code of mine,
but I am having a hard time interpreting the profiling data from the
SNES. In particular, if I run with -snesCorr_snes_lag_jacobian 5
-snesCorr_snes_linesearch_monitor -snesCorr_snes_monitor
-snesCorr_snes_linesearch_type basic -snesCorr_snes_view I get, for all
timesteps, an output like

  0 SNES Function norm 2.204257292307e+00
  1 SNES Function norm 5.156376709750e-03
  2 SNES Function norm 9.399026338316e-05
  3 SNES Function norm 1.700505246874e-06
  4 SNES Function norm 2.938127043559e-08
SNES Object: snesCorr (snesCorr_) 1 MPI process
  type: newtonls
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=4
  total number of function evaluations=5
  norm schedule ALWAYS
  Jacobian is rebuilt every 5 SNES iterations
  SNESLineSearch Object: (snesCorr_) 1 MPI process
    type: basic
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: (snesCorr_) 1 MPI process
    type: gmres
      restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      happy breakdown tolerance 1e-30
    maximum iterations=10000, initial guess is zero
    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
    left preconditioning
    using PRECONDITIONED norm type for convergence test
  PC Object: (snesCorr_) 1 MPI process
    type: ilu
      out-of-place factorization
      0 levels of fill
      tolerance for zero pivot 2.22045e-14
      matrix ordering: natural
      factor fill ratio given 1., needed 1.
        Factored matrix follows:
          Mat Object: (snesCorr_) 1 MPI process
            type: seqaij
            rows=1200, cols=1200
            package used to perform factorization: petsc
            total: nonzeros=17946, allocated nonzeros=17946
              using I-node routines: found 400 nodes, limit used is 5
    linear system matrix = precond matrix:
    Mat Object: 1 MPI process
      type: seqaij
      rows=1200, cols=1200
      total: nonzeros=17946, allocated nonzeros=17946
      total number of mallocs used during MatSetValues calls=0
        using I-node routines: found 400 nodes, limit used is 5

I guess that this means that no actual line search is performed and the
full Newton step is always taken (I did not report the full output, but
all timesteps are alike). Also, switching to the default (bt) LineSearch
does not change the total CPU time, which seems in line with this.
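
To make the reasoning behind this guess explicit, here is a small Python
sketch that redoes the bookkeeping on the numbers printed above. The
script and its variable names are mine, not part of PETSc, and it only
parses the excerpt pasted into it. It assumes that one function
evaluation is spent on the initial residual and that a rejected trial
step would cost at least one extra evaluation.

```python
import re

# Excerpt copied from the -snesCorr_snes_monitor / -snesCorr_snes_view output above.
log = """\
  0 SNES Function norm 2.204257292307e+00
  1 SNES Function norm 5.156376709750e-03
  2 SNES Function norm 9.399026338316e-05
  3 SNES Function norm 1.700505246874e-06
  4 SNES Function norm 2.938127043559e-08
  total number of linear solver iterations=4
  total number of function evaluations=5
"""

norms = [float(m) for m in re.findall(r"SNES Function norm (\S+)", log)]
newton_steps = len(norms) - 1
func_evals = int(re.search(r"function evaluations=(\d+)", log).group(1))
linear_its = int(re.search(r"linear solver iterations=(\d+)", log).group(1))

# Assumption: one evaluation goes to the initial residual, the rest to the steps.
evals_per_step = (func_evals - 1) / newton_steps
its_per_step = linear_its / newton_steps

# Ratio of successive residual norms, i.e. the reduction achieved by each step.
ratios = [b / a for a, b in zip(norms, norms[1:])]

print(f"Newton steps: {newton_steps}")
print(f"function evaluations per step: {evals_per_step}")
print(f"linear iterations per step: {its_per_step}")
print("residual reduction per step:", [f"{r:.2e}" for r in ratios])
```

With these numbers there is exactly one function evaluation and one
GMRES iteration per Newton step, which is what I would expect if every
full step is accepted at the first try.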

I would therefore have expected the time spent in SNESLineSearch to be
negligible, but the flamegraph shows that about 38% of the time spent
in SNESSolve is actually spent in SNESLineSearch. Furthermore, the
SNESFunction evaluations called from SNESLineSearch take more CPU time
than those called directly from SNESSolve. The flamegraph is attached.

Could some expert help me understand these data? Is the LineSearch
actually performing the Newton step? Given that the full step is always
taken, can the SNESFunction evaluations from the LineSearch be skipped?
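
To make the question concrete, here is a toy Newton loop in Python, not
PETSc code. All names in it (residual, basic_linesearch, newton) are
hypothetical, and I have not checked that newtonls is organized this
way. It assumes that the routine applying the step is also the one that
evaluates the residual at the new point. Under that assumption, all
evaluations except the first one would be attributed to the line search
even though only full steps are taken.

```python
import math

# Count residual evaluations by the routine that triggered them.
calls = {"solve": 0, "linesearch": 0}

def residual(x, caller):
    calls[caller] += 1
    return x - math.cos(x)  # toy scalar problem F(x) = x - cos(x)

def jacobian(x):
    return 1.0 + math.sin(x)

def basic_linesearch(x, step):
    """Take the full step and evaluate the residual at the new point."""
    x_new = x - step
    return x_new, residual(x_new, "linesearch")

def newton(x, rtol=1e-8, max_it=50):
    f = residual(x, "solve")  # the only evaluation made by the outer loop
    f0 = abs(f)
    its = 0
    while abs(f) > rtol * f0 and its < max_it:
        step = f / jacobian(x)  # stand-in for the linear solve
        # The residual returned here feeds the convergence test and the next step.
        x, f = basic_linesearch(x, step)
        its += 1
    return x, its

x, its = newton(1.0)
print("iterations:", its)
print("residual evaluations by caller:", calls)
```

In this toy version the evaluation inside the line search cannot be
skipped, because its result is the right-hand side of the next linear
solve. If SNES works in a similar way, the time I see under
SNESLineSearch would just be the one residual evaluation per iteration
that Newton needs anyway.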

Thanks a lot!

Matteo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flame.svg
Type: image/svg+xml
Size: 35538 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20230208/14f71c48/attachment-0001.svg>