<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>Dear all,</p>

    <p>I am trying to optimize the nonlinear solvers in a code of
      mine, but I am having a hard time interpreting the profiling
      data from the SNES. In particular, if I run with <span style="font-family:monospace">-snesCorr_snes_lag_jacobian
        5 -snesCorr_snes_linesearch_monitor -snesCorr_snes_monitor
        -snesCorr_snes_linesearch_type basic -snesCorr_snes_view</span>, I
      get, for every timestep, output like<br>

    </p>

    <p><span style="font-family:monospace">0 SNES Function norm 2.204257292307e+00<br>

         1 SNES Function norm 5.156376709750e-03  <br>

         2 SNES Function norm 9.399026338316e-05  <br>

         3 SNES Function norm 1.700505246874e-06  <br>

         4 SNES Function norm 2.938127043559e-08  <br>

        SNES Object: snesCorr (snesCorr_) 1 MPI process

        <br>

         type: newtonls

        <br>

         maximum iterations=50, maximum function evaluations=10000

        <br>

         tolerances: relative=1e-08, absolute=1e-50, solution=1e-08

        <br>

         total number of linear solver iterations=4

        <br>

         total number of function evaluations=5

        <br>

         norm schedule ALWAYS

        <br>

         Jacobian is rebuilt every 5 SNES iterations

        <br>

         SNESLineSearch Object: (snesCorr_) 1 MPI process

        <br>

           type: basic

        <br>

           maxstep=1.000000e+08, minlambda=1.000000e-12

        <br>

           tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08

        <br>

           maximum iterations=40

        <br>

         KSP Object: (snesCorr_) 1 MPI process

        <br>

           type: gmres

        <br>

             restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement

        <br>

             happy breakdown tolerance 1e-30

        <br>

           maximum iterations=10000, initial guess is zero

        <br>

           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.

        <br>

           left preconditioning

        <br>

           using PRECONDITIONED norm type for convergence test

        <br>

         PC Object: (snesCorr_) 1 MPI process

        <br>

           type: ilu

        <br>

             out-of-place factorization

        <br>

             0 levels of fill

        <br>

             tolerance for zero pivot 2.22045e-14

        <br>

             matrix ordering: natural

        <br>

             factor fill ratio given 1., needed 1.

        <br>

               Factored matrix follows:

        <br>

                 Mat Object: (snesCorr_) 1 MPI process

        <br>

                   type: seqaij

        <br>

                   rows=1200, cols=1200

        <br>

                   package used to perform factorization: petsc

        <br>

                   total: nonzeros=17946, allocated nonzeros=17946

        <br>

                     using I-node routines: found 400 nodes, limit used is 5

        <br>

           linear system matrix = precond matrix:

        <br>

           Mat Object: 1 MPI process

        <br>

             type: seqaij

        <br>

             rows=1200, cols=1200

        <br>

             total: nonzeros=17946, allocated nonzeros=17946

        <br>

             total number of mallocs used during MatSetValues calls=0

        <br>

               using I-node routines: found 400 nodes, limit used is 5<br>

      </span></p>

    <p>I take this to mean that no actual line search is performed and
      the full Newton step is always taken (I have not included the full
      output, but all timesteps look alike). Also, with the default (bt)
      line search, the total CPU time does not change, which seems
      consistent with this.<br>

    </p>

    <p>However, I would have expected the time spent in SNESLineSearch
      to be negligible, whereas the flamegraph shows that about 38% of
      the time spent in SNESSolve is actually spent in SNESLineSearch.
      Furthermore, more CPU time goes into SNESFunction evaluations
      called from SNESLineSearch than into those called directly from
      SNESSolve. The flamegraph is attached.<br>

    </p>
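    <p>To make my reading of the counts concrete, here is a toy model in
      Python (not PETSc code, and I have not checked it against the
      PETSc sources). It assumes that the basic line search takes the
      full step and then evaluates the residual at the new iterate, so
      that every evaluation after the initial one is attributed to the
      line search. The names CountingResidual, basic_linesearch and
      newton are hypothetical, and the scalar test problem is made up
      for illustration.</p>

<pre>
```python
class CountingResidual:
    """Wrap a residual function and count calls by caller label."""

    def __init__(self, func):
        self.func = func
        self.calls = {"solve": 0, "linesearch": 0}

    def __call__(self, x, caller):
        self.calls[caller] += 1
        return self.func(x)


def basic_linesearch(residual, x, step):
    """Assumed 'basic' behaviour: take the full step (lambda = 1),
    then evaluate the residual at the new point to get its norm."""
    x_new = x - step
    f_new = residual(x_new, "linesearch")
    return x_new, f_new


def newton(residual, jacobian, x0, iterations):
    """Scalar Newton iteration with a fixed number of iterations."""
    x = x0
    f = residual(x, "solve")  # the only evaluation made outside the line search
    for _ in range(iterations):
        step = f / jacobian(x)  # stands in for the linear solve
        x, f = basic_linesearch(residual, x, step)
    return x, f


# Hypothetical test problem: solve x**3 - 2 = 0 starting from x = 1.
residual = CountingResidual(lambda x: x**3 - 2.0)
x, f = newton(residual, lambda x: 3.0 * x**2, 1.0, 4)
print(residual.calls)
```
</pre>

    <p>Under this assumption, 4 Newton iterations give 1 evaluation in
      the solve and 4 in the line search, 5 in total, which would match
      the "total number of function evaluations=5" reported above for 4
      SNES iterations.</p>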

    <p>Could an expert help me understand these data? Is the
      LineSearch actually the part that performs the Newton step? Given
      that the full step is always taken, can the SNESFunction
      evaluations from the LineSearch be skipped?<br>

    </p>

    <p>Thanks a lot!<br>

    </p>

    <p>Matteo</p>

  </body>

</html>