Weird (?) results obtained from experiments with the SNESMF module

Rafael Santos Coelho rafaelsantoscoelho at gmail.com
Fri Dec 19 12:27:20 CST 2008


Hello to everyone,

Weeks ago I did some experiments with PETSc's SNESMF module using the Bratu
nonlinear PDE in 2D example
(http://www.mcs.anl.gov/petsc/petsc-2/snapshots/petsc-current/src/snes/examples/tutorials/ex5.c.html).
I ran a bunch of tests (varying the number of processors and the
mesh size) with the "-log_summary" command-line option to collect several
metrics, namely: max runtime, total memory usage, floating-point
operations, linear iteration count and MPI messages sent. My intention was
to compare SNESMF (the Jacobian-free Newton-Krylov method, which I will call
JFNK) with SNES (the "simple" Newton-Krylov method, which I will call NK). In
theory, we basically know that:


   1. JFNK is supposed to consume less memory than NK, since the actual
   entries of the Jacobian matrix are never stored;
   2. JFNK is supposed to perform fewer (or roughly as many) floating-point
   operations as NK, since it does not compute the Jacobian matrix entries
   in every Newton iteration (the forward-difference formula behind this is
   reproduced right after this list).
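
For reference, the whole point of JFNK is that the Krylov solver only ever
needs the action of the Jacobian on a vector, which can be approximated with
a single extra residual evaluation. This is the standard first-order forward
difference (in LaTeX notation), not anything PETSc-specific:

    J(x)\,a \;\approx\; \frac{F(x + h\,a) - F(x)}{h}

No Jacobian entries are ever formed or stored; the price is one residual
evaluation per Krylov iteration, plus whatever work is needed to choose h.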

Well, I have to admit that I was pretty surprised by the results. In short,
except in a few cases, NK outperformed JFNK on every one of the metrics
mentioned above. Before anything else, some clarifications:


   1. Each test was repeated 5 times and I summarized the results using the
   arithmetic mean, in order to smooth out possible fluctuations;
   2. The tests were run on a Beowulf cluster with 30 processing nodes; each
   node is an Intel(R) Core(TM)2 CPU 4300 at 1.80 GHz with 2 MB of cache and
   2 GB of RAM, running Fedora Core 6 GNU/Linux (kernel version 2.6.26.6-49),
   PETSc version 2.3.3-p15 and LAM/MPI version 6.41;
   3. The Walker-Pernice formula
   (http://www.dcsc.sdu.dk/docs/PETSC/src/snes/mf/wp.c.html) was chosen to
   compute the differencing parameter "h" used in the finite-difference-based
   matrix-free Jacobian (the formula is reproduced right after this list);
   4. No preconditioners or matrix reorderings were employed.
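
For reference, here is the Walker-Pernice choice of "h" as I understand it
from wp.c (in LaTeX notation; please correct me if I am misreading the
source):

    h \;=\; \frac{\epsilon_{rel}\,\sqrt{1 + \|u\|_2}}{\|a\|_2}

where u is the current Newton iterate, a is the vector being multiplied,
and \epsilon_{rel} defaults (I believe) to roughly the square root of
machine precision.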

Now here come my questions:


   - How come JFNK used more memory than NK? Looking at the log files at the
   end of this message, there were 4 matrix creations in JFNK. Why 4?! And
   why did JFNK create one more vector (46) than NK (45)? Where does that
   particular vector come from? Is it the history of the "h" parameter? How
   can I figure that out?
   - Why did JFNK perform worse than NK in terms of max runtime, total linear
   iteration count, MPI messages sent, MPI reductions and flops?

My answer: JFNK had a much higher linear iteration count than NK, and that's
probably why it turned out worse in all those respects. As for the MPI
reductions, I believe JFNK topped NK because of all the vector norm
operations needed to compute the "h" parameter (see the sketch right below).
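
To make the reduction count concrete, here is a minimal sketch of what one
application of the matrix-free operator has to do. This is my own
illustration, not PETSc's actual wp.c: the function name MFApply, its
signature, and the eps argument are hypothetical, and the real implementation
may cache ||u|| once per Newton step instead of recomputing it.

    #include "petscvec.h"
    #include <math.h>

    /* Hypothetical sketch: Ja = J(u) a via forward differencing with the
       Walker-Pernice h. Fu = F(u) is assumed precomputed, w is a work
       vector, and F is the user's residual function. Note the two
       VecNorm() calls: each one is a global MPI reduction, paid on every
       single Krylov iteration. */
    static PetscErrorCode MFApply(Vec u, Vec a, Vec Fu, Vec w, Vec Ja,
                                  PetscErrorCode (*F)(Vec, Vec),
                                  PetscReal eps)
    {
      PetscReal normU, normA, h;

      VecNorm(u, NORM_2, &normU);               /* global reduction #1 */
      VecNorm(a, NORM_2, &normA);               /* global reduction #2 */
      if (normA == 0.0) return VecSet(Ja, 0.0); /* J(u) 0 = 0          */
      h = eps * sqrt(1.0 + normU) / normA;      /* Walker-Pernice h    */
      VecWAXPY(w, h, a, u);                     /* w  = u + h a        */
      F(w, Ja);                                 /* Ja = F(u + h a)     */
      VecAXPY(Ja, -1.0, Fu);                    /* Ja = F(u+ha) - F(u) */
      VecScale(Ja, 1.0 / h);                    /* divide by h         */
      return 0;
    }

An assembled-matrix MatMult, by contrast, needs no global reductions at all,
so per Krylov iteration JFNK adds up to two reductions on top of the ones
GMRES already does for orthogonalization.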


   - Why did JFNK's KSP solver (GMRES) iterate far more per Newton iteration
   than NK?

My answer: That probably has to do with the fact that JFNK only
approximately computes the product J(x)a. If the precision of that
approximation is poor (and that is closely related to how the differencing
parameter is calculated, I think), then the linear solver must iterate more
to produce a "good" Newton correction.
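
To make that concrete, the standard first-order error analysis of forward
differencing (textbook material, nothing PETSc-specific) splits the matvec
error into a truncation term that grows with h and a rounding term that
blows up as h shrinks (in LaTeX notation):

    \left\| \frac{F(x + h\,a) - F(x)}{h} - J(x)\,a \right\|
      \;\lesssim\; \frac{h}{2}\,\|a\|^2 \max\|F''\|
      \;+\; \frac{2\,\epsilon_{mach}\,\|F(x)\|}{h}

Balancing the two terms puts h near \sqrt{\epsilon_{mach}} (with suitable
scaling), which is what the Walker-Pernice formula aims for. When h is off,
GMRES effectively sees a perturbed operator and typically needs more
iterations, or stagnates.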

I'd really appreciate it if you guys could comment on my observations and
conclusions and help me out with my questions. I've pasted below some
excerpts from two log files generated by the "-log_summary" option.

Thanks in advance,

Rafael

--------------------------------------------------------------------------------------------------------
NK
KSP SOLVER: GMRES
MESH SIZE: 512 x 512 unknowns
NUMBER OF PROCESSORS : 24

Linear solve converged due to CONVERGED_RTOL iterations 25038
Linear solve converged due to CONVERGED_RTOL iterations 25995
Linear solve converged due to CONVERGED_RTOL iterations 26769
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE

                        Max          Max/Min    Avg          Total
Time (sec):             4.597e+02    1.00008    4.597e+02
Objects:                6.200e+01    1.00000    6.200e+01
Flops:                  6.553e+10    1.01230    6.501e+10    1.560e+12
Flops/sec:              1.425e+08    1.01233    1.414e+08    3.394e+09
Memory:                 4.989e+06    1.01590                 1.186e+08
MPI Messages:           3.216e+05    2.00000    2.546e+05    6.111e+06
MPI Message Lengths:    2.753e+08    2.00939    8.623e+02    5.269e+09
MPI Reductions:         6.704e+03    1.00000

Object Type            Creations   Destructions    Memory   Descendants' Mem.

--- Event Stage 0: Main Stage

SNES                           1              1       124   0
Krylov Solver                  1              1     16880   0
Preconditioner                 1              1         0   0
Distributed array              1              1     46568   0
Index Set                      6              6    135976   0
Vec                           45             45   3812684   0
Vec Scatter                    3              3         0   0
IS L to G Mapping              1              1     45092   0
Matrix                         3              3   1011036   0
--------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------------
JFNK
KSP SOLVER: GMRES
MESH SIZE: 512 x 512 unknowns
NUMBER OF PROCESSORS : 24

Linear solve converged due to CONVERGED_RTOL iterations 25042
Linear solve converged due to CONVERGED_RTOL iterations 33804
Linear solve converged due to CONVERGED_RTOL iterations 33047
Linear solve converged due to CONVERGED_RTOL iterations 21219
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE

                        Max          Max/Min    Avg          Total
Time (sec):             1.076e+03    1.00009    1.076e+03
Objects:                6.500e+01    1.00000    6.500e+01
Flops:                  1.044e+11    1.01176    1.036e+11    2.485e+12
Flops/sec:              9.702e+07    1.01185    9.626e+07    2.310e+09
Memory:                 5.076e+06    1.01530                 1.207e+08
MPI Messages:           4.676e+05    2.00000    3.702e+05    8.884e+06
MPI Message Lengths:    4.002e+08    2.00939    8.623e+02    7.661e+09
MPI Reductions:         9.901e+03    1.00000

Object Type            Creations   Destructions    Memory   Descendants' Mem.

--- Event Stage 0: Main Stage

SNES                           1              1       124   0
Krylov Solver                  1              1     16880   0
Preconditioner                 1              1         0   0
Distributed array              1              1     46568   0
Index Set                      6              6    135976   0
Vec                           46             46   3901252   0
Vec Scatter                    3              3         0   0
IS L to G Mapping              1              1     45092   0
MatMFFD                        1              1         0   0
Matrix                         4              4   1011036   0
--------------------------------------------------------------------------------------------------------