Hello everyone,<br><br>A few weeks ago I ran some experiments with PETSc's SNESMF module using the <a href="http://www.mcs.anl.gov/petsc/petsc-2/snapshots/petsc-current/src/snes/examples/tutorials/ex5.c.html">Bratu nonlinear PDE in 2D</a> example. I ran a series of tests (varying the number of processors and the mesh size) with the "-log_summary" command-line option to collect several metrics, namely: maximum runtime, total memory usage, floating-point operations, linear iteration count and MPI messages sent. My intention was to compare SNESMF (the Jacobian-free Newton-Krylov method, which I will call JFNK) with plain SNES (the "simple" Newton-Krylov method, which I will call NK); the command line I used is sketched just before the log excerpts at the end of this message. In theory, we expect that:<br>
<br><ol><li>JFNK is supposed to consume less memory than NK, since the true entries of the Jacobian matrix are never actually stored;</li><li>JFNK is supposed to perform fewer (or roughly the same number of) floating-point operations as NK, since it does not compute the Jacobian matrix entries at every Newton iteration.</li>
</ol>Well, I have to admit that I was pretty surprised by the results. In short, except in a few cases, NK outperformed JFNK on every one of the metrics mentioned above. Before anything else, some clarifications:<br><br>
<ol><li>Each test was repeated 5 times and I summarized the results with the arithmetic mean, in order to attenuate possible fluctuations;</li><li>The tests were run on a Beowulf cluster with 30 processing nodes; each node is an Intel(R) Core(TM)2 CPU 4300 at 1.80GHz with 2MB of cache and 2GB of RAM, running Fedora Core 6 GNU/Linux (kernel 2.6.26.6-49), PETSc 2.3.3-p15 and LAM/MPI 6.41;</li>
<li>The <a href="http://www.dcsc.sdu.dk/docs/PETSC/src/snes/mf/wp.c.html">Walker-Pernice</a> formula was chosen to compute the differencing parameter "h" used by the finite-difference-based matrix-free Jacobian;</li>
<li>No preconditioners or matrix reorderings were employed.<br></li></ol>Now here come my questions:<br><br><ul><li>How come JFNK used more memory than NK? Looking at the log excerpts at the end of this message, there were 4 matrix creations in JFNK. Why 4?! And why did JFNK create one more vector (46) than NK (45)? Where does that extra vector come from? Does it hold the history of the "h" parameter? How can I figure that out?</li><li>Why did JFNK perform worse than NK in terms of max runtime, total linear iteration count, MPI messages sent, MPI reductions and flops?</li></ul>My answer: JFNK needed far more linear iterations than NK, and that is probably why it turned out worse in all those respects. As for the MPI reductions, I believe JFNK exceeded NK because of all the vector norm
operations needed to compute the "h" parameter.<br><br><ul><li>Why did JFNK's KSP solver (GMRES) iterate so much more per Newton iteration than NK?</li></ul>My answer: That probably has to do with the fact that JFNK only approximates the product J(x)a. If that approximation is poor (and its accuracy is closely tied to how the differencing parameter is computed, I think), then the linear solver must iterate more to produce a "good" Newton correction.<br>
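<br>Just to make explicit what I mean by the approximation (this is written from memory, so please correct me if PETSc's wp.c does something slightly different): the matrix-free product is formed as<br><br>
J(u)*a ≈ ( F(u + h*a) - F(u) ) / h<br><br>
and, as I understand the Walker-Pernice choice, the differencing parameter is roughly<br><br>
h = sqrt( eps * (1 + ||u||) ) / ||a||<br><br>
with eps on the order of the machine epsilon (or a user-supplied relative error). If that is right, then any error introduced by h goes straight into the matrix-vector products that GMRES builds its Krylov space from, which is why I suspect a poorly scaled h can show up as extra linear iterations.<br>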
<br>I'd really appreciate it if you guys could comment on my observations and conclusions and help me out with my questions. I've pasted below some excerpts from the two log files generated by the "-log_summary" option.<br>
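<br>For reference, each run was launched with something along these lines (I am reconstructing the options from memory, so the exact spellings, in particular the option selecting the Walker-Pernice formula, may not match this PETSc version):<br><br>
mpirun -np 24 ./ex5 -da_grid_x 512 -da_grid_y 512 -snes_mf -snes_mf_type wp -ksp_converged_reason -snes_converged_reason -log_summary<br><br>
and the same line without the two "-snes_mf*" options for the NK runs.<br>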
<br>Thanks in advance,<br><br>Rafael<br><br>--------------------------------------------------------------------------------------------------------<br>
NK<br>
KSP SOLVER: GMRES<br>
MESH SIZE : 512 x 512 unknowns<br>
NUMBER OF PROCESSORS : 24<br>
<br>
Linear solve converged due to CONVERGED_RTOL iterations 25038<br>
Linear solve converged due to CONVERGED_RTOL iterations 25995<br>
Linear solve converged due to CONVERGED_RTOL iterations 26769<br>
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE<br>
<br>
Max Max/Min Avg Total <br>
<b>Time (sec): 4.597e+02 1.00008 4.597e+02</b><br>
Objects: 6.200e+01 1.00000 6.200e+01<br>
<b>Flops: 6.553e+10 1.01230 6.501e+10 1.560e+12</b><br>
Flops/sec: 1.425e+08 1.01233 1.414e+08 3.394e+09<br>
<b>Memory: 4.989e+06 1.01590 1.186e+08</b><br>
<b>MPI Messages: 3.216e+05 2.00000 2.546e+05 6.111e+06</b><br>
MPI Message Lengths: 2.753e+08 2.00939 8.623e+02 5.269e+09<br>
<b>MPI Reductions: 6.704e+03 1.00000</b><br>
<br>
Object Type Creations Destructions Memory Descendants' Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
SNES 1 1 124 0<br>
Krylov Solver 1 1 16880 0<br>
Preconditioner 1 1 0 0<br>
Distributed array 1 1 46568 0<br>
Index Set 6 6 135976 0<br>
<b>Vec 45 45 3812684 0</b><br>
Vec Scatter 3 3 0 0<br>
IS L to G Mapping 1 1 45092 0<br>
<b>Matrix 3 3 1011036 0</b><br>
--------------------------------------------------------------------------------------------------------<br>
<br>
--------------------------------------------------------------------------------------------------------<br>
JFNK<br>
KSP SOLVER: GMRES<br>
MESH SIZE : 512 x 512 unknowns<br>
NUMBER OF PROCESSORS : 24<br>
<br>
Linear solve converged due to CONVERGED_RTOL iterations 25042<br>
Linear solve converged due to CONVERGED_RTOL iterations 33804<br>
Linear solve converged due to CONVERGED_RTOL iterations 33047<br>
Linear solve converged due to CONVERGED_RTOL iterations 21219<br>
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE<br>
<br>
Max Max/Min Avg Total <br>
<b>Time (sec): 1.076e+03 1.00009 1.076e+03</b><br>
Objects: 6.500e+01 1.00000 6.500e+01<br>
<b>Flops: 1.044e+11 1.01176 1.036e+11 2.485e+12</b><br>
Flops/sec: 9.702e+07 1.01185 9.626e+07 2.310e+09<br>
<b>Memory: 5.076e+06 1.01530 1.207e+08</b><br>
<b>MPI Messages: 4.676e+05 2.00000 3.702e+05 8.884e+06</b><br>
MPI Message Lengths: 4.002e+08 2.00939 8.623e+02 7.661e+09<br>
<b>MPI Reductions: 9.901e+03 1.00000</b><br>
<br>
Object Type Creations Destructions Memory Descendants' Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
SNES 1 1 124 0<br>
Krylov Solver 1 1 16880 0<br>
Preconditioner 1 1 0 0<br>
Distributed array 1 1 46568 0<br>
Index Set 6 6 135976 0<br>
<b>Vec 46 46 3901252 0</b><br>
Vec Scatter 3 3 0 0<br>
IS L to G Mapping 1 1 45092 0<br>
MatMFFD 1 1 0 0<br>
<b>Matrix 4 4 1011036 0</b><br>
--------------------------------------------------------------------------------------------------------<br>
<br><br>