Weird (?) results obtained from experiments with the SNESMF module

Barry Smith bsmith at mcs.anl.gov
Fri Dec 19 14:21:54 CST 2008


   Send the output from using -snes_view -options_table for each run;
it may be you are not running what you think you are running.
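
   For example, with the 2D Bratu tutorial (assuming the executable is
ex5; adjust the process count, paths and grid options to match your
runs) something along the lines of

   mpirun -np 24 ./ex5 -da_grid_x 512 -da_grid_y 512 -snes_mf \
          -snes_view -options_table -log_summary

will show exactly which solver, matrix type and differencing rule were
actually used.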

    Barry

On Dec 19, 2008, at 12:27 PM, Rafael Santos Coelho wrote:

> Hello to everyone,
>
> Weeks ago I did some experiments with PETSc's SNESMF module using
> the 2D Bratu nonlinear PDE example. I ran a bunch of tests (varying
> the number of processors and the mesh size) with the "-log_summary"
> command-line option to collect several measures, namely: max runtime,
> total memory usage, floating-point operations, linear iteration count
> and MPI messages sent. My intention was to compare SNESMF (the
> Jacobian-free Newton-Krylov method, which I will call JFNK) with
> SNES (the "simple" Newton-Krylov method, which I will call NK). In
> theory, we basically know that:
>
> 	• JFNK is supposed to consume less memory than NK, since the true
> entries of the Jacobian matrix are never actually stored;
> 	• JFNK is supposed to do fewer floating-point operations than NK,
> or roughly the same number, since it does not compute the Jacobian
> matrix entries in every Newton iteration (see the sketch after this
> list).
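>
> To make that concrete, here is a minimal plain-C sketch of the
> matrix-free product (my own illustration, not PETSc's actual MatMFFD
> code; the ResidualFn callback and the helper name are made up): each
> matvec J(u)a costs one extra residual evaluation, and J itself is
> never formed or stored.
>
> #include <stdlib.h>
>
> /* Directional-difference approximation J(u)*a ~ (F(u+h*a)-F(u))/h */
> typedef void (*ResidualFn)(int n, const double *u, double *f);
>
> void jfnk_matvec(ResidualFn F, int n, const double *u,
>                  const double *Fu, /* F(u), already available      */
>                  const double *a, double h, double *Ja)
> {
>     double *w  = malloc(n * sizeof(double));
>     double *Fw = malloc(n * sizeof(double));
>     for (int i = 0; i < n; i++)   /* w = u + h*a                   */
>         w[i] = u[i] + h * a[i];
>     F(n, w, Fw);                  /* the one extra F evaluation    */
>     for (int i = 0; i < n; i++)   /* finite-difference quotient    */
>         Ja[i] = (Fw[i] - Fu[i]) / h;
>     free(w);
>     free(Fw);
> }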
> Well, I have to admit that I was pretty surprised by the results. In
> short, except in a few cases, NK outperformed JFNK with regard to
> every one of the metrics mentioned above. Before anything else, some
> clarifications:
>
> 	• Each test was repeated 5 times and I summarized the results using
> the arithmetic mean, in order to attenuate possible fluctuations;
> 	• The tests were run on a Beowulf cluster with 30 processing nodes;
> each node is an Intel(R) Core(TM)2 CPU 4300 1.80GHz with 2MB of
> cache and 2GB of RAM, running Fedora Core 6 GNU/Linux (kernel
> version 2.6.26.6-49), PETSc version 2.3.3-p15 and LAM/MPI version
> 6.41;
> 	• The Walker-Pernice formula was chosen to compute the differencing
> parameter "h" used with the finite-difference-based matrix-free
> Jacobian (the formula is restated right after this list);
> 	• No preconditioners or matrix reorderings were employed.
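>
> For reference, the Walker-Pernice choice, as I understand PETSc's
> MATMFFD_WP variant of it, is roughly
>
>     h = err_rel * sqrt(1 + ||u||) / ||a||
>
> where err_rel defaults to about the square root of machine epsilon,
> u is the current solution and a is the direction being multiplied.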
> Now here come my questions:
>
> 	• How come JFNK used more memory than NK? Looking at the log files
> at the end of this message, there were 4 matrix creations in JFNK.
> Why 4?! And also, why did JFNK create one more vector (46) than NK
> (45)? Where does that particular vector come from? Is it the history
> of the h parameter? How can I figure that out?
> 	• Why did JFNK perform worse than NK in terms of max runtime, total
> linear iteration count, MPI messages sent, MPI reductions and flops?
> My answer: JFNK had a much higher linear iteration count than NK,
> and that's probably why it turned out to be worse than NK in all
> those aspects. As for the MPI reductions, I believe that JFNK
> exceeded NK because of all the vector norm operations needed to
> compute the "h" parameter.
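>
> (To spell out where those extra reductions come from, here is a
> self-contained sketch of my understanding, not PETSc's actual code;
> the function names are made up. The Walker-Pernice formula needs
> ||u|| and ||a|| before every matrix-free matvec, and each norm is
> one MPI_Allreduce, on top of the reductions GMRES already does for
> orthogonalization.)
>
> #include <math.h>
> #include <mpi.h>
>
> /* 2-norm of a distributed vector: one global reduction per call */
> double global_norm2(const double *x, int nlocal, MPI_Comm comm)
> {
>     double local = 0.0, global;
>     for (int i = 0; i < nlocal; i++)
>         local += x[i] * x[i];
>     MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
>     return sqrt(global);
> }
>
> /* Walker-Pernice h: costs two global reductions per matvec */
> double wp_h(const double *u, const double *a, int nlocal,
>             double err_rel, MPI_Comm comm)
> {
>     double unorm = global_norm2(u, nlocal, comm);
>     double anorm = global_norm2(a, nlocal, comm);
>     return err_rel * sqrt(1.0 + unorm) / anorm;
> }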
>
> 	• Why did JFNK's KSP solver (GMRES) iterate far more per Newton
> iteration than NK's?
> My answer: that probably has to do with the fact that JFNK only
> approximately computes the product J(x)a. If the precision of that
> approximation is poor (and that's closely related to how the
> differencing parameter was calculated, I think), then the linear
> solver must iterate more to produce a "good" Newton correction.
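>
> (A standard first-order estimate, not specific to PETSc, makes that
> sensitivity explicit: for (F(x + h*a) - F(x))/h the truncation error
> grows like O(h) while floating-point cancellation grows like
> O(eps/h), so
>
>     error(h) ~ C1*h + C2*eps/h,  minimized near h = sqrt(C2*eps/C1),
>
> and an h that is off by orders of magnitude makes every matvec less
> accurate, which GMRES pays for in extra iterations.)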
>
> I'd really appreciate it if you guys could comment on my observations
> and conclusions and help me out with my questions. I've pasted below
> some excerpts of two log files generated by the "-log_summary" option.
>
> Thanks in advance,
>
> Rafael
>
> --------------------------------------------------------------------------------------------------------
> NK
> KSP SOLVER: GMRES
> MESH SIZE: 512 x 512 unknowns
> NUMBER OF PROCESSORS : 24
>
> Linear solve converged due to CONVERGED_RTOL iterations 25038
> Linear solve converged due to CONVERGED_RTOL iterations 25995
> Linear solve converged due to CONVERGED_RTOL iterations 26769
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE
>
>                           Max          Max/Min    Avg          Total
> Time (sec):               4.597e+02    1.00008    4.597e+02
> Objects:                  6.200e+01    1.00000    6.200e+01
> Flops:                    6.553e+10    1.01230    6.501e+10    1.560e+12
> Flops/sec:                1.425e+08    1.01233    1.414e+08    3.394e+09
> Memory:                   4.989e+06    1.01590                 1.186e+08
> MPI Messages:             3.216e+05    2.00000    2.546e+05    6.111e+06
> MPI Message Lengths:      2.753e+08    2.00939    8.623e+02    5.269e+09
> MPI Reductions:           6.704e+03    1.00000
>
> Object Type          Creations   Destructions    Memory    Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> SNES                      1            1              124        0
> Krylov Solver             1            1            16880        0
> Preconditioner            1            1                0        0
> Distributed array         1            1            46568        0
> Index Set                 6            6           135976        0
> Vec                      45           45          3812684        0
> Vec Scatter               3            3                0        0
> IS L to G Mapping         1            1            45092        0
> Matrix                    3            3          1011036        0
> --------------------------------------------------------------------------------------------------------
>
> --------------------------------------------------------------------------------------------------------
> JFNK
> KSP SOLVER: GMRES
> MESH SIZE: 512 x 512 unknowns
> NUMBER OF PROCESSORS : 24
>
> Linear solve converged due to CONVERGED_RTOL iterations 25042
> Linear solve converged due to CONVERGED_RTOL iterations 33804
> Linear solve converged due to CONVERGED_RTOL iterations 33047
> Linear solve converged due to CONVERGED_RTOL iterations 21219
> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE
>
>                           Max          Max/Min    Avg          Total
> Time (sec):               1.076e+03    1.00009    1.076e+03
> Objects:                  6.500e+01    1.00000    6.500e+01
> Flops:                    1.044e+11    1.01176    1.036e+11    2.485e+12
> Flops/sec:                9.702e+07    1.01185    9.626e+07    2.310e+09
> Memory:                   5.076e+06    1.01530                 1.207e+08
> MPI Messages:             4.676e+05    2.00000    3.702e+05    8.884e+06
> MPI Message Lengths:      4.002e+08    2.00939    8.623e+02    7.661e+09
> MPI Reductions:           9.901e+03    1.00000
>
> Object Type          Creations   Destructions    Memory    Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> SNES                      1            1              124        0
> Krylov Solver             1            1            16880        0
> Preconditioner            1            1                0        0
> Distributed array         1            1            46568        0
> Index Set                 6            6           135976        0
> Vec                      46           46          3901252        0
> Vec Scatter               3            3                0        0
> IS L to G Mapping         1            1            45092        0
> MatMFFD                   1            1                0        0
> Matrix                    4            4          1011036        0
> --------------------------------------------------------------------------------------------------------
>
>



