[petsc-users] How to speed up geometric multigrid

Michele Rosso mrosso at uci.edu
Wed Oct 2 13:10:32 CDT 2013


Thank you all for your contributions.
So far the fastest solution is still the initial one proposed by Jed in 
an earlier round:

-ksp_atol 1e-9 -ksp_monitor_true_residual -ksp_view -log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu -mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson -options_left -pc_mg_galerkin
-pc_mg_levels 5 -pc_mg_log -pc_type mg

where I used -mg_levels_ksp_max_it 3, as Barry suggested, instead of
-mg_levels_ksp_max_it 1.
I attached the diagnostics for this case. Any further ideas?
Thank you,

Michele
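In PCMG terms, the options above set up a 5-level V-cycle with Galerkin coarse operators (-pc_mg_galerkin), three Richardson/SOR smoothing steps before and after each coarse-grid correction, and an LU solve on the coarsest level; the attached -ksp_view shows this cycle preconditioning CG. The Python sketch below mimics that structure so the pieces can be seen in isolation. It is an illustration under stated assumptions, not PETSc's implementation: it uses a dense, nonsingular 1D Dirichlet Poisson matrix (the real problem is 3D and has an attached null space), linear interpolation, and serial symmetric Gauss-Seidel in place of PETSc's parallel local_symmetric SOR, which sweeps each process's block independently. All function names are hypothetical.

```python
import numpy as np


def poisson_1d(n):
    """1D Laplacian stencil [-1, 2, -1] with Dirichlet ends (n interior points)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def prolongation(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1                           # fine-grid index of coarse point j
        P[i - 1:i + 2, j] = [0.5, 1.0, 0.5]     # weights around fine index i
    return P


def build_hierarchy(n_coarse, levels):
    """Galerkin hierarchy A_c = P^T A_f P, analogous to -pc_mg_galerkin."""
    sizes = [n_coarse]
    for _ in range(levels - 1):
        sizes.append(2 * sizes[-1] + 1)
    Ps = [prolongation(nc) for nc in sizes[:-1]]   # Ps[l] maps level l -> l+1
    As = [None] * levels
    As[-1] = poisson_1d(sizes[-1])
    for l in range(levels - 2, -1, -1):
        As[l] = Ps[l].T @ As[l + 1] @ Ps[l]
    return As, Ps


def sgs(A, x, b, sweeps):
    """Symmetric Gauss-Seidel (SOR with omega = 1): forward then backward sweep."""
    n = len(b)
    for _ in range(sweeps):
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x


def vcycle(As, Ps, l, b, sweeps=3):
    """One V-cycle on level l, starting from a zero initial guess."""
    if l == 0:
        return np.linalg.solve(As[0], b)                # direct coarse solve
    x = sgs(As[l], np.zeros_like(b), b, sweeps)         # pre-smoothing
    r_c = Ps[l - 1].T @ (b - As[l] @ x)                 # restrict the residual
    x += Ps[l - 1] @ vcycle(As, Ps, l - 1, r_c, sweeps) # coarse-grid correction
    return sgs(As[l], x, b, sweeps)                     # post-smoothing


def pcg(A, b, apply_pc, rtol=1e-10, maxit=200):
    """Preconditioned conjugate gradients; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_pc(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            return x, k
        z = apply_pc(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit


levels = 5
As, Ps = build_hierarchy(n_coarse=7, levels=levels)   # 7, 15, 31, 63, 127 points
A = As[-1]
b = np.ones(A.shape[0])
x, its = pcg(A, b, lambda r: vcycle(As, Ps, levels - 1, r))
print(f"PCG + V-cycle: {its} iterations, residual {np.linalg.norm(b - A @ x):.2e}")
```

Because the same symmetric smoother runs before and after the coarse correction (the "Up solver same as down solver" line in -ksp_view) and the coarse operator is P^T A P, the V-cycle is a symmetric operator, which is what allows CG, rather than a flexible Krylov method, to be used around it.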


On 10/01/2013 11:44 PM, Barry Smith wrote:
> On Oct 2, 2013, at 12:28 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
>> "Mark F. Adams" <mfadams at lbl.gov> writes:
>>> run3.txt uses:
>>>
>>> -ksp_type richardson
>>>
>>> This is bad and I doubt anyone recommended it intentionally.
>     Hell, this is normal multigrid without a Krylov accelerator. Under normal circumstances with geometric multigrid this should be fine; it is often the best choice.
>
>> I would have expected FGMRES, but Barry likes Krylov smoothers and
>> Richardson is one of a few methods that can tolerate nonlinear
>> preconditioners.
>>
>>> You also have, in this file,
>>>
>>> -mg_levels_ksp_type gmres
>>>
>>> did you or the recommenders mean
>>>
>>> -mg_levels_ksp_type richardson  ???
>>>
>>> You are using gmres here, which forces you to use fgmres in the outer solver. This is a safe thing to use. If you apply your BCs symmetrically with a low-order discretization, then
>>>
>>> -ksp_type cg
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type sor
>>>
>>> is what I'd recommend.
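Mark's point about gmres smoothers can be checked numerically. A smoother that performs a fixed number of Richardson/SOR sweeps is a fixed linear operator (symmetric when the sweeps are symmetric), so the multigrid preconditioner built from it is linear and CG can be used. A few GMRES iterations, by contrast, produce a result whose polynomial coefficients depend on the right-hand side, so the preconditioner becomes nonlinear and the outer method must tolerate that (FGMRES, or Richardson as Jed notes). The sketch below tests additivity of both smoothers on a small 1D Laplacian; the model problem, the least-squares formulation of GMRES and the function names are illustrative assumptions, not PETSc code.

```python
import numpy as np


def laplacian_1d(n):
    """1D Laplacian stencil [-1, 2, -1] with Dirichlet ends."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def sor_smoother(A, b, sweeps=3):
    """Fixed number of symmetric Gauss-Seidel sweeps from x = 0 (omega = 1)."""
    x = np.zeros_like(b)
    n = len(b)
    for _ in range(sweeps):
        for order in (range(n), range(n - 1, -1, -1)):
            for i in order:
                x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x


def gmres_smoother(A, b, its=3):
    """`its` steps of unrestarted GMRES from x = 0: minimize ||b - A x|| over
    span{b, Ab, ..., A^(its-1) b}, solved here as a small least-squares problem."""
    K = np.empty((len(b), its))
    v = b.copy()
    for j in range(its):
        K[:, j] = v
        v = A @ v
    y, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    return K @ y


n = 20
A = laplacian_1d(n)
rng = np.random.default_rng(0)
u, w = rng.standard_normal(n), rng.standard_normal(n)


def additivity_defect(smooth):
    """||S(u + w) - S(u) - S(w)||; zero up to round-off if S is linear."""
    return np.linalg.norm(smooth(A, u + w) - smooth(A, u) - smooth(A, w))


sor_defect = additivity_defect(sor_smoother)
gmres_defect = additivity_defect(gmres_smoother)

# Matrix of the SOR smoother, one column per unit vector
B_sor = np.column_stack([sor_smoother(A, e) for e in np.eye(n)])
print(f"SOR defect {sor_defect:.1e}, GMRES defect {gmres_defect:.1e}, "
      f"SOR smoother symmetric: {np.allclose(B_sor, B_sor.T)}")
```

The SOR smoother shows a round-off-level defect and a symmetric matrix, while the GMRES smoother does not act linearly because its coefficients are recomputed from each input. That is the reason a gmres smoother requires a flexible outer method, whereas Richardson/SOR smoothing with identical pre- and post-smoothers keeps the V-cycle usable inside CG.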
>> I thought that was tried in an earlier round.
>>
>> I don't understand why SOR preconditioning in the Krylov smoother is so
>> drastically more expensive than BJacobi/ILU and why SOR is called so
>> many more times even though the number of outer iterations
>>
>> bjacobi: PCApply              322 1.0 4.1021e+01 1.0 6.44e+09 1.0 3.0e+07 1.6e+03 4.5e+04 74 86 98 88 92 28160064317351226 20106
>> bjacobi: KSPSolve              46 1.0 4.6268e+01 1.0 7.52e+09 1.0 3.0e+07 1.8e+03 4.8e+04 83100100 99 99 31670065158291309 20800
>>
>> sor:     PCApply             1132 1.0 1.5532e+02 1.0 2.30e+10 1.0 1.0e+08 1.6e+03 1.6e+05 69 88 99 88 93 21871774317301274 18987
>> sor:     KSPSolve             201 1.0 1.7101e+02 1.0 2.63e+10 1.0 1.1e+08 1.8e+03 1.7e+05 75100100 99 98 24081775248221352 19652
>
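Jed's question compares totals, so normalizing by the call counts is a useful first check. The sketch below only divides the numbers quoted in the four log lines above. It assumes that each KSPSolve count refers to outer solves only, which the excerpt does not state.

```python
# (count, max time in seconds) copied from the log lines quoted above
runs = {
    "bjacobi": {"PCApply": (322, 4.1021e+01), "KSPSolve": (46, 4.6268e+01)},
    "sor": {"PCApply": (1132, 1.5532e+02), "KSPSolve": (201, 1.7101e+02)},
}


def normalize(events):
    """Per-call costs, assuming KSPSolve counts only the outer solves."""
    n_pc, t_pc = events["PCApply"]
    n_ksp, t_ksp = events["KSPSolve"]
    return {
        "sec_per_pcapply": t_pc / n_pc,
        "pcapply_per_solve": n_pc / n_ksp,
        "sec_per_solve": t_ksp / n_ksp,
    }


stats = {name: normalize(events) for name, events in runs.items()}
for name, s in stats.items():
    print(f"{name:8s} {s['sec_per_pcapply']:.4f} s/PCApply  "
          f"{s['pcapply_per_solve']:.2f} PCApply/solve  {s['sec_per_solve']:.3f} s/solve")
```

Under that assumption, SOR is about 8% more expensive per preconditioner application (about 0.137 s versus 0.127 s) but needs fewer applications per solve (about 5.6 versus 7.0), so each solve is cheaper (about 0.85 s versus 1.01 s). The larger SOR totals would then come mainly from that run performing 201 solves rather than 46.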

-------------- next part --------------
OPTIONS USED:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg

  0 KSP unpreconditioned resid norm 1.062110078078e-06 true resid norm 1.062110078078e-06 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP unpreconditioned resid norm 2.236451946298e-07 true resid norm 2.236451946298e-07 ||r(i)||/||b|| 2.105668698998e-01
  2 KSP unpreconditioned resid norm 1.220833343347e-07 true resid norm 1.220833343347e-07 ||r(i)||/||b|| 1.149441445426e-01
  3 KSP unpreconditioned resid norm 8.872102504003e-08 true resid norm 8.872102504003e-08 ||r(i)||/||b|| 8.353279652576e-02
  4 KSP unpreconditioned resid norm 6.619711859327e-08 true resid norm 6.619711859327e-08 ||r(i)||/||b|| 6.232604318479e-02
  5 KSP unpreconditioned resid norm 5.271840072756e-08 true resid norm 5.271840072756e-08 ||r(i)||/||b|| 4.963553384502e-02
  6 KSP unpreconditioned resid norm 4.229219228929e-08 true resid norm 4.229219228929e-08 ||r(i)||/||b|| 3.981902927221e-02
  7 KSP unpreconditioned resid norm 3.634611900984e-08 true resid norm 3.634611900984e-08 ||r(i)||/||b|| 3.422067049361e-02
  8 KSP unpreconditioned resid norm 3.131796519117e-08 true resid norm 3.131796519117e-08 ||r(i)||/||b|| 2.948655307729e-02
  9 KSP unpreconditioned resid norm 2.653911712772e-08 true resid norm 2.653911712772e-08 ||r(i)||/||b|| 2.498716251309e-02
 10 KSP unpreconditioned resid norm 2.451248019105e-08 true resid norm 2.451248019105e-08 ||r(i)||/||b|| 2.307903926061e-02
 11 KSP unpreconditioned resid norm 2.373835128699e-08 true resid norm 2.373835128699e-08 ||r(i)||/||b|| 2.235017987020e-02
 12 KSP unpreconditioned resid norm 2.476463969748e-08 true resid norm 2.476463969748e-08 ||r(i)||/||b|| 2.331645298226e-02
 13 KSP unpreconditioned resid norm 2.689648556959e-08 true resid norm 2.689648556959e-08 ||r(i)||/||b|| 2.532363276156e-02
 14 KSP unpreconditioned resid norm 3.098937516975e-08 true resid norm 3.098937516975e-08 ||r(i)||/||b|| 2.917717834468e-02
 15 KSP unpreconditioned resid norm 3.555038182753e-08 true resid norm 3.555038182753e-08 ||r(i)||/||b|| 3.347146643395e-02
 16 KSP unpreconditioned resid norm 4.112024511716e-08 true resid norm 4.112024511716e-08 ||r(i)||/||b|| 3.871561523226e-02
 17 KSP unpreconditioned resid norm 4.629240103388e-08 true resid norm 4.629240103388e-08 ||r(i)||/||b|| 4.358531379127e-02
 18 KSP unpreconditioned resid norm 4.985610207288e-08 true resid norm 4.985610207288e-08 ||r(i)||/||b|| 4.694061670436e-02
 19 KSP unpreconditioned resid norm 5.046376291690e-08 true resid norm 5.046376291690e-08 ||r(i)||/||b|| 4.751274275472e-02
 20 KSP unpreconditioned resid norm 5.025357388082e-08 true resid norm 5.025357388082e-08 ||r(i)||/||b|| 4.731484515407e-02
 21 KSP unpreconditioned resid norm 4.733061043311e-08 true resid norm 4.733061043311e-08 ||r(i)||/||b|| 4.456281077642e-02
 22 KSP unpreconditioned resid norm 4.482409805557e-08 true resid norm 4.482409805557e-08 ||r(i)||/||b|| 4.220287424134e-02
 23 KSP unpreconditioned resid norm 4.070552710576e-08 true resid norm 4.070552710576e-08 ||r(i)||/||b|| 3.832514910266e-02
 24 KSP unpreconditioned resid norm 3.746139586173e-08 true resid norm 3.746139586173e-08 ||r(i)||/||b|| 3.527072818058e-02
 25 KSP unpreconditioned resid norm 3.416470090249e-08 true resid norm 3.416470090249e-08 ||r(i)||/||b|| 3.216681736447e-02
 26 KSP unpreconditioned resid norm 3.162747159737e-08 true resid norm 3.162747159737e-08 ||r(i)||/||b|| 2.977796016644e-02
 27 KSP unpreconditioned resid norm 2.886965691540e-08 true resid norm 2.886965691540e-08 ||r(i)||/||b|| 2.718141698425e-02
 28 KSP unpreconditioned resid norm 2.669294602696e-08 true resid norm 2.669294602696e-08 ||r(i)||/||b|| 2.513199580525e-02
 29 KSP unpreconditioned resid norm 2.477496636609e-08 true resid norm 2.477496636609e-08 ||r(i)||/||b|| 2.332617576789e-02
 30 KSP unpreconditioned resid norm 2.254756345946e-08 true resid norm 2.254756345946e-08 ||r(i)||/||b|| 2.122902693878e-02
 31 KSP unpreconditioned resid norm 2.100745862543e-08 true resid norm 2.100745862543e-08 ||r(i)||/||b|| 1.977898436239e-02
 32 KSP unpreconditioned resid norm 2.082372673705e-08 true resid norm 2.082372673705e-08 ||r(i)||/||b|| 1.960599674823e-02
 33 KSP unpreconditioned resid norm 2.058561394284e-08 true resid norm 2.058561394284e-08 ||r(i)||/||b|| 1.938180831510e-02
 34 KSP unpreconditioned resid norm 2.071527481693e-08 true resid norm 2.071527481693e-08 ||r(i)||/||b|| 1.950388688000e-02
 35 KSP unpreconditioned resid norm 2.100892944872e-08 true resid norm 2.100892944872e-08 ||r(i)||/||b|| 1.978036917487e-02
 36 KSP unpreconditioned resid norm 2.220101872142e-08 true resid norm 2.220101872142e-08 ||r(i)||/||b|| 2.090274744554e-02
 37 KSP unpreconditioned resid norm 2.324772438230e-08 true resid norm 2.324772438230e-08 ||r(i)||/||b|| 2.188824384792e-02
 38 KSP unpreconditioned resid norm 2.452302256995e-08 true resid norm 2.452302256995e-08 ||r(i)||/||b|| 2.308896514224e-02
 39 KSP unpreconditioned resid norm 2.502647686575e-08 true resid norm 2.502647686575e-08 ||r(i)||/||b|| 2.356297843539e-02
 40 KSP unpreconditioned resid norm 2.531223073672e-08 true resid norm 2.531223073672e-08 ||r(i)||/||b|| 2.383202199016e-02
 41 KSP unpreconditioned resid norm 2.499727165695e-08 true resid norm 2.499727165695e-08 ||r(i)||/||b|| 2.353548108892e-02
 42 KSP unpreconditioned resid norm 2.462083389942e-08 true resid norm 2.462083389942e-08 ||r(i)||/||b|| 2.318105666033e-02
 43 KSP unpreconditioned resid norm 2.360189108305e-08 true resid norm 2.360189108305e-08 ||r(i)||/||b|| 2.222169958670e-02
 44 KSP unpreconditioned resid norm 2.252988454814e-08 true resid norm 2.252988454814e-08 ||r(i)||/||b|| 2.121238185492e-02
 45 KSP unpreconditioned resid norm 2.188564712770e-08 true resid norm 2.188564712770e-08 ||r(i)||/||b|| 2.060581815334e-02
 46 KSP unpreconditioned resid norm 2.002949813700e-08 true resid norm 2.002949813700e-08 ||r(i)||/||b|| 1.885821305193e-02
 47 KSP unpreconditioned resid norm 1.822159592332e-08 true resid norm 1.822159592332e-08 ||r(i)||/||b|| 1.715603335231e-02
 48 KSP unpreconditioned resid norm 1.731437653543e-08 true resid norm 1.731437653543e-08 ||r(i)||/||b|| 1.630186634399e-02
 49 KSP unpreconditioned resid norm 1.582438316044e-08 true resid norm 1.582438316044e-08 ||r(i)||/||b|| 1.489900480850e-02
 50 KSP unpreconditioned resid norm 1.470070282545e-08 true resid norm 1.470070282545e-08 ||r(i)||/||b|| 1.384103505736e-02
 51 KSP unpreconditioned resid norm 1.317055921275e-08 true resid norm 1.317055921275e-08 ||r(i)||/||b|| 1.240037118995e-02
 52 KSP unpreconditioned resid norm 1.200360805809e-08 true resid norm 1.200360805809e-08 ||r(i)||/||b|| 1.130166101033e-02
 53 KSP unpreconditioned resid norm 1.035246990182e-08 true resid norm 1.035246990182e-08 ||r(i)||/||b|| 9.747078118834e-03
 54 KSP unpreconditioned resid norm 9.012810502968e-09 true resid norm 9.012810502968e-09 ||r(i)||/||b|| 8.485759328525e-03
 55 KSP unpreconditioned resid norm 8.556164955549e-09 true resid norm 8.556164955549e-09 ||r(i)||/||b|| 8.055817501548e-03
 56 KSP unpreconditioned resid norm 7.776893147540e-09 true resid norm 7.776893147540e-09 ||r(i)||/||b|| 7.322115953947e-03
 57 KSP unpreconditioned resid norm 6.867595067138e-09 true resid norm 6.867595067138e-09 ||r(i)||/||b|| 6.465991810912e-03
 58 KSP unpreconditioned resid norm 6.256223035332e-09 true resid norm 6.256223035332e-09 ||r(i)||/||b|| 5.890371595621e-03
 59 KSP unpreconditioned resid norm 5.775805121780e-09 true resid norm 5.775805121780e-09 ||r(i)||/||b|| 5.438047563048e-03
 60 KSP unpreconditioned resid norm 5.028152348022e-09 true resid norm 5.028152348022e-09 ||r(i)||/||b|| 4.734116031666e-03
 61 KSP unpreconditioned resid norm 4.491271029703e-09 true resid norm 4.491271029703e-09 ||r(i)||/||b|| 4.228630461573e-03
 62 KSP unpreconditioned resid norm 4.194174911407e-09 true resid norm 4.194174911407e-09 ||r(i)||/||b|| 3.948907931462e-03
 63 KSP unpreconditioned resid norm 3.900672763613e-09 true resid norm 3.900672763613e-09 ||r(i)||/||b|| 3.672569203630e-03
 64 KSP unpreconditioned resid norm 3.725382861224e-09 true resid norm 3.725382861224e-09 ||r(i)||/||b|| 3.507529904967e-03
 65 KSP unpreconditioned resid norm 3.470705216044e-09 true resid norm 3.470705216044e-09 ||r(i)||/||b|| 3.267745300304e-03
 66 KSP unpreconditioned resid norm 3.190845546802e-09 true resid norm 3.190845546802e-09 ||r(i)||/||b|| 3.004251266100e-03
 67 KSP unpreconditioned resid norm 2.936936118052e-09 true resid norm 2.936936118052e-09 ||r(i)||/||b|| 2.765189954103e-03
 68 KSP unpreconditioned resid norm 2.807750828309e-09 true resid norm 2.807750828309e-09 ||r(i)||/||b|| 2.643559162334e-03
 69 KSP unpreconditioned resid norm 2.630235180177e-09 true resid norm 2.630235180177e-09 ||r(i)||/||b|| 2.476424275098e-03
 70 KSP unpreconditioned resid norm 2.423253188367e-09 true resid norm 2.423253188367e-09 ||r(i)||/||b|| 2.281546177166e-03
 71 KSP unpreconditioned resid norm 2.312671011482e-09 true resid norm 2.312671011482e-09 ||r(i)||/||b|| 2.177430625334e-03
 72 KSP unpreconditioned resid norm 2.135449041972e-09 true resid norm 2.135449041972e-09 ||r(i)||/||b|| 2.010572242980e-03
 73 KSP unpreconditioned resid norm 2.002324106483e-09 true resid norm 2.002324106483e-09 ||r(i)||/||b|| 1.885232188086e-03
 74 KSP unpreconditioned resid norm 1.778111616174e-09 true resid norm 1.778111616174e-09 ||r(i)||/||b|| 1.674131196827e-03
 75 KSP unpreconditioned resid norm 1.653921088947e-09 true resid norm 1.653921088947e-09 ||r(i)||/||b|| 1.557203083827e-03
 76 KSP unpreconditioned resid norm 1.536016641258e-09 true resid norm 1.536016641258e-09 ||r(i)||/||b|| 1.446193452978e-03
 77 KSP unpreconditioned resid norm 1.456376200968e-09 true resid norm 1.456376200968e-09 ||r(i)||/||b|| 1.371210226725e-03
 78 KSP unpreconditioned resid norm 1.301938916885e-09 true resid norm 1.301938916885e-09 ||r(i)||/||b|| 1.225804126858e-03
 79 KSP unpreconditioned resid norm 1.256867113940e-09 true resid norm 1.256867113940e-09 ||r(i)||/||b|| 1.183368033015e-03
 80 KSP unpreconditioned resid norm 1.084746612787e-09 true resid norm 1.084746612787e-09 ||r(i)||/||b|| 1.021312795328e-03
 81 KSP unpreconditioned resid norm 1.026849960395e-09 true resid norm 1.026849960395e-09 ||r(i)||/||b|| 9.668018236432e-04
 82 KSP unpreconditioned resid norm 9.283375662057e-10 true resid norm 9.283375662057e-10 ||r(i)||/||b|| 8.740502376984e-04
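Two numbers summarize the residual history above: which tolerance ended the solve, and the average reduction per CG iteration. The sketch assumes PETSc's default convergence test, which stops once the residual norm drops below max(rtol * ||r0||, atol); the residual norms and tolerances are copied from the monitor output and from -ksp_view.

```python
import math

r0 = 1.062110078078e-06    # iteration 0 true residual norm (= ||b|| for a zero initial guess)
r81 = 1.026849960395e-09   # iteration 81
r82 = 9.283375662057e-10   # iteration 82, the last one
rtol, atol = 1e-5, 1e-9    # "tolerances: relative=1e-05, absolute=1e-09" in -ksp_view

threshold = max(rtol * r0, atol)       # converged once the residual norm is below this
rho = (r82 / r0) ** (1.0 / 82)         # geometric-mean reduction factor per iteration
its_per_decade = math.log(10.0) / -math.log(rho)

print(f"threshold = {threshold:.1e} ({'atol' if atol > rtol * r0 else 'rtol'} is active)")
print(f"below threshold at 81: {r81 < threshold}, at 82: {r82 < threshold}")
print(f"mean reduction per iteration {rho:.3f}, about {its_per_decade:.0f} iterations per decade")
```

The solve therefore ended on the absolute tolerance after a relative reduction of only about 8.7e-04, at a mean factor of roughly 0.92 per iteration (about 27 iterations per decade). That is slow for a multigrid-preconditioned solve on a Poisson-type problem, where roughly an order of magnitude per iteration is often achieved. The rises and falls of the residual between iterations 11 and 40 are not by themselves a bug: CG minimizes the error in the energy norm, so the unpreconditioned residual norm it reports need not decrease monotonically.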
KSP Object: 128 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-09, divergence=10000
  left preconditioning
  has attached null space
  using UNPRECONDITIONED norm type for convergence test
PC Object: 128 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     128 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     128 MPI processes
      type: lu
        LU: out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: natural
        factor fill ratio given 0, needed 0
          Factored matrix follows:
            Matrix Object:             128 MPI processes
              type: mpiaij
              rows=1024, cols=1024
              package used to perform factorization: superlu_dist
              total: nonzeros=0, allocated nonzeros=0
              total number of mallocs used during MatSetValues calls =0
                SuperLU_DIST run parameters:
                  Process grid nprow 16 x npcol 8 
                  Equilibrate matrix TRUE 
                  Matrix input mode 1 
                  Replace tiny pivots TRUE 
                  Use iterative refinement FALSE 
                  Processors in row 16 col partition 8 
                  Row permutation LargeDiag 
                  Column permutation METIS_AT_PLUS_A
                  Parallel symbolic factorization FALSE 
                  Repeated factorization SamePattern_SameRowPerm
      linear system matrix = precond matrix:
      Matrix Object:       128 MPI processes
        type: mpiaij
        rows=1024, cols=1024
        total: nonzeros=27648, allocated nonzeros=27648
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     128 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=3
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     128 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       128 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=221184, allocated nonzeros=221184
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 16 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     128 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=3
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     128 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       128 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=1769472, allocated nonzeros=1769472
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     128 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=3
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     128 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       128 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=14155776, allocated nonzeros=14155776
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     128 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=3
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     128 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       128 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=29360128, allocated nonzeros=29360128
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Matrix Object:   128 MPI processes
    type: mpiaij
    rows=4194304, cols=4194304
    total: nonzeros=29360128, allocated nonzeros=29360128
    total number of mallocs used during MatSetValues calls =0
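Dividing nonzeros by rows in the -ksp_view output above makes the effect of -pc_mg_galerkin on the coarse operators visible. The sketch below performs only that arithmetic on the reported matrix sizes.

```python
# (rows, nonzeros) per multigrid level, copied from the -ksp_view output above
levels = {
    "level 0 (coarse)": (1024, 27648),
    "level 1": (8192, 221184),
    "level 2": (65536, 1769472),
    "level 3": (524288, 14155776),
    "level 4 (fine)": (4194304, 29360128),
}

fine_nnz = levels["level 4 (fine)"][1]
nnz_per_row = {name: nnz / rows for name, (rows, nnz) in levels.items()}
nnz_vs_fine = {name: nnz / fine_nnz for name, (_, nnz) in levels.items()}
# Operator complexity: nonzeros stored on all levels divided by fine-grid nonzeros
operator_complexity = sum(nnz for _, nnz in levels.values()) / fine_nnz

for name in levels:
    print(f"{name:17s} nnz/row {nnz_per_row[name]:5.1f}   nnz / fine nnz {nnz_vs_fine[name]:.3f}")
print(f"operator complexity {operator_complexity:.2f}")
```

The fine operator has 7 nonzeros per row, consistent with a 7-point stencil, while every Galerkin coarse operator has 27, and the number of rows shrinks by a factor of 8 per level, as expected when all three directions are coarsened. As a result level 3 stores about 48% as many nonzeros as the fine grid despite having one eighth of the rows, which is consistent with MGSmooth Level 3 performing roughly 45% of the flops of MGSmooth Level 4 (6.94e10 versus 1.55e11) in the event log. The operator complexity is about 1.55.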
 
---------------------------------------- SUMMARY ----------------------------------------
                              Setup time =     0.0118 min 
                     Initialization time =     0.0001 min 
                         Processing time =    39.5579 min 
                    Post-processing time =     0.0028 min 
                   Total simulation time =    39.5726 min 
           Processing time per time step =     2.8391 sec 
              Total number of time steps =      836
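The application's summary can be cross-checked against the -log_summary output that follows it. The sketch below uses only numbers printed in this attachment to recompute the time per step and to estimate how much of it is spent in the linear solve. It assumes, as the equal counts suggest, that each of the 836 KSPSolve calls is one solve per time step, and that each PCApply is one V-cycle ("Cycles per PCApply=1" in -ksp_view).

```python
processing_s = 39.5579 * 60   # "Processing time" converted from minutes to seconds
steps = 836                   # "Total number of time steps"
ksp_solve_s = 2.1100e+03      # KSPSolve max time in the Main Stage of -log_summary
ksp_solves = 836              # KSPSolve count, assumed to be one solve per step
pc_applies = 31091            # PCApply count in the Main Stage (one V-cycle each)

sec_per_step = processing_s / steps
solve_fraction = ksp_solve_s / processing_s
pc_per_solve = pc_applies / ksp_solves

print(f"{sec_per_step:.4f} s per step, {100 * solve_fraction:.1f}% of it in KSPSolve, "
      f"{pc_per_solve:.1f} V-cycles per solve on average")
```

This reproduces the reported 2.8391 s per step, of which roughly 89% is spent in KSPSolve, with about 37 V-cycles per solve on average. Nearly all of the remaining cost therefore appears to lie in the multigrid cycle itself and in the number of iterations it needs.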
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./hit on a interlagos-64idx-pgi-opt named nid20962 with 128 processors, by Unknown Wed Oct  2 12:24:10 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.407e+03      1.00001   2.407e+03
Objects:              3.145e+05      1.00000   3.145e+05
Flops:                3.135e+11      1.00000   3.135e+11  4.012e+13
Flops/sec:            1.303e+08      1.00001   1.303e+08  1.667e+10
MPI Messages:         6.225e+06      1.00000   6.225e+06  7.968e+08
MPI Message Lengths:  2.183e+10      1.00000   3.506e+03  2.794e+12
MPI Reductions:       7.943e+05      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.2536e+02  26.0%  5.4104e+12  13.5%  3.273e+07   4.1%  6.724e+02       19.2%  2.969e+05  37.4% 
 1:        MG Apply: 1.7813e+03  74.0%  3.4714e+13  86.5%  7.641e+08  95.9%  2.834e+03       80.8%  4.975e+05  62.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot            62182 1.0 2.3234e+01 1.6 4.08e+09 1.0 0.0e+00 0.0e+00 6.2e+04  1  1  0  0  8   3 10  0  0 21 22451
VecNorm            64690 1.0 1.9899e+01 2.4 4.24e+09 1.0 0.0e+00 0.0e+00 6.5e+04  1  1  0  0  8   2 10  0  0 22 27271
VecCopy            33599 1.0 3.2168e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              2530 1.0 4.4997e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            62182 1.0 9.0091e+00 1.2 4.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1 10  0  0  0 57899
VecAYPX            62182 1.0 9.1499e+00 1.2 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  7  0  0  0 42373
VecScatterBegin    64694 1.0 8.3787e+00 1.2 0.00e+00 0.0 3.3e+07 1.6e+04 0.0e+00  0  0  4 19  0   1  0100100  0     0
VecScatterEnd      64694 1.0 3.8446e+01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
MatMult            63018 1.0 2.2957e+02 1.1 2.68e+10 1.0 3.2e+07 1.6e+04 0.0e+00  9  9  4 19  0  35 64 99 99  0 14968
MatMultTranspose       4 1.0 2.2471e-03 1.1 2.53e+05 1.0 1.5e+03 9.9e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0 14396
MatLUFactorSym         1 1.0 3.5787e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.3686e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     853 1.0 2.7544e+00 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.7e+03  0  0  0  0  0   0  0  0  0  1     0
MatAssemblyEnd       853 1.0 3.0681e+00 1.2 0.00e+00 0.0 1.2e+04 1.1e+03 7.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 4.0531e-06 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.1935e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView             5852 1.0 1.6725e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 5.9e+03  0  0  0  0  1   0  0  0  0  2     0
MatPtAP                4 1.0 2.2126e-01 1.0 5.11e+06 1.0 2.5e+04 6.0e+03 1.0e+02  0  0  0  0  0   0  0  0  0  0  2953
MatPtAPSymbolic        4 1.0 1.5837e-01 1.1 0.00e+00 0.0 1.5e+04 7.8e+03 6.0e+01  0  0  0  0  0   0  0  0  0  0     0
MatPtAPNumeric         4 1.0 7.1259e-02 1.1 5.11e+06 1.0 9.7e+03 3.1e+03 4.0e+01  0  0  0  0  0   0  0  0  0  0  9170
MatGetLocalMat         4 1.0 2.6774e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          4 1.0 3.5020e-02 3.5 0.00e+00 0.0 1.1e+04 8.4e+03 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSymTrans         8 1.0 9.7649e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               6 1.0 9.6231e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01  0  0  0  0  0   0  0  0  0  0     0
Warning -- total time of even greater than time of entire stage -- something is wrong with the timer
KSPSolve             836 1.0 2.1100e+03 1.0 3.13e+11 1.0 8.0e+08 3.5e+03 7.8e+05 88100100100 99 3377422433520264 19016
PCSetUp                1 1.0 4.3869e-01 1.0 5.36e+06 1.0 3.4e+04 4.6e+03 3.0e+02  0  0  0  0  0   0  0  0  0  0  1563
Warning -- total time of even greater than time of entire stage -- something is wrong with the timer
PCApply            31091 1.0 1.7960e+03 1.0 2.71e+11 1.0 7.6e+08 3.0e+03 5.0e+05 74 87 96 81 63 2856422335421168 19329
MGSetup Level 0        1 1.0 1.3880e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
MGSetup Level 1        1 1.0 2.2409e-0312.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MGSetup Level 2        1 1.0 3.0804e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MGSetup Level 3        1 1.0 1.5497e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MGSetup Level 4        1 1.0 1.6789e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: MG Apply

VecScale          621820 1.0 3.4077e+00 1.3 1.76e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 66146
VecCopy            31091 1.0 4.3931e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            435274 1.0 6.2440e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX           124364 1.0 5.2623e+00 1.5 1.16e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 28314
VecScatterBegin   994912 1.0 9.1571e+01 1.6 0.00e+00 0.0 7.6e+08 3.0e+03 0.0e+00  3  0 96 81  0   4  0100100  0     0
VecScatterEnd     994912 1.0 1.9157e+02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   7  0  0  0  0     0
MatMult           124364 1.0 1.3612e+02 1.2 2.09e+10 1.0 1.1e+08 3.2e+03 0.0e+00  5  7 14 13  0   7  8 15 16  0 19694
MatMultAdd        124364 1.0 5.9485e+01 1.1 7.86e+09 1.0 4.8e+07 9.9e+02 0.0e+00  2  3  6  2  0   3  3  6  2  0 16907
MatMultTranspose  124364 1.0 5.8217e+01 1.4 7.86e+09 1.0 4.8e+07 9.9e+02 0.0e+00  2  3  6  2  0   3  3  6  2  0 17276
MatSolve           31091 1.0 5.5108e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   3  0  0  0  0     0
MatSOR            248728 1.0 1.4900e+03 1.0 2.33e+11 1.0 5.6e+08 3.2e+03 5.0e+05 61 74 70 65 63  83 86 73 80100 20050
KSPSolve          279819 1.0 1.5449e+03 1.0 2.33e+11 1.0 5.6e+08 3.2e+03 5.0e+05 64 74 70 65 63  86 86 73 80100 19337
PCApply            31091 1.0 5.5162e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   3  0  0  0  0     0
MGSmooth Level 0   31091 1.0 5.5668e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   3  0  0  0  0     0
MGSmooth Level 1   62182 1.0 2.0214e+01 1.2 8.54e+08 1.0 1.6e+08 1.9e+02 1.2e+05  1  0 20  1 16   1  0 21  1 25  5405
MGResid Level 1    31091 1.0 1.6500e+00 1.1 1.07e+08 1.0 3.2e+07 1.9e+02 0.0e+00  0  0  4  0  0   0  0  4  0  0  8336
MGInterp Level 1   62182 1.0 5.2735e+00 5.1 2.69e+07 1.0 2.4e+07 6.4e+01 0.0e+00  0  0  3  0  0   0  0  3  0  0   652
MGSmooth Level 2   62182 1.0 5.2998e+01 1.1 7.98e+09 1.0 1.6e+08 6.4e+02 1.2e+05  2  3 20  4 16   3  3 21  5 25 19271
MGResid Level 2    31091 1.0 4.2289e+00 1.2 8.60e+08 1.0 3.2e+07 6.4e+02 0.0e+00  0  0  4  1  0   0  0  4  1  0 26018
MGInterp Level 2   62182 1.0 3.7472e+00 1.6 2.15e+08 1.0 2.4e+07 2.1e+02 0.0e+00  0  0  3  0  0   0  0  3  0  0  7341
MGSmooth Level 3   62182 1.0 3.1296e+02 1.1 6.94e+10 1.0 1.6e+08 2.3e+03 1.2e+05 12 22 20 13 16  16 26 21 16 25 28390
MGResid Level 3    31091 1.0 2.4739e+01 1.1 6.88e+09 1.0 3.2e+07 2.3e+03 0.0e+00  1  2  4  3  0   1  3  4  3  0 35580
MGInterp Level 3   62182 1.0 1.3544e+01 1.2 1.72e+09 1.0 2.4e+07 7.7e+02 0.0e+00  0  1  3  1  0   1  1  3  1  0 16248
MGSmooth Level 4   62182 1.0 1.1547e+03 1.1 1.55e+11 1.0 8.0e+07 1.6e+04 1.2e+05 47 49 10 47 16  63 57 10 58 25 17197
MGResid Level 4    31091 1.0 1.1227e+02 1.2 1.43e+10 1.0 1.6e+07 1.6e+04 0.0e+00  4  5  2  9  0   6  5  2 12  0 16262
MGInterp Level 4   62182 1.0 9.7205e+01 1.2 1.38e+10 1.0 2.4e+07 2.9e+03 0.0e+00  4  4  3  2  0   5  5  3  3  0 18111
------------------------------------------------------------------------------------------------------------------------
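The call counts in the MG Apply stage can be decoded by dividing them by the number of V-cycles (the PCApply count in the Main Stage). The sketch below does this with numbers copied from the event table; the interpretation given afterwards is an inference from the counts, not something the log states.

```python
v_cycles = 31091  # PCApply count in the Main Stage: one V-cycle per application
counts = {        # event counts in the "MG Apply" stage
    "MatSOR": 248728,
    "KSPSolve": 279819,
    "MGSmooth Level 4": 62182,
}
smooth_s = {      # MGSmooth max time in seconds, by level
    1: 2.0214e+01,
    2: 5.2998e+01,
    3: 3.1296e+02,
    4: 1.1547e+03,
}

per_cycle = {name: c / v_cycles for name, c in counts.items()}
total_smooth = sum(smooth_s.values())
share = {lvl: t / total_smooth for lvl, t in smooth_s.items()}

for name, k in per_cycle.items():
    print(f"{name:17s} {k:.1f} calls per V-cycle")
for lvl, s in share.items():
    print(f"smoothing on level {lvl}: {100 * s:.1f}% of smoothing time")
```

Eight MatSOR calls per V-cycle match 4 smoothed levels times a pre- and a post-smoother, which suggests that each 3-iteration Richardson/SOR smoother runs as a single MatSOR call; the 9 KSPSolve calls are those 8 smoothers plus the coarse-grid solve. About three quarters of the smoothing time (and 47% of the whole run, per the Global %T column of MGSmooth Level 4) is spent on the finest level, and level 3 adds roughly another fifth.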

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector 65592          65592  17415948192     0
      Vector Scatter    19             19        22572     0
              Matrix    38             38     14004608     0
   Matrix Null Space     1              1          652     0
    Distributed Mesh     5              5       830792     0
     Bipartite Graph    10             10         8560     0
           Index Set    47             47       534480     0
   IS L to G Mapping     5              5       405756     0
       Krylov Solver     7              7         9536     0
     DMKSP interface     3              3         2088     0
      Preconditioner     7              7         7352     0
              Viewer     1              0            0     0

--- Event Stage 1: MG Apply

              Vector 248728          248728  19044605504     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 7.16209e-05
Average time for zero size MPI_Send(): 1.87568e-06
#PETSc Option Table entries:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure run at: Wed Aug 28 23:25:43 2013
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-batch="1 " --known-mpi-shared="0 " --known-memcmp-ok  --with-blas-lapack-lib="-L/opt/acml/5.3.0/pgi64/lib  -lacml" --COPTFLAGS="-O3 -fastsse" --FOPTFLAGS="-O3 -fastsse" --CXXOPTFLAGS="-O3 -fastsse" --with-x="0 " --with-debugging="0 " --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries=0 --with-dynamic-loading=0 --with-mpi-compilers="1 " --known-mpi-shared-libraries=0 --with-64-bit-indices --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " --with-cc=cc --with-cxx=CC --with-fc=ftn PETSC_ARCH=interlagos-64idx-pgi-opt
-----------------------------------------
Libraries compiled on Wed Aug 28 23:25:43 2013 on h2ologin3 
Machine characteristics: Linux-2.6.32.59-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: interlagos-64idx-pgi-opt
-----------------------------------------

Using C compiler: cc  -O3 -fastsse  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -O3 -fastsse   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/include -I/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/include -I/opt/cray/udreg/2.3.2-1.0402.7311.2.1.gem/include -I/opt/cray/ugni/5.0-1.0402.7128.7.6.gem/include -I/opt/cray/pmi/4.0.1-1.0000.9421.73.3.gem/include -I/opt/cray/dmapp/4.0.1-1.0402.7439.5.1.gem/include -I/opt/cray/gni-headers/2.1-1.0402.7082.6.2.gem/include -I/opt/cray/xpmem/0.1-2.0402.44035.2.1.gem/include -I/opt/cray/rca/1.0.0-2.0402.42153.2.106.gem/include -I/opt/cray-hss-devel/7.0.0/include -I/opt/cray/krca/1.0.0-2.0402.42157.2.94.gem/include -I/opt/cray/mpt/6.0.1/gni/mpich2-pgi/121/include -I/opt/acml/5.3.0/pgi64_fma4/include -I/opt/cray/libsci/12.1.01/pgi/121/interlagos/include -I/opt/fftw/3.3.0.3/interlagos/include -I/usr/include/alps -I/opt/pgi/13.6.0/linux86-64/13.6/include -I/opt/cray/xe-sysroot/4.2.24/usr/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -lpetsc -Wl,-rpath,/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -L/mnt/a/u/sciteam/mrosso/LIBS/petsc-3.4.2/interlagos-64idx-pgi-opt/lib -lsuperlu_dist_3.3 -L/opt/acml/5.3.0/pgi64/lib -lacml -lpthread -lparmetis -lmetis -ldl 
-----------------------------------------

#PETSc Option Table entries:
-ksp_atol 1e-9
-ksp_monitor_true_residual
-ksp_view
-log_summary
-mg_coarse_pc_factor_mat_solver_package superlu_dist
-mg_coarse_pc_type lu
-mg_levels_ksp_max_it 3
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_log
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.

