[petsc-users] GAMG Parallel Performance
Karin&NiKo
niko.karin at gmail.com
Thu Nov 15 10:50:46 CST 2018
Dear PETSc team,
I am solving a linear transient dynamic problem, based on a finite element
discretization. To do that, I am using FGMRES with GAMG as a
preconditioner. I consider here 10 time steps.
The problem has roughly 118e6 dof and I am running on 1000, 1500 and 2000
procs, so I have something like 118e3, 79e3 and 59e3 dof/proc.
I notice that the performance deteriorates when I increase the number of
processes.
Attached you will find the log_view output of the execution and the
detailed definition of the KSP.
Is the problem too small to run on that number of processes or is there
something wrong with my use of GAMG?
Thank you in advance for your help,
Nicolas
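For reference, a minimal sketch of the runtime options matching the solver configuration shown in the attached -ksp_view output (the executable name `./my_app` and the process count are placeholders; the -ksp/-pc flags are standard PETSc options consistent with the log, not the exact command line used):

```shell
# Hedged sketch: reproduces the FGMRES + GAMG setup seen in the attached log.
# "./my_app" stands in for the actual application binary.
mpiexec -n 1000 ./my_app \
  -ksp_type fgmres \
  -ksp_rtol 1e-8 \
  -ksp_monitor_true_residual \
  -ksp_converged_reason \
  -pc_type gamg \
  -pc_gamg_agg_nsmooths 1 \
  -ksp_view \
  -log_view
```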
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20181115/9c9caee6/attachment-0001.html>
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Unknown Name on a arch-linux2-c-opt-mpi-ml-hypre named eocn0117 with 1000 processors, by B07947 Thu Nov 15 16:14:46 2018
Using Petsc Release Version 3.8.2, Nov, 09, 2017
Max Max/Min Avg Total
Time (sec): 1.661e+02 1.00034 1.661e+02
Objects: 1.401e+03 1.00143 1.399e+03
Flop: 7.695e+10 1.13672 7.354e+10 7.354e+13
Flop/sec: 4.633e+08 1.13672 4.428e+08 4.428e+11
MPI Messages: 3.697e+05 12.46258 1.179e+05 1.179e+08
MPI Message Lengths: 8.786e+08 3.98485 4.086e+03 4.817e+11
MPI Reductions: 2.635e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.6608e+02 100.0% 7.3541e+13 100.0% 1.178e+08 99.9% 4.081e+03 99.9% 2.603e+03 98.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 7342 1.0 4.4956e+01 1.4 4.09e+10 1.2 9.6e+07 4.3e+03 0.0e+00 23 53 81 86 0 23 53 81 86 0 859939
MatMultAdd 1130 1.0 3.4048e+00 2.3 1.55e+09 1.1 8.4e+06 8.2e+02 0.0e+00 2 2 7 1 0 2 2 7 1 0 434274
MatMultTranspose 1130 1.0 4.7555e+00 3.8 1.55e+09 1.1 8.4e+06 8.2e+02 0.0e+00 1 2 7 1 0 1 2 7 1 0 310924
MatSolve 226 0.0 6.8927e-04 0.0 6.24e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90
MatSOR 6835 1.0 3.6061e+01 1.4 2.85e+10 1.1 0.0e+00 0.0e+00 0.0e+00 20 37 0 0 0 20 37 0 0 0 760198
MatLUFactorSym 1 1.0 1.0800e-0390.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 8.0395e-04421.5 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
MatScale 15 1.0 1.7925e-02 1.8 9.12e+06 1.1 6.6e+04 1.1e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 485856
MatResidual 1130 1.0 6.3576e+00 1.5 5.31e+09 1.2 1.5e+07 3.7e+03 0.0e+00 3 7 13 11 0 3 7 13 11 0 781728
MatAssemblyBegin 112 1.0 9.9765e-01 3.0 0.00e+00 0.0 2.1e+05 7.8e+04 7.4e+01 0 0 0 3 3 0 0 0 3 3 0
MatAssemblyEnd 112 1.0 6.8845e-01 1.1 0.00e+00 0.0 8.3e+05 3.4e+02 2.6e+02 0 0 1 0 10 0 0 1 0 10 0
MatGetRow 582170 1.0 8.5022e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 0.0 2.0885e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 3.7804e-02 1.0 0.00e+00 0.0 5.6e+04 2.8e+03 1.0e+02 0 0 0 0 4 0 0 0 0 4 0
MatGetOrdering 1 0.0 4.4608e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 5 1.0 3.2871e-02 1.1 0.00e+00 0.0 1.2e+06 4.9e+02 5.2e+01 0 0 1 0 2 0 0 1 0 2 0
MatZeroEntries 5 1.0 6.6769e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.3 8.9249e-0216.6 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatAXPY 5 1.0 6.4984e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 5 1.0 6.8333e-01 1.0 1.41e+08 1.2 3.7e+05 1.0e+04 8.2e+01 0 0 0 1 3 0 0 0 1 3 193093
MatMatMultSym 5 1.0 4.8541e-01 1.0 0.00e+00 0.0 3.0e+05 7.8e+03 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatMatMultNum 5 1.0 1.9432e-01 1.0 1.41e+08 1.2 6.6e+04 2.2e+04 1.0e+01 0 0 0 0 0 0 0 0 0 0 679018
MatPtAP 5 1.0 4.2329e+00 1.0 1.54e+09 1.5 8.3e+05 4.3e+04 8.7e+01 3 2 1 7 3 3 2 1 7 3 292103
MatPtAPSymbolic 5 1.0 2.7832e+00 1.0 0.00e+00 0.0 3.5e+05 5.6e+04 3.7e+01 2 0 0 4 1 2 0 0 4 1 0
MatPtAPNumeric 5 1.0 1.4511e+00 1.0 1.54e+09 1.5 4.8e+05 3.3e+04 5.0e+01 1 2 0 3 2 1 2 0 3 2 852080
MatTrnMatMult 1 1.0 1.5337e+00 1.0 5.87e+07 1.3 6.9e+04 8.1e+04 1.9e+01 1 0 0 1 1 1 0 0 1 1 36505
MatTrnMatMultSym 1 1.0 9.2151e-01 1.0 0.00e+00 0.0 5.7e+04 3.4e+04 1.7e+01 1 0 0 0 1 1 0 0 0 1 0
MatTrnMatMultNum 1 1.0 6.1297e-01 1.0 5.87e+07 1.3 1.1e+04 3.2e+05 2.0e+00 0 0 0 1 0 0 0 0 1 0 91341
MatGetLocalMat 17 1.0 5.4432e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 15 1.0 7.0758e-02 2.1 0.00e+00 0.0 4.6e+05 4.2e+04 0.0e+00 0 0 0 4 0 0 0 0 4 0 0
VecMDot 329 1.0 6.2030e+0013.7 6.68e+08 1.0 0.0e+00 0.0e+00 3.3e+02 1 1 0 0 12 1 1 0 0 13 106230
VecNorm 595 1.0 1.1655e+00 8.5 1.21e+08 1.0 0.0e+00 0.0e+00 6.0e+02 0 0 0 0 23 0 0 0 0 23 102761
VecScale 349 1.0 6.6033e-02 4.6 3.13e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 467735
VecCopy 1386 1.0 1.0624e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 4392 1.0 8.6035e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 246 1.0 4.8357e-02 1.4 5.69e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1160750
VecAYPX 9276 1.0 4.4571e-01 1.4 3.11e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 687917
VecAXPBYCZ 4520 1.0 2.8744e-01 1.4 5.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1939847
VecMAXPY 575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1600021
VecAssemblyBegin 185 1.0 6.6342e-02 1.3 0.00e+00 0.0 2.3e+04 2.2e+04 5.5e+02 0 0 0 0 21 0 0 0 0 21 0
VecAssemblyEnd 185 1.0 4.2391e-04 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 55 1.0 3.7224e-03 1.5 1.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 364534
VecScatterBegin 9786 1.0 8.6765e-01 5.5 0.00e+00 0.0 1.1e+08 3.8e+03 0.0e+00 0 0 97 90 0 0 0 97 90 0 0
VecScatterEnd 9786 1.0 1.9699e+01 9.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
VecSetRandom 5 1.0 4.3778e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 113 1.0 9.7592e-02 3.3 9.34e+06 1.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 4 0 0 0 0 4 94297
KSPGMRESOrthog 326 1.0 6.4559e+00 9.1 1.33e+09 1.0 0.0e+00 0.0e+00 3.3e+02 1 2 0 0 12 1 2 0 0 13 203262
KSPSetUp 18 1.0 1.4065e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 10 1.0 7.9545e+01 1.0 7.50e+10 1.1 1.1e+08 3.8e+03 8.1e+02 48 98 95 89 31 48 98 95 89 31 903224
PCGAMGGraph_AGG 5 1.0 1.2315e+00 1.0 2.25e+06 1.2 3.3e+05 4.2e+02 1.3e+02 1 0 0 0 5 1 0 0 0 5 1759
PCGAMGCoarse_AGG 5 1.0 1.5847e+00 1.0 5.87e+07 1.3 1.3e+06 5.2e+03 8.7e+01 1 0 1 1 3 1 0 1 1 3 35331
PCGAMGProl_AGG 5 1.0 3.5152e-01 1.0 0.00e+00 0.0 2.3e+06 1.5e+03 9.0e+02 0 0 2 1 34 0 0 2 1 35 0
PCGAMGPOpt_AGG 5 1.0 1.0543e+00 1.0 4.17e+08 1.2 1.0e+06 6.1e+03 2.4e+02 1 1 1 1 9 1 1 1 1 9 372220
GAMG: createProl 5 1.0 4.2217e+00 1.0 4.78e+08 1.2 5.0e+06 3.3e+03 1.4e+03 3 1 4 3 52 3 1 4 3 52 106734
Graph 10 1.0 1.2300e+00 1.0 2.25e+06 1.2 3.3e+05 4.2e+02 1.3e+02 1 0 0 0 5 1 0 0 0 5 1761
MIS/Agg 5 1.0 3.2935e-02 1.1 0.00e+00 0.0 1.2e+06 4.9e+02 5.2e+01 0 0 1 0 2 0 0 1 0 2 0
SA: col data 5 1.0 1.3732e-01 1.0 0.00e+00 0.0 2.2e+06 1.2e+03 8.4e+02 0 0 2 1 32 0 0 2 1 32 0
SA: frmProl0 5 1.0 2.0841e-01 1.0 0.00e+00 0.0 9.5e+04 7.1e+03 5.0e+01 0 0 0 0 2 0 0 0 0 2 0
SA: smooth 5 1.0 7.6907e-01 1.0 1.48e+08 1.2 3.7e+05 1.0e+04 1.0e+02 0 0 0 1 4 0 0 0 1 4 180072
GAMG: partLevel 5 1.0 4.2824e+00 1.0 1.54e+09 1.5 8.9e+05 4.0e+04 2.5e+02 3 2 1 7 9 3 2 1 7 9 288729
repartition 3 1.0 2.1951e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 2.9290e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
Move A 3 1.0 2.8378e-02 1.1 0.00e+00 0.0 3.0e+04 5.2e+03 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
Move P 3 1.0 1.5999e-02 1.2 0.00e+00 0.0 2.6e+04 4.0e+01 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
PCSetUp 2 1.0 8.5208e+00 1.0 2.01e+09 1.4 5.8e+06 8.9e+03 1.6e+03 5 2 5 11 62 5 2 5 11 63 197991
PCSetUpOnBlocks 226 1.0 1.7779e-0310.4 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
PCApply 226 1.0 6.9594e+01 1.1 6.40e+10 1.1 1.1e+08 3.3e+03 1.0e+02 41 83 90 72 4 41 83 90 72 4 878121
SFSetGraph 5 1.0 5.3883e-0556.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 62 1.0 9.9101e-03 1.7 0.00e+00 0.0 1.2e+06 4.9e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFBcastEnd 62 1.0 6.8467e-03 6.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 5 1.0 7.4060e-03 2.8 0.00e+00 0.0 3.3e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 154 154 347782424 0.
Matrix Coarsen 5 5 3140 0.
Matrix Null Space 1 1 688 0.
Vector 1035 1035 582369520 0.
Vector Scatter 36 36 39744 0.
Index Set 112 112 484084 0.
Krylov Solver 18 18 330336 0.
Preconditioner 13 13 12868 0.
PetscRandom 10 10 6380 0.
Star Forest Graph 5 5 4280 0.
Viewer 12 11 9152 0.
========================================================================================================================
0 KSP unpreconditioned resid norm 3.738834777485e+08 true resid norm 3.738834777485e+08 ||r(i)||/||b|| 1.000000000000e+00
1 KSP unpreconditioned resid norm 1.256707592053e+08 true resid norm 1.256707592053e+08 ||r(i)||/||b|| 3.361227940911e-01
2 KSP unpreconditioned resid norm 1.824938621520e+07 true resid norm 1.824938621520e+07 ||r(i)||/||b|| 4.881035750789e-02
3 KSP unpreconditioned resid norm 6.102002718084e+06 true resid norm 6.102002718084e+06 ||r(i)||/||b|| 1.632060008329e-02
4 KSP unpreconditioned resid norm 2.562432902883e+06 true resid norm 2.562432902883e+06 ||r(i)||/||b|| 6.853560147439e-03
5 KSP unpreconditioned resid norm 1.188336046012e+06 true resid norm 1.188336046012e+06 ||r(i)||/||b|| 3.178359346520e-03
6 KSP unpreconditioned resid norm 5.326022866065e+05 true resid norm 5.326022866065e+05 ||r(i)||/||b|| 1.424514102131e-03
7 KSP unpreconditioned resid norm 2.433972087119e+05 true resid norm 2.433972087119e+05 ||r(i)||/||b|| 6.509974984122e-04
8 KSP unpreconditioned resid norm 1.095996827533e+05 true resid norm 1.095996827533e+05 ||r(i)||/||b|| 2.931386094225e-04
9 KSP unpreconditioned resid norm 4.986951871355e+04 true resid norm 4.986951871355e+04 ||r(i)||/||b|| 1.333825153597e-04
10 KSP unpreconditioned resid norm 2.330078182947e+04 true resid norm 2.330078182946e+04 ||r(i)||/||b|| 6.232097221779e-05
11 KSP unpreconditioned resid norm 1.084965391397e+04 true resid norm 1.084965391396e+04 ||r(i)||/||b|| 2.901881083191e-05
12 KSP unpreconditioned resid norm 5.108480961660e+03 true resid norm 5.108480961647e+03 ||r(i)||/||b|| 1.366329689776e-05
13 KSP unpreconditioned resid norm 2.450752492671e+03 true resid norm 2.450752492670e+03 ||r(i)||/||b|| 6.554856361741e-06
14 KSP unpreconditioned resid norm 1.181086403619e+03 true resid norm 1.181086403614e+03 ||r(i)||/||b|| 3.158969234817e-06
15 KSP unpreconditioned resid norm 5.606721134498e+02 true resid norm 5.606721134433e+02 ||r(i)||/||b|| 1.499590505629e-06
16 KSP unpreconditioned resid norm 2.700319247455e+02 true resid norm 2.700319247344e+02 ||r(i)||/||b|| 7.222355113430e-07
17 KSP unpreconditioned resid norm 1.314293551958e+02 true resid norm 1.314293551859e+02 ||r(i)||/||b|| 3.515249081809e-07
18 KSP unpreconditioned resid norm 6.357572858020e+01 true resid norm 6.357572858253e+01 ||r(i)||/||b|| 1.700415567047e-07
19 KSP unpreconditioned resid norm 3.077536939056e+01 true resid norm 3.077536939188e+01 ||r(i)||/||b|| 8.231272902779e-08
20 KSP unpreconditioned resid norm 1.504910881547e+01 true resid norm 1.504910882709e+01 ||r(i)||/||b|| 4.025079930707e-08
21 KSP unpreconditioned resid norm 7.400345249992e+00 true resid norm 7.400345259132e+00 ||r(i)||/||b|| 1.979318611161e-08
22 KSP unpreconditioned resid norm 3.607811417234e+00 true resid norm 3.607811420482e+00 ||r(i)||/||b|| 9.649560986776e-09
Linear solve converged due to CONVERGED_RTOL iterations 22
KSP Object: 1000 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1000 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=6 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1000 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1000 MPI processes
type: bjacobi
number of blocks = 1000
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
package used to perform factorization: petsc
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 3 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0999997, max = 1.1
eigenvalues estimate via gmres min 0.0078618, max 0.999997
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 1000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=288, cols=288, bs=6
total: nonzeros=78408, allocated nonzeros=78408
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 86 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.139457, max = 1.53403
eigenvalues estimate via gmres min 0.077969, max 1.39457
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 1000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=10254, cols=10254, bs=6
total: nonzeros=12883716, allocated nonzeros=12883716
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 8 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 1000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.14493, max = 1.59423
eigenvalues estimate via gmres min 0.356008, max 1.4493
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 1000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=332466, cols=332466, bs=6
total: nonzeros=286141284, allocated nonzeros=286141284
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 88 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 1000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.175972, max = 1.93569
eigenvalues estimate via gmres min 0.145536, max 1.75972
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 1000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 1000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=5142126, cols=5142126, bs=6
total: nonzeros=1363101804, allocated nonzeros=1363101804
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 1522 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 1000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.234733, max = 2.58207
eigenvalues estimate via gmres min 0.061528, max 2.34733
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_5_esteig_) 1000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 1000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 39198 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1000 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 39198 nodes, limit used is 5
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Unknown Name on a arch-linux2-c-opt-mpi-ml-hypre named eobm0011 with 2000 processors, by B07947 Thu Nov 15 15:47:29 2018
Using Petsc Release Version 3.8.2, Nov, 09, 2017
Max Max/Min Avg Total
Time (sec): 2.837e+02 1.00021 2.836e+02
Objects: 1.409e+03 1.00142 1.407e+03
Flop: 3.920e+10 1.16752 3.710e+10 7.420e+13
Flop/sec: 1.382e+08 1.16751 1.308e+08 2.616e+11
MPI Messages: 4.031e+05 10.62284 1.243e+05 2.486e+08
MPI Message Lengths: 6.348e+08 4.13328 2.721e+03 6.762e+11
MPI Reductions: 2.654e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.8364e+02 100.0% 7.4202e+13 100.0% 2.484e+08 99.9% 2.718e+03 99.9% 2.622e+03 98.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 7470 1.0 4.7611e+01 1.9 2.11e+10 1.2 2.0e+08 2.9e+03 0.0e+00 11 53 81 86 0 11 53 81 86 0 827107
MatMultAdd 1150 1.0 3.8834e+00 3.5 8.06e+08 1.2 1.7e+07 5.7e+02 0.0e+00 1 2 7 1 0 1 2 7 1 0 388724
MatMultTranspose 1150 1.0 6.7493e+00 7.4 8.06e+08 1.2 1.7e+07 5.7e+02 0.0e+00 1 2 7 1 0 1 2 7 1 0 223663
MatSolve 230 0.0 8.3327e-04 0.0 6.35e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 76
MatSOR 6955 1.0 2.9793e+01 2.8 1.41e+10 1.1 0.0e+00 0.0e+00 0.0e+00 9 37 0 0 0 9 37 0 0 0 912909
MatLUFactorSym 1 1.0 4.5509e-03561.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 3.5341e-031852.9 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 15 1.0 1.7186e-02 3.3 4.62e+06 1.2 1.4e+05 7.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 508009
MatResidual 1150 1.0 7.0952e+00 2.3 2.75e+09 1.2 3.1e+07 2.5e+03 0.0e+00 2 7 13 12 0 2 7 13 12 0 713964
MatAssemblyBegin 112 1.0 1.0418e+00 4.7 0.00e+00 0.0 4.3e+05 5.3e+04 7.4e+01 0 0 0 3 3 0 0 0 3 3 0
MatAssemblyEnd 112 1.0 5.9064e-01 1.1 0.00e+00 0.0 1.6e+06 2.4e+02 2.6e+02 0 0 1 0 10 0 0 1 0 10 0
MatGetRow 291670 1.0 3.9900e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 0.0 4.3106e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 4.7464e-02 1.0 0.00e+00 0.0 7.5e+04 2.0e+03 1.0e+02 0 0 0 0 4 0 0 0 0 4 0
MatGetOrdering 1 0.0 1.0009e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 5 1.0 3.4372e-02 1.1 0.00e+00 0.0 3.0e+06 3.0e+02 5.9e+01 0 0 1 0 2 0 0 1 0 2 0
MatZeroEntries 5 1.0 5.3163e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.3 6.1949e-01 5.2 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatAXPY 5 1.0 4.1116e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 5 1.0 4.7434e-01 1.2 7.18e+07 1.2 7.5e+05 7.0e+03 8.2e+01 0 0 0 1 3 0 0 0 1 3 278596
MatMatMultSym 5 1.0 2.8504e-01 1.0 0.00e+00 0.0 6.2e+05 5.3e+03 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatMatMultNum 5 1.0 1.1035e-01 1.0 7.18e+07 1.2 1.4e+05 1.5e+04 1.0e+01 0 0 0 0 0 0 0 0 0 0 1197494
MatPtAP 5 1.0 2.6336e+00 1.0 8.35e+08 1.7 1.7e+06 3.0e+04 8.7e+01 1 2 1 7 3 1 2 1 7 3 472910
MatPtAPSymbolic 5 1.0 1.6345e+00 1.0 0.00e+00 0.0 7.2e+05 3.8e+04 3.7e+01 1 0 0 4 1 1 0 0 4 1 0
MatPtAPNumeric 5 1.0 1.0015e+00 1.0 8.35e+08 1.7 9.3e+05 2.3e+04 5.0e+01 0 2 0 3 2 0 2 0 3 2 1243604
MatTrnMatMult 1 1.0 8.1209e-01 1.0 2.97e+07 1.3 1.5e+05 5.0e+04 1.9e+01 0 0 0 1 1 0 0 0 1 1 69321
MatTrnMatMultSym 1 1.0 4.7897e-01 1.0 0.00e+00 0.0 1.3e+05 2.1e+04 1.7e+01 0 0 0 0 1 0 0 0 0 1 0
MatTrnMatMultNum 1 1.0 3.3517e-01 1.0 2.97e+07 1.3 2.4e+04 2.0e+05 2.0e+00 0 0 0 1 0 0 0 0 1 0 167958
MatGetLocalMat 17 1.0 3.8855e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 15 1.0 6.2124e-02 2.5 0.00e+00 0.0 9.5e+05 2.9e+04 0.0e+00 0 0 0 4 0 0 0 0 4 0 0
VecMDot 333 1.0 1.2028e+0113.7 3.44e+08 1.0 0.0e+00 0.0e+00 3.3e+02 1 1 0 0 13 1 1 0 0 13 56587
VecNorm 603 1.0 2.7685e+00 4.5 6.16e+07 1.0 0.0e+00 0.0e+00 6.0e+02 0 0 0 0 23 0 0 0 0 23 43942
VecScale 353 1.0 9.8841e-03 1.8 1.59e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3172544
VecCopy 1410 1.0 7.7031e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 4468 1.0 5.7269e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 250 1.0 3.3906e-02 2.1 2.89e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1683301
VecAYPX 9440 1.0 3.6537e-01 2.9 1.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 854103
VecAXPBYCZ 4600 1.0 2.7121e-01 3.2 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2092658
VecMAXPY 583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1955563
VecAssemblyBegin 185 1.0 1.1164e-01 1.4 0.00e+00 0.0 4.9e+04 1.4e+04 5.5e+02 0 0 0 0 21 0 0 0 0 21 0
VecAssemblyEnd 185 1.0 3.3379e-04 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 55 1.0 2.9728e-03 2.9 6.90e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 456523
VecScatterBegin 9954 1.0 1.0825e+00 7.3 0.00e+00 0.0 2.4e+08 2.5e+03 0.0e+00 0 0 97 90 0 0 0 97 90 0 0
VecScatterEnd 9954 1.0 3.8453e+0111.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
VecSetRandom 5 1.0 2.1403e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 113 1.0 7.4105e-02 6.0 4.68e+06 1.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 4 0 0 0 0 4 124201
KSPGMRESOrthog 330 1.0 1.2168e+0110.7 6.86e+08 1.0 0.0e+00 0.0e+00 3.3e+02 1 2 0 0 12 1 2 0 0 13 111406
KSPSetUp 18 1.0 1.2172e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 10 1.0 6.5991e+01 1.0 3.82e+10 1.2 2.4e+08 2.6e+03 8.2e+02 23 98 95 89 31 23 98 95 89 31 1098603
PCGAMGGraph_AGG 5 1.0 6.7798e-01 1.0 1.13e+06 1.2 6.8e+05 2.8e+02 1.3e+02 0 0 0 0 5 0 0 0 0 5 3197
PCGAMGCoarse_AGG 5 1.0 8.5740e-01 1.0 2.97e+07 1.3 3.3e+06 2.7e+03 9.4e+01 0 0 1 1 4 0 0 1 1 4 65658
PCGAMGProl_AGG 5 1.0 2.4710e-01 1.0 0.00e+00 0.0 4.8e+06 9.8e+02 9.0e+02 0 0 2 1 34 0 0 2 1 35 0
PCGAMGPOpt_AGG 5 1.0 7.5785e-01 1.0 2.12e+08 1.2 2.1e+06 4.1e+03 2.4e+02 0 1 1 1 9 0 1 1 1 9 518589
GAMG: createProl 5 1.0 2.5407e+00 1.0 2.43e+08 1.2 1.1e+07 2.1e+03 1.4e+03 1 1 4 3 51 1 1 4 3 52 177698
Graph 10 1.0 6.7570e-01 1.0 1.13e+06 1.2 6.8e+05 2.8e+02 1.3e+02 0 0 0 0 5 0 0 0 0 5 3208
MIS/Agg 5 1.0 3.4434e-02 1.1 0.00e+00 0.0 3.0e+06 3.0e+02 5.9e+01 0 0 1 0 2 0 0 1 0 2 0
SA: col data 5 1.0 1.3094e-01 1.0 0.00e+00 0.0 4.6e+06 8.2e+02 8.4e+02 0 0 2 1 31 0 0 2 1 32 0
SA: frmProl0 5 1.0 1.1028e-01 1.0 0.00e+00 0.0 1.9e+05 4.7e+03 5.0e+01 0 0 0 0 2 0 0 0 0 2 0
SA: smooth 5 1.0 5.2676e-01 1.2 7.53e+07 1.2 7.5e+05 7.0e+03 1.0e+02 0 0 0 1 4 0 0 0 1 4 263330
GAMG: partLevel 5 1.0 2.7087e+00 1.0 8.35e+08 1.7 1.7e+06 2.9e+04 2.5e+02 1 2 1 7 9 1 2 1 7 9 459805
repartition 3 1.0 5.6183e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 7.8020e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
Move A 3 1.0 4.1104e-02 1.1 0.00e+00 0.0 3.2e+04 4.5e+03 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
Move P 3 1.0 1.8200e-02 1.3 0.00e+00 0.0 4.3e+04 3.6e+01 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
PCSetUp 2 1.0 5.2812e+00 1.0 1.08e+09 1.5 1.3e+07 5.7e+03 1.6e+03 2 2 5 11 62 2 2 5 11 63 321316
PCSetUpOnBlocks 230 1.0 6.2256e-03 19.5 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 230 1.0 5.7271e+01 1.3 3.26e+10 1.2 2.2e+08 2.2e+03 1.0e+02 20 83 90 73 4 20 83 90 73 4 1074640
SFSetGraph 5 1.0 5.3167e-05 55.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 69 1.0 1.1146e-02 1.9 0.00e+00 0.0 3.0e+06 3.0e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFBcastEnd 69 1.0 7.9596e-03 7.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 5 1.0 7.5631e-03 2.8 0.00e+00 0.0 6.9e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 154 154 176666644 0.
Matrix Coarsen 5 5 3140 0.
Matrix Null Space 1 1 688 0.
Vector 1043 1043 297405224 0.
Vector Scatter 36 36 39456 0.
Index Set 112 112 395240 0.
Krylov Solver 18 18 330336 0.
Preconditioner 13 13 12868 0.
PetscRandom 10 10 6380 0.
Star Forest Graph 5 5 4280 0.
Viewer 12 11 9152 0.
========================================================================================================================
0 KSP unpreconditioned resid norm 3.738834778097e+08 true resid norm 3.738834778097e+08 ||r(i)||/||b|| 1.000000000000e+00
1 KSP unpreconditioned resid norm 1.256561415764e+08 true resid norm 1.256561415764e+08 ||r(i)||/||b|| 3.360836972859e-01
2 KSP unpreconditioned resid norm 1.843932942229e+07 true resid norm 1.843932942229e+07 ||r(i)||/||b|| 4.931838531703e-02
3 KSP unpreconditioned resid norm 6.189553415818e+06 true resid norm 6.189553415818e+06 ||r(i)||/||b|| 1.655476581120e-02
4 KSP unpreconditioned resid norm 2.614928212473e+06 true resid norm 2.614928212473e+06 ||r(i)||/||b|| 6.993965680944e-03
5 KSP unpreconditioned resid norm 1.208975553355e+06 true resid norm 1.208975553355e+06 ||r(i)||/||b|| 3.233562393388e-03
6 KSP unpreconditioned resid norm 5.481792905733e+05 true resid norm 5.481792905733e+05 ||r(i)||/||b|| 1.466176825423e-03
7 KSP unpreconditioned resid norm 2.526854282559e+05 true resid norm 2.526854282559e+05 ||r(i)||/||b|| 6.758400497828e-04
8 KSP unpreconditioned resid norm 1.150052500229e+05 true resid norm 1.150052500229e+05 ||r(i)||/||b|| 3.075965022488e-04
9 KSP unpreconditioned resid norm 5.289416146528e+04 true resid norm 5.289416146528e+04 ||r(i)||/||b|| 1.414723158540e-04
10 KSP unpreconditioned resid norm 2.495584369428e+04 true resid norm 2.495584369427e+04 ||r(i)||/||b|| 6.674765047246e-05
11 KSP unpreconditioned resid norm 1.184780633606e+04 true resid norm 1.184780633605e+04 ||r(i)||/||b|| 3.168849932994e-05
12 KSP unpreconditioned resid norm 5.709557885707e+03 true resid norm 5.709557885717e+03 ||r(i)||/||b|| 1.527095532321e-05
13 KSP unpreconditioned resid norm 2.811037623050e+03 true resid norm 2.811037623058e+03 ||r(i)||/||b|| 7.518485811476e-06
14 KSP unpreconditioned resid norm 1.399589249024e+03 true resid norm 1.399589249031e+03 ||r(i)||/||b|| 3.743383519460e-06
15 KSP unpreconditioned resid norm 6.919705622362e+02 true resid norm 6.919705622376e+02 ||r(i)||/||b|| 1.850765287333e-06
16 KSP unpreconditioned resid norm 3.469221128804e+02 true resid norm 3.469221128823e+02 ||r(i)||/||b|| 9.278883220907e-07
17 KSP unpreconditioned resid norm 1.747835577077e+02 true resid norm 1.747835577094e+02 ||r(i)||/||b|| 4.674813627318e-07
18 KSP unpreconditioned resid norm 8.648881836541e+01 true resid norm 8.648881835829e+01 ||r(i)||/||b|| 2.313255960519e-07
19 KSP unpreconditioned resid norm 4.247581916935e+01 true resid norm 4.247581916507e+01 ||r(i)||/||b|| 1.136071040472e-07
20 KSP unpreconditioned resid norm 2.086023330347e+01 true resid norm 2.086023330200e+01 ||r(i)||/||b|| 5.579340767933e-08
21 KSP unpreconditioned resid norm 1.023525173739e+01 true resid norm 1.023525174086e+01 ||r(i)||/||b|| 2.737551228746e-08
22 KSP unpreconditioned resid norm 4.963414450847e+00 true resid norm 4.963414447514e+00 ||r(i)||/||b|| 1.327529763174e-08
23 KSP unpreconditioned resid norm 2.415620601642e+00 true resid norm 2.415620604831e+00 ||r(i)||/||b|| 6.460891556327e-09
Linear solve converged due to CONVERGED_RTOL iterations 23
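As a quick sanity check on the residual history above, the average reduction factor per FGMRES iteration can be estimated from the first and last reported norms (a minimal sketch; the two norms are copied from the log, everything else is arithmetic):

```python
# Estimate the average per-iteration residual reduction of the
# FGMRES/GAMG solve from the logged convergence history.
r0 = 3.738834778097e+08   # residual norm at iteration 0 (from the log)
r_final = 2.415620601642e+00  # residual norm at iteration 23 (from the log)
iterations = 23

# Geometric mean of the per-iteration reduction.
factor = (r_final / r0) ** (1.0 / iterations)
print(f"average residual reduction per iteration: {factor:.3f}")
```

A factor well below one (here roughly 0.44) confirms the preconditioner is behaving reasonably; the question is the parallel cost per iteration, not the iteration count.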
KSP Object: 2000 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 2000 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=6 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2000 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2000 MPI processes
type: bjacobi
number of blocks = 2000
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
package used to perform factorization: petsc
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 3 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0999937, max = 1.09993
eigenvalues estimate via gmres min 0.075342, max 0.999937
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 2000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=318, cols=318, bs=6
total: nonzeros=90828, allocated nonzeros=90828
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 87 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 2000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.130639, max = 1.43703
eigenvalues estimate via gmres min 0.077106, max 1.30639
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 2000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 2000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=9870, cols=9870, bs=6
total: nonzeros=11941884, allocated nonzeros=11941884
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 9 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 2000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.151779, max = 1.66957
eigenvalues estimate via gmres min 0.352485, max 1.51779
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 2000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 2000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=334476, cols=334476, bs=6
total: nonzeros=292009536, allocated nonzeros=292009536
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 50 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 2000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.181248, max = 1.99372
eigenvalues estimate via gmres min 0.141976, max 1.81248
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 2000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 2000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=5160228, cols=5160228, bs=6
total: nonzeros=1375082208, allocated nonzeros=1375082208
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 792 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 2000 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.23761, max = 2.61371
eigenvalues estimate via gmres min 0.0632228, max 2.3761
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_5_esteig_) 2000 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 2000 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 19690 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2000 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 19690 nodes, limit used is 5
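For reference, the solver configuration reported in the view above corresponds to an options set along these lines (a hedged reconstruction from the KSP/PC view, not the poster's actual options file; defaults shown in the view are included for completeness):

```
# Sketch of PETSc options implied by the KSPView output above
-ksp_type fgmres
-ksp_rtol 1e-8
-ksp_pc_side right
-pc_type gamg
-pc_gamg_agg_nsmooths 1
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type sor
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type bjacobi
```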
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Unknown Name on a arch-linux2-c-opt-mpi-ml-hypre named eocn0055 with 1500 processors, by B07947 Thu Nov 15 15:55:02 2018
Using Petsc Release Version 3.8.2, Nov, 09, 2017
Max Max/Min Avg Total
Time (sec): 2.296e+02 1.00007 2.296e+02
Objects: 1.409e+03 1.00142 1.407e+03
Flop: 5.219e+10 1.14806 4.965e+10 7.447e+13
Flop/sec: 2.273e+08 1.14806 2.162e+08 3.243e+11
MPI Messages: 4.774e+05 14.16274 1.262e+05 1.893e+08
MPI Message Lengths: 7.718e+08 4.12637 3.102e+03 5.872e+11
MPI Reductions: 2.667e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.2961e+02 100.0% 7.4472e+13 100.0% 1.892e+08 99.9% 3.099e+03 99.9% 2.635e+03 98.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 7470 1.0 6.1871e+01 1.8 2.79e+10 1.2 1.5e+08 3.3e+03 0.0e+00 18 53 81 86 0 18 53 81 86 0 636062
MatMultAdd 1150 1.0 4.4228e+00 3.1 1.07e+09 1.2 1.3e+07 6.4e+02 0.0e+00 1 2 7 1 0 1 2 7 1 0 340660
MatMultTranspose 1150 1.0 5.8074e+00 4.5 1.07e+09 1.2 1.3e+07 6.4e+02 0.0e+00 1 2 7 1 0 1 2 7 1 0 259436
MatSolve 230 0.0 7.8106e-04 0.0 6.35e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 81
MatSOR 6955 1.0 4.0051e+01 2.6 1.90e+10 1.1 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 15 37 0 0 0 686820
MatLUFactorSym 1 1.0 1.9209e-03 175.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.7691e-03 927.5 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
MatScale 15 1.0 2.3391e-02 4.6 6.13e+06 1.1 1.0e+05 8.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 372687
MatResidual 1150 1.0 9.1807e+00 2.2 3.63e+09 1.2 2.4e+07 2.8e+03 0.0e+00 2 7 13 12 0 2 7 13 12 0 551315
MatAssemblyBegin 112 1.0 9.0080e-01 2.5 0.00e+00 0.0 3.2e+05 6.2e+04 7.4e+01 0 0 0 3 3 0 0 0 3 3 0
MatAssemblyEnd 112 1.0 6.8422e-01 1.1 0.00e+00 0.0 1.2e+06 2.7e+02 2.6e+02 0 0 1 0 10 0 0 1 0 10 0
MatGetRow 388852 1.0 5.5644e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 0.0 1.6968e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 6.8178e-02 1.0 0.00e+00 0.0 8.2e+04 1.8e+03 1.0e+02 0 0 0 0 4 0 0 0 0 4 0
MatGetOrdering 1 0.0 1.8709e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 5 1.0 3.8623e-02 1.1 0.00e+00 0.0 2.7e+06 2.9e+02 7.2e+01 0 0 1 0 3 0 0 1 0 3 0
MatZeroEntries 5 1.0 6.3353e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.3 6.1051e-01 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatAXPY 5 1.0 6.4298e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 5 1.0 5.4698e-01 1.0 9.46e+07 1.2 5.7e+05 8.1e+03 8.2e+01 0 0 0 1 3 0 0 0 1 3 241395
MatMatMultSym 5 1.0 3.6737e-01 1.0 0.00e+00 0.0 4.6e+05 6.1e+03 7.0e+01 0 0 0 0 3 0 0 0 0 3 0
MatMatMultNum 5 1.0 1.7525e-01 1.0 9.46e+07 1.2 1.0e+05 1.7e+04 1.0e+01 0 0 0 0 0 0 0 0 0 0 753412
MatPtAP 5 1.0 3.4278e+00 1.0 1.10e+09 1.6 1.2e+06 3.4e+04 8.7e+01 1 2 1 7 3 1 2 1 7 3 361157
MatPtAPSymbolic 5 1.0 2.2084e+00 1.0 0.00e+00 0.0 5.4e+05 4.4e+04 3.7e+01 1 0 0 4 1 1 0 0 4 1 0
MatPtAPNumeric 5 1.0 1.2233e+00 1.0 1.10e+09 1.6 7.0e+05 2.7e+04 5.0e+01 1 2 0 3 2 1 2 0 3 2 1011960
MatTrnMatMult 1 1.0 1.0668e+00 1.0 3.95e+07 1.3 1.1e+05 6.0e+04 1.9e+01 0 0 0 1 1 0 0 0 1 1 52637
MatTrnMatMultSym 1 1.0 6.2306e-01 1.0 0.00e+00 0.0 9.0e+04 2.5e+04 1.7e+01 0 0 0 0 1 0 0 0 0 1 0
MatTrnMatMultNum 1 1.0 4.4524e-01 1.0 3.95e+07 1.3 1.8e+04 2.4e+05 2.0e+00 0 0 0 1 0 0 0 0 1 0 126116
MatGetLocalMat 17 1.0 5.2980e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 15 1.0 7.0928e-02 2.7 0.00e+00 0.0 7.2e+05 3.2e+04 0.0e+00 0 0 0 4 0 0 0 0 4 0 0
VecMDot 333 1.0 6.9139e+00 5.8 4.59e+08 1.0 0.0e+00 0.0e+00 3.3e+02 1 1 0 0 12 1 1 0 0 13 98444
VecNorm 603 1.0 3.4630e+00 7.2 8.20e+07 1.0 0.0e+00 0.0e+00 6.0e+02 0 0 0 0 23 0 0 0 0 23 35129
VecScale 353 1.0 5.6857e-02 5.8 2.11e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 551520
VecCopy 1410 1.0 1.0825e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 4468 1.0 8.2095e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 250 1.0 5.0242e-02 2.3 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1135966
VecAYPX 9440 1.0 5.1576e-01 2.6 2.11e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 605004
VecAXPBYCZ 4600 1.0 3.6595e-01 2.7 3.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1550752
VecMAXPY 583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1289187
VecAssemblyBegin 185 1.0 7.6495e-02 1.2 0.00e+00 0.0 3.5e+04 1.6e+04 5.5e+02 0 0 0 0 20 0 0 0 0 21 0
VecAssemblyEnd 185 1.0 3.8767e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 55 1.0 4.6344e-03 2.8 9.20e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 292821
VecScatterBegin 9954 1.0 1.1589e+00 7.6 0.00e+00 0.0 1.8e+08 2.9e+03 0.0e+00 0 0 97 90 0 0 0 97 90 0 0
VecScatterEnd 9954 1.0 4.8668e+01 11.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0
VecSetRandom 5 1.0 2.8229e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 113 1.0 1.3515e-01 3.0 6.23e+06 1.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 4 0 0 0 0 4 68095
KSPGMRESOrthog 330 1.0 7.1848e+00 4.6 9.14e+08 1.0 0.0e+00 0.0e+00 3.3e+02 1 2 0 0 12 1 2 0 0 13 188679
KSPSetUp 18 1.0 1.2331e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 10 1.0 8.7217e+01 1.0 5.09e+10 1.1 1.8e+08 2.9e+03 8.2e+02 38 98 94 89 31 38 98 94 89 31 834427
PCGAMGGraph_AGG 5 1.0 8.6509e-01 1.0 1.50e+06 1.2 5.2e+05 3.2e+02 1.3e+02 0 0 0 0 5 0 0 0 0 5 2505
PCGAMGCoarse_AGG 5 1.0 1.1177e+00 1.0 3.95e+07 1.3 2.9e+06 2.7e+03 1.1e+02 0 0 2 1 4 0 0 2 1 4 50240
PCGAMGProl_AGG 5 1.0 3.2632e-01 1.0 0.00e+00 0.0 3.7e+06 1.1e+03 9.0e+02 0 0 2 1 34 0 0 2 1 34 0
PCGAMGPOpt_AGG 5 1.0 9.1948e-01 1.0 2.80e+08 1.2 1.6e+06 4.7e+03 2.4e+02 0 1 1 1 9 0 1 1 1 9 427090
GAMG: createProl 5 1.0 3.2296e+00 1.0 3.20e+08 1.2 8.7e+06 2.3e+03 1.4e+03 1 1 5 3 52 1 1 5 3 52 139652
Graph 10 1.0 8.6263e-01 1.0 1.50e+06 1.2 5.2e+05 3.2e+02 1.3e+02 0 0 0 0 5 0 0 0 0 5 2512
MIS/Agg 5 1.0 3.8683e-02 1.1 0.00e+00 0.0 2.7e+06 2.9e+02 7.2e+01 0 0 1 0 3 0 0 1 0 3 0
SA: col data 5 1.0 1.6884e-01 1.0 0.00e+00 0.0 3.5e+06 9.4e+02 8.4e+02 0 0 2 1 31 0 0 2 1 32 0
SA: frmProl0 5 1.0 1.5309e-01 1.0 0.00e+00 0.0 1.4e+05 5.5e+03 5.0e+01 0 0 0 0 2 0 0 0 0 2 0
SA: smooth 5 1.0 6.2292e-01 1.0 9.92e+07 1.2 5.7e+05 8.1e+03 1.0e+02 0 0 0 1 4 0 0 0 1 4 222484
GAMG: partLevel 5 1.0 3.5234e+00 1.0 1.10e+09 1.6 1.3e+06 3.2e+04 2.5e+02 2 2 1 7 9 2 2 1 7 9 351357
repartition 3 1.0 4.2057e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 3.6008e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
Move A 3 1.0 5.2288e-02 1.1 0.00e+00 0.0 4.3e+04 3.4e+03 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
Move P 3 1.0 2.8588e-02 1.2 0.00e+00 0.0 3.8e+04 3.3e+01 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
PCSetUp 2 1.0 6.7734e+00 1.0 1.42e+09 1.5 1.0e+07 6.2e+03 1.7e+03 3 2 5 11 62 3 2 5 11 63 249358
PCSetUpOnBlocks 230 1.0 3.0496e-03 6.5 1.09e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 230 1.0 7.5882e+01 1.1 4.35e+10 1.2 1.7e+08 2.5e+03 1.0e+02 32 83 90 73 4 32 83 90 73 4 814742
SFSetGraph 5 1.0 2.3127e-05 24.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 82 1.0 1.2156e-02 1.9 0.00e+00 0.0 2.7e+06 2.9e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFBcastEnd 82 1.0 9.9132e-03 9.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSided 5 1.0 7.6580e-03 2.6 0.00e+00 0.0 5.2e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 154 154 245066064 0.
Matrix Coarsen 5 5 3140 0.
Matrix Null Space 1 1 688 0.
Vector 1043 1043 395730096 0.
Vector Scatter 36 36 39456 0.
Index Set 112 112 566408 0.
Krylov Solver 18 18 330336 0.
Preconditioner 13 13 12868 0.
PetscRandom 10 10 6380 0.
Star Forest Graph 5 5 4280 0.
Viewer 12 11 9152 0.
========================================================================================================================
0 KSP unpreconditioned resid norm 3.738834778994e+08 true resid norm 3.738834778994e+08 ||r(i)||/||b|| 1.000000000000e+00
1 KSP unpreconditioned resid norm 1.279113569974e+08 true resid norm 1.279113569974e+08 ||r(i)||/||b|| 3.421155642289e-01
2 KSP unpreconditioned resid norm 1.874944207644e+07 true resid norm 1.874944207644e+07 ||r(i)||/||b|| 5.014782194118e-02
3 KSP unpreconditioned resid norm 6.305464086727e+06 true resid norm 6.305464086727e+06 ||r(i)||/||b|| 1.686478397536e-02
4 KSP unpreconditioned resid norm 2.648974672476e+06 true resid norm 2.648974672476e+06 ||r(i)||/||b|| 7.085027365634e-03
5 KSP unpreconditioned resid norm 1.239886218685e+06 true resid norm 1.239886218685e+06 ||r(i)||/||b|| 3.316236988195e-03
6 KSP unpreconditioned resid norm 5.641563718944e+05 true resid norm 5.641563718944e+05 ||r(i)||/||b|| 1.508909607517e-03
7 KSP unpreconditioned resid norm 2.606746938444e+05 true resid norm 2.606746938444e+05 ||r(i)||/||b|| 6.972083797577e-04
8 KSP unpreconditioned resid norm 1.184535518381e+05 true resid norm 1.184535518381e+05 ||r(i)||/||b|| 3.168194339682e-04
9 KSP unpreconditioned resid norm 5.392667623794e+04 true resid norm 5.392667623794e+04 ||r(i)||/||b|| 1.442339108990e-04
10 KSP unpreconditioned resid norm 2.520203694105e+04 true resid norm 2.520203694106e+04 ||r(i)||/||b|| 6.740612632217e-05
11 KSP unpreconditioned resid norm 1.185967319435e+04 true resid norm 1.185967319434e+04 ||r(i)||/||b|| 3.172023877859e-05
12 KSP unpreconditioned resid norm 5.627359926956e+03 true resid norm 5.627359926969e+03 ||r(i)||/||b|| 1.505110618577e-05
13 KSP unpreconditioned resid norm 2.702021069922e+03 true resid norm 2.702021069923e+03 ||r(i)||/||b|| 7.226906856392e-06
14 KSP unpreconditioned resid norm 1.307500233445e+03 true resid norm 1.307500233448e+03 ||r(i)||/||b|| 3.497079466561e-06
15 KSP unpreconditioned resid norm 6.250158790292e+02 true resid norm 6.250158790312e+02 ||r(i)||/||b|| 1.671686276545e-06
16 KSP unpreconditioned resid norm 3.038680168367e+02 true resid norm 3.038680168345e+02 ||r(i)||/||b|| 8.127345410977e-07
17 KSP unpreconditioned resid norm 1.504350436399e+02 true resid norm 1.504350436436e+02 ||r(i)||/||b|| 4.023580942618e-07
18 KSP unpreconditioned resid norm 7.388944694136e+01 true resid norm 7.388944694645e+01 ||r(i)||/||b|| 1.976269381081e-07
19 KSP unpreconditioned resid norm 3.596911660459e+01 true resid norm 3.596911660288e+01 ||r(i)||/||b|| 9.620408156298e-08
20 KSP unpreconditioned resid norm 1.769248937152e+01 true resid norm 1.769248936529e+01 ||r(i)||/||b|| 4.732086441662e-08
21 KSP unpreconditioned resid norm 8.746482066795e+00 true resid norm 8.746482056876e+00 ||r(i)||/||b|| 2.339360408761e-08
22 KSP unpreconditioned resid norm 4.283455600167e+00 true resid norm 4.283455596172e+00 ||r(i)||/||b|| 1.145665922506e-08
23 KSP unpreconditioned resid norm 2.096047551274e+00 true resid norm 2.096047547699e+00 ||r(i)||/||b|| 5.606151840341e-09
Linear solve converged due to CONVERGED_RTOL iterations 23
KSP Object: 1500 MPI processes
type: fgmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
right preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1500 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=6 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1500 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1500 MPI processes
type: bjacobi
number of blocks = 1500
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
package used to perform factorization: petsc
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 3 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=12, cols=12, bs=6
total: nonzeros=144, allocated nonzeros=144
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 3 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1500 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0999807, max = 1.09979
eigenvalues estimate via gmres min 0.310311, max 0.999807
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 1500 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1500 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=312, cols=312, bs=6
total: nonzeros=90792, allocated nonzeros=90792
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 87 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1500 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.128747, max = 1.41622
eigenvalues estimate via gmres min 0.191833, max 1.28747
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 1500 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1500 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=9990, cols=9990, bs=6
total: nonzeros=11862180, allocated nonzeros=11862180
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 6 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 1500 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.149515, max = 1.64466
eigenvalues estimate via gmres min 0.342896, max 1.49515
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 1500 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1500 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=333960, cols=333960, bs=6
total: nonzeros=289654416, allocated nonzeros=289654416
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 39 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 1500 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.173537, max = 1.90891
eigenvalues estimate via gmres min 0.143849, max 1.73537
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 1500 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 1500 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=5149116, cols=5149116, bs=6
total: nonzeros=1368332496, allocated nonzeros=1368332496
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 976 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 1500 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.241719, max = 2.65891
eigenvalues estimate via gmres min 0.0638427, max 2.41719
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_5_esteig_) 1500 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 1500 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 26223 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1500 MPI processes
type: mpiaij
rows=117874305, cols=117874305, bs=3
total: nonzeros=9333251991, allocated nonzeros=9333251991
total number of mallocs used during MatSetValues calls =0
has attached near null space
using I-node (on process 0) routines: found 26223 nodes, limit used is 5
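For readers wanting to reproduce a similar setup, the hierarchy shown in the `-ksp_view` output above corresponds roughly to the runtime options below. This is a sketch read off this particular log, not a tuned recommendation: the executable name `./my_app` is hypothetical, and the smoother/eigenvalue-estimator settings shown (Chebyshev with 2 iterations, SOR, GMRES-based estimation with 10 iterations) are PETSc defaults for GAMG that the view output happens to confirm.

```shell
# Hedged sketch of options matching the solver view above
# (FGMRES outer solve, GAMG preconditioner, Chebyshev/SOR smoothers).
# "./my_app" is a placeholder for the actual application binary.
mpiexec -n 1500 ./my_app \
  -ksp_type fgmres \
  -pc_type gamg \
  -mg_levels_ksp_type chebyshev \
  -mg_levels_ksp_max_it 2 \
  -mg_levels_pc_type sor \
  -mg_levels_esteig_ksp_type gmres \
  -mg_levels_esteig_ksp_max_it 10 \
  -ksp_view
```

Note that the Chebyshev eigenvalue bounds reported per level (e.g. `min = 0.0999807, max = 1.09979` on level 1) are derived from the GMRES estimates via the translation interval `[0. 0.1; 0. 1.1]`, i.e. the smoother targets the upper part of the estimated spectrum.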