[petsc-users] PETSc (3.9.0) GAMG weak scaling test issue
"Alberto F. Martín"
amartin at cimne.upc.edu
Wed Nov 7 10:02:15 CST 2018
Dear All,
we are performing a weak scaling test of the PETSc (v3.9.0) GAMG
preconditioner applied to the linear system arising from the conforming
unfitted FE discretization (using Q1 Lagrangian FEs) of a 3D Poisson
problem, where the boundary of the domain (a popcorn flake) is described
as a zero level set embedded within a uniform background (Cartesian-like)
hexahedral mesh. Details of the underlying FEM formulation can be made
available on demand if you believe this might be helpful; let me just
point out that it is designed to address the well-known ill-conditioning
issues of unfitted FE discretizations caused by the small cut cell problem.
The weak scaling test is set up as follows. We start from a single-cube
background mesh and refine it uniformly in several steps, until we have
approximately 10**3 (load1), 20**3 (load2), or 40**3 (load3) hexahedra
per MPI task when distributing the mesh over 4 MPI tasks. The benchmark
is then scaled such that each larger problem is obtained by uniformly
refining the mesh of the previous scale and running it on 8x the number
of MPI tasks used at the previous scale. As a result, we obtain three
weak scaling curves, one for each of the fixed loads per MPI task above,
on the following total numbers of MPI tasks: 4, 32, 262, 2097, 16777.
The underlying mesh is not partitioned among MPI tasks using ParMETIS
(unstructured multilevel graph partitioning), nor optimally by hand, but
following the so-called Z-shaped space-filling curve provided by an
underlying octree-like mesh handler (i.e., the p4est library).
I configured the preconditioned linear solver as follows:
-ksp_type cg
-ksp_monitor
-ksp_rtol 1.0e-6
-ksp_converged_reason
-ksp_max_it 500
-ksp_norm_type unpreconditioned
-ksp_view
-log_view
-pc_type gamg
-pc_gamg_type agg
-mg_levels_esteig_ksp_type cg
-mg_coarse_sub_pc_type cholesky
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_agg_nsmooths 1
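
For completeness, below is a minimal C sketch (not our actual driver, which
is Fortran-based and reads these options from a petscrc file) of how an
equivalent CG + GAMG configuration could be set up through the PETSc API;
the function name and the assumption of an already assembled matrix A are
purely illustrative:

  #include <petscksp.h>

  /* Illustrative sketch only: configure a CG + smoothed-aggregation GAMG
     solver equivalent to the command-line options listed above. "A" is
     assumed to be an already assembled parallel matrix. */
  PetscErrorCode SetupSolver(Mat A, KSP *ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(*ksp, KSPCG);CHKERRQ(ierr);             /* -ksp_type cg */
    ierr = KSPSetTolerances(*ksp, 1.0e-6, PETSC_DEFAULT,
                            PETSC_DEFAULT, 500);CHKERRQ(ierr);/* -ksp_rtol, -ksp_max_it */
    ierr = KSPSetNormType(*ksp, KSP_NORM_UNPRECONDITIONED);CHKERRQ(ierr);

    ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);               /* -pc_type gamg */
    ierr = PCGAMGSetType(pc, PCGAMGAGG);CHKERRQ(ierr);        /* -pc_gamg_type agg */
    ierr = PCGAMGSetNSmooths(pc, 1);CHKERRQ(ierr);            /* -pc_gamg_agg_nsmooths 1 */
    ierr = PCGAMGSetSquareGraph(pc, 0);CHKERRQ(ierr);         /* -pc_gamg_square_graph 0 */
    ierr = PCGAMGSetProcEqLim(pc, 50);CHKERRQ(ierr);          /* -pc_gamg_process_eq_limit 50 */

    /* Options addressing sub-solvers by prefix are easiest to keep in the
       options database (in our runs everything is passed this way anyway). */
    ierr = PetscOptionsSetValue(NULL, "-mg_levels_esteig_ksp_type", "cg");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-mg_coarse_sub_pc_type", "cholesky");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-mg_coarse_sub_pc_factor_mat_ordering_type", "nd");CHKERRQ(ierr);

    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr); /* also picks up -ksp_view, -ksp_monitor, etc. */
    return 0;
  }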
Raw timings (in seconds) of the preconditioner setup and PCG iterative
solution stages, together with the number of iterations, are as follows:
**preconditioner setup**
(load1): [0.02542160451, 0.05169247743, 0.09266782179, 0.2426272957,
13.64161944]
(load2): [0.1239175797 , 0.1885528499 , 0.2719282564 , 0.4783878336,
13.37947339]
(load3): [0.6565349903 , 0.9435049873 , 1.299908397 , 1.916243652
, 16.02904088]
**PCG stage**
(load1): [0.003287350759, 0.008163803257, 0.03565631993, 0.08343045413,
0.6937994603]
(load2): [0.0205939794 , 0.03594723623 , 0.07593298424,
0.1212046621 , 0.6780373845]
(load3): [0.1310882876 , 0.3214917686 , 0.5532023879 ,
0.766881627 , 1.485446003]
**number of PCG iterations**
(load1): [5, 8, 11, 13, 13]
(load2): [7, 10, 12, 13, 13]
(load3): [8, 10, 12, 13, 13]
It can be observed that both the number of linear solver iterations and
the PCG stage timings scale remarkably well (in the weak sense), but there
is a significant time increase in the preconditioner setup stage when
scaling the problem from 2097 to 16777 MPI tasks (e.g., 1.916243652 vs.
16.02904088 sec. with 40**3 cells per MPI task).
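To put that last point in numbers, the following throw-away snippet (not
part of the benchmark driver; the array values are just the load3 setup
timings listed above) computes the weak-scaling efficiency relative to the
4-task run:

  #include <stdio.h>

  /* Weak-scaling efficiency of the load3 preconditioner setup, with the
     4-task run as baseline; ideal weak scaling keeps the time constant (100%). */
  int main(void)
  {
    const int    ntasks[5] = {4, 32, 262, 2097, 16777};
    const double setup[5]  = {0.6565349903, 0.9435049873, 1.299908397,
                              1.916243652, 16.02904088};
    for (int i = 0; i < 5; ++i)
      printf("%6d tasks: setup = %9.4f s, efficiency = %5.1f %%\n",
             ntasks[i], setup[i], 100.0*setup[0]/setup[i]);
    return 0;
  }

which gives approximately 100.0, 69.6, 50.5, 34.3 and 4.1 per cent, i.e.,
the last scaling step alone accounts for almost an order-of-magnitude loss.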
I gathered the combined output of -ksp_view and -log_view (only) for all
the points of the load3 weak scaling test (find them attached to this
message). Please note that within each run I execute these two stages up
to three times, and this influences the absolute timings given in -log_view.
Looking at the output of -log_view, it is very strange to me, e.g., that
the stage labelled "Graph" does not scale properly, as it is just a call
to MatDuplicate when the block size of the matrix is 1 (our case), and I
guess it is a purely local operation that does not require any
communication. What am I missing here? The load does not seem to be
unbalanced, judging from the "Ratio" column.
I wonder whether the observed behaviour is to be expected, or whether it
is caused by a misconfiguration of the solver on our side. I played (quite
a lot) with several parameter-value combinations, and the configuration
above is the one that led to the fastest execution among those tested
(which may well be an incomplete set; I can provide further details if
helpful). Any feedback based on your experience that helps us find the
cause(s) of this issue and a way to mitigate it would be highly appreciated.
Thanks very much in advance!
Best regards,
Alberto.
--
Alberto F. Martín-Huertas
Senior Researcher, PhD. Computational Science
Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
Parc Mediterrani de la Tecnologia, UPC
Esteve Terradas 5, Building C3, Office 215,
08860 Castelldefels (Barcelona, Spain)
Tel.: (+34) 9341 34223
e-mail: amartin at cimne.upc.edu
FEMPAR project co-founder
web: http://www.fempar.org
-------------- next part --------------
KSP Object: 4 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 4 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 4 MPI processes
type: bjacobi
number of blocks = 4
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=6, cols=6
package used to perform factorization: petsc
total: nonzeros=21, allocated nonzeros=21
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=6, cols=6
total: nonzeros=36, allocated nonzeros=36
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 2 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.129951, max = 1.42946
eigenvalues estimate via cg min 0.51315, max 1.29951
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=201, cols=201
total: nonzeros=24313, allocated nonzeros=24313
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.132036, max = 1.4524
eigenvalues estimate via cg min 0.0839922, max 1.32036
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=4621, cols=4621
total: nonzeros=369149, allocated nonzeros=369149
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 4 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.167146, max = 1.83861
eigenvalues estimate via cg min 0.0634859, max 1.67146
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 4 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 4 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=63511, cols=63511
total: nonzeros=2301395, allocated nonzeros=38106600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 4 MPI processes
type: mpiaij
rows=63511, cols=63511
total: nonzeros=2301395, allocated nonzeros=38106600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s14r2b46 with 4 processors, by upc26229 Wed Nov 7 01:07:35 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 1.076e+02 1.00000 1.076e+02
Objects: 9.890e+02 1.00304 9.868e+02
Flop: 6.620e+08 1.09228 6.334e+08 2.533e+09
Flop/sec: 6.150e+06 1.09228 5.884e+06 2.353e+07
MPI Messages: 3.141e+03 1.04997 3.054e+03 1.222e+04
MPI Message Lengths: 1.331e+07 1.02147 4.319e+03 5.277e+07
MPI Reductions: 1.427e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0765e+02 100.0% 2.5334e+09 100.0% 1.222e+04 100.0% 4.319e+03 100.0% 1.414e+03 99.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 9 1.0 1.0124e-03 2.9 0.00e+00 0.0 5.4e+01 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 75 1.0 1.8647e-0111.1 0.00e+00 0.0 3.2e+02 5.0e+04 0.0e+00 0 0 3 30 0 0 0 3 30 0 0
VecMDot 90 1.0 4.3525e-03 1.9 5.94e+06 1.1 0.0e+00 0.0e+00 9.0e+01 0 1 0 0 6 0 1 0 0 6 5180
VecTDot 237 1.0 1.6390e-02 3.5 3.88e+06 1.1 0.0e+00 0.0e+00 2.4e+02 0 1 0 0 17 0 1 0 0 17 897
VecNorm 225 1.0 7.4959e-03 2.6 3.28e+06 1.1 0.0e+00 0.0e+00 2.2e+02 0 0 0 0 16 0 0 0 0 16 1661
VecScale 99 1.0 2.6664e-04 1.2 5.94e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8457
VecCopy 105 1.0 5.5974e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 402 1.0 5.1273e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 237 1.0 1.1864e-03 1.1 3.88e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 12396
VecAYPX 678 1.0 3.3991e-03 1.2 6.00e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 6695
VecAXPBYCZ 288 1.0 2.0727e-03 1.1 8.64e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15825
VecMAXPY 99 1.0 1.9828e-03 1.3 7.02e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 13441
VecAssemblyBegin 24 1.0 3.5199e-03 1.1 0.00e+00 0.0 6.0e+01 3.6e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 24 1.0 1.2138e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 99 1.0 6.6968e-04 1.1 5.94e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3367
VecScatterBegin 810 1.0 5.6504e-03 1.0 0.00e+00 0.0 9.3e+03 2.7e+03 0.0e+00 0 0 76 47 0 0 0 76 47 0 0
VecScatterEnd 810 1.0 1.4432e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 9 1.0 1.5864e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 99 1.0 1.6803e-03 1.7 1.78e+06 1.1 0.0e+00 0.0e+00 9.9e+01 0 0 0 0 7 0 0 0 0 7 4026
MatMult 636 1.0 1.9035e-01 1.0 3.12e+08 1.1 7.6e+03 3.0e+03 0.0e+00 0 47 62 44 0 0 47 62 44 0 6275
MatMultAdd 72 1.0 1.0663e-02 1.1 6.23e+06 1.1 6.5e+02 5.0e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 2248
MatMultTranspose 72 1.0 1.3565e-02 1.3 6.23e+06 1.1 6.5e+02 5.0e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 1767
MatSolve 24 0.0 4.1796e-05 0.0 1.58e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 38
MatSOR 531 1.0 2.2511e-01 1.1 2.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 35 0 0 0 0 35 0 0 0 3950
MatCholFctrSym 3 1.0 4.2330e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 4.0101e-05 1.8 1.80e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 9 1.0 1.6046e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 27 1.0 5.5906e-03 1.1 5.00e+06 1.1 1.1e+02 2.9e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 3428
MatResidual 72 1.0 2.0435e-02 1.1 3.38e+07 1.1 8.6e+02 2.9e+03 0.0e+00 0 5 7 5 0 0 5 7 5 0 6330
MatAssemblyBegin 168 1.0 2.3268e-01 1.4 0.00e+00 0.0 2.6e+02 6.1e+04 0.0e+00 0 0 2 30 0 0 0 2 30 0 0
MatAssemblyEnd 168 1.0 1.3291e-01 1.1 0.00e+00 0.0 7.8e+02 8.4e+02 3.6e+02 0 0 6 1 25 0 0 6 1 25 0
MatGetRow 162018 1.1 1.9420e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 4.2650e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 1.6496e-03 1.0 0.00e+00 0.0 4.8e+01 5.1e+01 9.6e+01 0 0 0 0 7 0 0 0 0 7 0
MatGetOrdering 3 0.0 2.7078e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 9 1.0 7.8692e-03 1.1 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
MatZeroEntries 9 1.0 1.0505e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 21 1.4 3.8399e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 9 1.0 1.6539e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 9 1.0 7.2559e-02 1.0 4.22e+06 1.1 6.3e+02 2.1e+03 1.1e+02 0 1 5 2 8 0 1 5 2 8 223
MatMatMultSym 9 1.0 5.8217e-02 1.0 0.00e+00 0.0 5.2e+02 1.9e+03 1.1e+02 0 0 4 2 8 0 0 4 2 8 0
MatMatMultNum 9 1.0 1.4338e-02 1.0 4.22e+06 1.1 1.1e+02 2.9e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 1128
MatPtAP 9 1.0 3.5600e-01 1.0 5.53e+07 1.1 9.5e+02 1.7e+04 1.4e+02 0 9 8 30 9 0 9 8 30 10 605
MatPtAPSymbolic 9 1.0 2.5158e-01 1.0 0.00e+00 0.0 6.2e+02 1.4e+04 6.3e+01 0 0 5 16 4 0 0 5 16 4 0
MatPtAPNumeric 9 1.0 1.0436e-01 1.0 5.53e+07 1.1 3.3e+02 2.2e+04 7.2e+01 0 9 3 14 5 0 9 3 14 5 2064
MatGetLocalMat 27 1.0 1.0135e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 27 1.0 7.4055e-03 1.2 0.00e+00 0.0 7.6e+02 9.9e+03 0.0e+00 0 0 6 14 0 0 0 6 14 0 0
KSPGMRESOrthog 90 1.0 5.7355e-03 1.4 1.19e+07 1.1 0.0e+00 0.0e+00 9.0e+01 0 2 0 0 6 0 2 0 0 6 7863
KSPSetUp 36 1.0 3.4705e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 3 1.0 4.3135e-01 1.0 5.40e+08 1.1 7.9e+03 2.6e+03 3.6e+02 0 81 64 39 25 0 81 64 39 26 4786
PCGAMGGraph_AGG 9 1.0 2.2263e-01 1.0 4.22e+06 1.1 3.2e+02 1.9e+03 1.1e+02 0 1 3 1 8 0 1 3 1 8 73
PCGAMGCoarse_AGG 9 1.0 1.1966e-02 1.0 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
PCGAMGProl_AGG 9 1.0 4.8133e-02 1.0 0.00e+00 0.0 3.8e+02 1.7e+03 1.4e+02 0 0 3 1 10 0 0 3 1 10 0
PCGAMGPOpt_AGG 9 1.0 1.4798e-01 1.0 6.22e+07 1.1 1.7e+03 2.6e+03 3.7e+02 0 9 14 8 26 0 9 14 8 26 1605
GAMG: createProl 9 1.0 4.3246e-01 1.0 6.64e+07 1.1 3.1e+03 2.3e+03 6.5e+02 0 10 26 14 45 0 10 26 14 46 586
Graph 18 1.0 2.2127e-01 1.0 4.22e+06 1.1 3.2e+02 1.9e+03 1.1e+02 0 1 3 1 8 0 1 3 1 8 73
MIS/Agg 9 1.0 7.9907e-03 1.1 0.00e+00 0.0 7.0e+02 2.2e+03 2.7e+01 0 0 6 3 2 0 0 6 3 2 0
SA: col data 9 1.0 5.8624e-03 1.1 0.00e+00 0.0 2.2e+02 2.7e+03 3.6e+01 0 0 2 1 3 0 0 2 1 3 0
SA: frmProl0 9 1.0 4.0594e-02 1.0 0.00e+00 0.0 1.7e+02 4.9e+02 7.2e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 9 1.0 9.3119e-02 1.0 5.00e+06 1.1 6.3e+02 2.1e+03 1.3e+02 0 1 5 2 9 0 1 5 2 9 206
GAMG: partLevel 9 1.0 3.5860e-01 1.0 5.53e+07 1.1 1.0e+03 1.6e+04 2.9e+02 0 9 8 30 20 0 9 8 30 20 601
repartition 3 1.0 1.7322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 1.5528e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 3 1.0 1.1300e-03 1.0 0.00e+00 0.0 3.0e+01 6.6e+01 5.1e+01 0 0 0 0 4 0 0 0 0 4 0
Move P 3 1.0 7.7750e-04 1.0 0.00e+00 0.0 1.8e+01 2.6e+01 5.1e+01 0 0 0 0 4 0 0 0 0 4 0
PCSetUp 6 1.0 7.9483e-01 1.0 1.22e+08 1.1 4.1e+03 5.7e+03 9.9e+02 1 19 34 44 69 1 19 34 44 70 590
PCSetUpOnBlocks 24 1.0 5.2736e-04 1.1 1.80e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 24 1.0 4.0983e-01 1.0 5.07e+08 1.1 7.6e+03 2.5e+03 2.9e+02 0 76 62 36 20 0 76 62 36 20 4727
SFSetGraph 9 1.0 2.1600e-06 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 9 1.0 1.6055e-03 1.7 0.00e+00 0.0 1.6e+02 1.8e+03 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 45 1.0 5.1013e-04 1.1 0.00e+00 0.0 5.4e+02 2.4e+03 0.0e+00 0 0 4 2 0 0 0 4 2 0 0
SFBcastEnd 45 1.0 7.3911e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 429 429 16885896 0.
Matrix 252 252 628801440 0.
Matrix Coarsen 9 9 6156 0.
Index Set 147 147 299856 0.
Vec Scatter 57 57 79848 0.
Krylov Solver 36 36 314928 0.
Preconditioner 27 27 29544 0.
Viewer 5 4 3584 0.
PetscRandom 18 18 12492 0.
Star Forest Graph 9 9 8496 0.
========================================================================================================================
Average time to get PetscTime(): 4.1601e-08
Average time for MPI_Barrier(): 1.76301e-06
Average time for zero size MPI_Send(): 1.59626e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n4_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 6
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 9
KSP Object: 32 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 32 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 32 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 32 MPI processes
type: bjacobi
number of blocks = 32
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=35, cols=35
package used to perform factorization: petsc
total: nonzeros=630, allocated nonzeros=630
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=35, cols=35
total: nonzeros=1225, allocated nonzeros=1225
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 7 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=35, cols=35
total: nonzeros=1225, allocated nonzeros=1225
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 7 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.140301, max = 1.54331
eigenvalues estimate via cg min 0.150843, max 1.40301
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=1654, cols=1654
total: nonzeros=302008, allocated nonzeros=302008
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
using I-node (on process 0) routines: found 15 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.135428, max = 1.48971
eigenvalues estimate via cg min 0.0330649, max 1.35428
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=38899, cols=38899
total: nonzeros=3088735, allocated nonzeros=3088735
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 32 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.196606, max = 2.16267
eigenvalues estimate via cg min 0.0475838, max 1.96606
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 32 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 32 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=508459, cols=508459
total: nonzeros=16204885, allocated nonzeros=305075400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 32 MPI processes
type: mpiaij
rows=508459, cols=508459
total: nonzeros=16204885, allocated nonzeros=305075400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s15r1b25 with 32 processors, by upc26229 Wed Nov 7 01:09:23 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 1.621e+02 1.00000 1.621e+02
Objects: 9.890e+02 1.00304 9.861e+02
Flop: 7.802e+08 2.25680 6.170e+08 1.974e+10
Flop/sec: 4.812e+06 2.25680 3.806e+06 1.218e+08
MPI Messages: 2.457e+04 2.04836 1.917e+04 6.134e+05
MPI Message Lengths: 3.844e+07 2.16684 1.506e+03 9.236e+08
MPI Reductions: 1.469e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.6212e+02 100.0% 1.9744e+10 100.0% 6.134e+05 100.0% 1.506e+03 100.0% 1.456e+03 99.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 9 1.0 7.2859e-03 6.2 0.00e+00 0.0 2.5e+03 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 75 1.0 7.6260e+0063.2 0.00e+00 0.0 1.2e+04 2.1e+04 0.0e+00 2 0 2 27 0 2 0 2 27 0 0
VecMDot 90 1.0 3.3601e-02 7.0 7.73e+06 3.0 0.0e+00 0.0e+00 9.0e+01 0 1 0 0 6 0 1 0 0 6 5391
VecTDot 243 1.0 1.1469e-0117.2 5.28e+06 3.0 0.0e+00 0.0e+00 2.4e+02 0 1 0 0 17 0 1 0 0 17 1082
VecNorm 228 1.0 6.2426e-02 8.2 4.39e+06 3.0 0.0e+00 0.0e+00 2.3e+02 0 1 0 0 16 0 1 0 0 16 1650
VecScale 99 1.0 1.0887e-03 6.2 7.73e+05 3.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 16641
VecCopy 114 1.0 1.3520e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 438 1.0 1.0085e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 243 1.0 6.7886e-03 3.4 5.28e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 18279
VecAYPX 753 1.0 9.3539e-03 3.1 8.63e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 21626
VecAXPBYCZ 324 1.0 5.0166e-03 2.0 1.26e+07 3.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 59097
VecMAXPY 99 1.0 4.4716e-03 4.9 9.14e+06 3.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 47884
VecAssemblyBegin 24 1.0 1.1001e-02 2.2 0.00e+00 0.0 1.7e+03 2.0e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 24 1.0 3.4029e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 99 1.0 1.1331e-03 2.3 7.73e+05 3.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15989
VecScatterBegin 885 1.0 3.6779e-02 2.2 0.00e+00 0.0 4.6e+05 9.8e+02 0.0e+00 0 0 75 49 0 0 0 75 49 0 0
VecScatterEnd 885 1.0 3.2675e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 9 1.0 2.0615e-03 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 99 1.0 8.0360e-03 3.0 2.32e+06 3.0 0.0e+00 0.0e+00 9.9e+01 0 0 0 0 7 0 0 0 0 7 6764
MatMult 693 1.0 4.7642e-01 1.3 3.73e+08 2.2 3.9e+05 1.1e+03 0.0e+00 0 48 64 46 0 0 48 64 46 0 19814
MatMultAdd 81 1.0 2.5011e-02 1.8 9.13e+06 2.9 2.7e+04 2.4e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 8668
MatMultTranspose 81 1.0 4.0727e-02 2.3 9.13e+06 2.9 2.7e+04 2.4e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 5323
MatSolve 27 0.0 2.1556e-04 0.0 6.52e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 302
MatSOR 585 1.0 4.9140e-01 1.9 2.65e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 13575
MatCholFctrSym 3 1.0 6.5050e-03313.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 3.2232e-03996.2 1.05e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 9 1.0 2.4005e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 27 1.0 1.1253e-02 2.1 5.57e+06 2.3 5.1e+03 1.0e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 12589
MatResidual 81 1.0 7.8485e-02 2.2 4.17e+07 2.2 4.6e+04 1.0e+03 0.0e+00 0 5 7 5 0 0 5 7 5 0 13482
MatAssemblyBegin 168 1.0 7.7038e+0036.5 0.00e+00 0.0 9.9e+03 2.5e+04 0.0e+00 2 0 2 26 0 2 0 2 26 0 0
MatAssemblyEnd 168 1.0 2.8208e-01 1.5 0.00e+00 0.0 3.7e+04 3.1e+02 3.6e+02 0 0 6 1 25 0 0 6 1 25 0
MatGetRow 210816 3.0 2.5672e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 2.7154e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 1.1316e-02 1.0 0.00e+00 0.0 5.9e+02 1.4e+02 9.6e+01 0 0 0 0 7 0 0 0 0 7 0
MatGetOrdering 3 0.0 7.1425e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 9 1.0 2.3828e-02 1.2 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
MatZeroEntries 9 1.0 1.7683e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 21 1.4 5.7901e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 9 1.0 2.3933e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 9 1.0 1.0349e-01 1.0 4.64e+06 2.2 3.0e+04 7.4e+02 1.1e+02 0 1 5 2 7 0 1 5 2 7 1136
MatMatMultSym 9 1.0 8.4956e-02 1.0 0.00e+00 0.0 2.4e+04 6.8e+02 1.1e+02 0 0 4 2 7 0 0 4 2 7 0
MatMatMultNum 9 1.0 1.9249e-02 1.1 4.64e+06 2.2 5.1e+03 1.0e+03 0.0e+00 0 1 1 1 0 0 1 1 1 0 6108
MatPtAP 9 1.0 5.3590e-01 1.0 6.57e+07 2.3 4.7e+04 5.9e+03 1.4e+02 0 8 8 30 9 0 8 8 30 9 3097
MatPtAPSymbolic 9 1.0 3.5159e-01 1.0 0.00e+00 0.0 2.9e+04 5.2e+03 6.3e+01 0 0 5 16 4 0 0 5 16 4 0
MatPtAPNumeric 9 1.0 1.8422e-01 1.0 6.57e+07 2.3 1.8e+04 6.9e+03 7.2e+01 0 8 3 14 5 0 8 3 14 5 9008
MatGetLocalMat 27 1.0 1.3379e-02 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 27 1.0 2.5772e-02 1.3 0.00e+00 0.0 3.6e+04 3.6e+03 0.0e+00 0 0 6 14 0 0 0 6 14 0 0
KSPGMRESOrthog 90 1.0 3.4506e-02 4.2 1.55e+07 3.0 0.0e+00 0.0e+00 9.0e+01 0 2 0 0 6 0 2 0 0 6 10501
KSPSetUp 36 1.0 7.0966e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 3 1.0 9.5439e-01 1.0 6.41e+08 2.2 3.9e+05 9.8e+02 3.7e+02 1 82 64 42 25 1 82 64 42 26 16969
PCGAMGGraph_AGG 9 1.0 2.9486e-01 1.0 4.64e+06 2.2 1.5e+04 6.9e+02 1.1e+02 0 1 2 1 7 0 1 2 1 7 399
PCGAMGCoarse_AGG 9 1.0 2.6490e-02 1.1 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
PCGAMGProl_AGG 9 1.0 8.0748e-02 1.0 0.00e+00 0.0 1.5e+04 8.1e+02 1.4e+02 0 0 2 1 10 0 0 2 1 10 0
PCGAMGPOpt_AGG 9 1.0 2.4086e-01 1.0 6.97e+07 2.3 8.1e+04 9.3e+02 3.7e+02 0 9 13 8 25 0 9 13 8 25 7357
GAMG: createProl 9 1.0 6.4623e-01 1.0 7.44e+07 2.3 1.7e+05 8.4e+02 6.8e+02 0 10 27 15 46 0 10 27 15 47 2924
Graph 18 1.0 2.8964e-01 1.0 4.64e+06 2.2 1.5e+04 6.9e+02 1.1e+02 0 1 2 1 7 0 1 2 1 7 406
MIS/Agg 9 1.0 2.3962e-02 1.2 0.00e+00 0.0 5.4e+04 7.5e+02 6.0e+01 0 0 9 4 4 0 0 9 4 4 0
SA: col data 9 1.0 9.0052e-03 1.1 0.00e+00 0.0 1.0e+04 1.0e+03 3.6e+01 0 0 2 1 2 0 0 2 1 2 0
SA: frmProl0 9 1.0 6.9267e-02 1.0 0.00e+00 0.0 4.7e+03 3.3e+02 7.2e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 9 1.0 1.3371e-01 1.0 5.57e+06 2.3 3.0e+04 7.4e+02 1.3e+02 0 1 5 2 9 0 1 5 2 9 1059
GAMG: partLevel 9 1.0 5.5452e-01 1.0 6.57e+07 2.3 4.8e+04 5.8e+03 2.9e+02 0 8 8 30 20 0 8 8 30 20 2993
repartition 3 1.0 1.7596e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
Invert-Sort 3 1.0 6.5844e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 3 1.0 6.4088e-03 1.1 0.00e+00 0.0 4.0e+02 1.6e+02 5.1e+01 0 0 0 0 3 0 0 0 0 4 0
Move P 3 1.0 5.6291e-03 1.1 0.00e+00 0.0 1.9e+02 9.1e+01 5.1e+01 0 0 0 0 3 0 0 0 0 4 0
PCSetUp 6 1.0 1.2264e+00 1.0 1.40e+08 2.3 2.1e+05 2.0e+03 1.0e+03 1 18 35 45 70 1 18 35 45 70 2894
PCSetUpOnBlocks 27 1.0 7.7373e-03 1.6 1.05e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 27 1.0 9.0805e-01 1.1 6.00e+08 2.2 3.8e+05 9.3e+02 2.9e+02 1 77 62 38 20 1 77 62 38 20 16705
SFSetGraph 9 1.0 2.6450e-06 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 9 1.0 8.5383e-03 2.5 0.00e+00 0.0 7.6e+03 6.9e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 78 1.0 4.1889e-03 2.7 0.00e+00 0.0 4.7e+04 7.6e+02 0.0e+00 0 0 8 4 0 0 0 8 4 0 0
SFBcastEnd 78 1.0 5.7714e-03 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 429 429 10070544 0.
Matrix 252 252 362421120 0.
Matrix Coarsen 9 9 6156 0.
Index Set 147 147 365304 0.
Vec Scatter 57 57 79800 0.
Krylov Solver 36 36 314928 0.
Preconditioner 27 27 29544 0.
Viewer 5 4 3584 0.
PetscRandom 18 18 12492 0.
Star Forest Graph 9 9 8496 0.
========================================================================================================================
Average time to get PetscTime(): 4.1537e-08
Average time for MPI_Barrier(): 3.98215e-06
Average time for zero size MPI_Send(): 1.49463e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n5_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 7
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:09:23 CET 2018
Ending script at Wed Nov 7 01:09:23 CET 2018
-------------- next part --------------
KSP Object: 262 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 262 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using externally compute Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 262 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 262 MPI processes
type: bjacobi
number of blocks = 262
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=4, cols=4
package used to perform factorization: petsc
total: nonzeros=10, allocated nonzeros=10
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 1 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.129006, max = 1.41907
eigenvalues estimate via cg min 0.482341, max 1.29006
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=284, cols=284
total: nonzeros=47942, allocated nonzeros=47942
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.160435, max = 1.76479
eigenvalues estimate via cg min 0.0880722, max 1.60435
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=13842, cols=13842
total: nonzeros=2801068, allocated nonzeros=2801068
total number of mallocs used during MatSetValues calls =0
using nonscalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.135811, max = 1.49392
eigenvalues estimate via cg min 0.036202, max 1.35811
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=319856, cols=319856
total: nonzeros=25116236, allocated nonzeros=25116236
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 262 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.298538, max = 3.28392
eigenvalues estimate via cg min 0.0506704, max 2.98538
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 262 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 262 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4068981, cols=4068981
total: nonzeros=120055495, allocated nonzeros=2441388600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 262 MPI processes
type: mpiaij
rows=4068981, cols=4068981
total: nonzeros=120055495, allocated nonzeros=2441388600
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s07r1b55 with 262 processors, by upc26229 Wed Nov 7 01:14:20 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 2.359e+02 1.00000 2.359e+02
Objects: 1.355e+03 1.00222 1.352e+03
Flop: 8.832e+08 0.00000 6.569e+08 1.721e+11
Flop/sec: 3.743e+06 0.00000 2.784e+06 7.295e+08
MPI Messages: 5.577e+04 5069.63636 3.334e+04 8.736e+06
MPI Message Lengths: 4.973e+07 1130198.90909 1.019e+03 8.904e+09
MPI Reductions: 2.072e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3593e+02 100.0% 1.7210e+11 100.0% 8.736e+06 100.0% 1.019e+03 100.0% 2.059e+03 99.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 12 1.0 1.6995e-02 7.1 0.00e+00 0.0 3.0e+04 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 102 1.0 1.0505e+0133.9 0.00e+00 0.0 1.3e+05 1.6e+04 0.0e+00 3 0 1 24 0 3 0 1 24 0 0
VecMDot 120 1.0 1.0321e-01 9.5 7.43e+06 0.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 6 0 1 0 0 6 14077
VecTDot 318 1.0 1.3620e+0061.0 5.57e+06 0.0 0.0e+00 0.0e+00 3.2e+02 0 1 0 0 15 0 1 0 0 15 802
VecNorm 300 1.0 2.2448e-0110.2 4.46e+06 0.0 0.0e+00 0.0e+00 3.0e+02 0 1 0 0 14 0 1 0 0 15 3894
VecScale 132 1.0 4.2758e-0329.9 7.43e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 33981
VecCopy 174 1.0 1.9950e-0355.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 651 1.0 1.4187e-0314.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 318 1.0 2.0724e-02148.6 5.57e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 52686
VecAYPX 1194 1.0 1.5941e-02106.7 9.89e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 121379
VecAXPBYCZ 528 1.0 9.4921e-03156.6 1.49e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 306144
VecMAXPY 132 1.0 7.9654e-0356.3 8.78e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 215576
VecAssemblyBegin 33 1.0 1.9882e-02 3.0 0.00e+00 0.0 1.7e+04 1.7e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 33 1.0 4.4571e-03149.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 132 1.0 1.8402e-0362.2 7.43e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78959
VecScatterBegin 1371 1.0 6.3832e-02215.6 0.00e+00 0.0 6.4e+06 7.3e+02 0.0e+00 0 0 73 52 0 0 0 73 52 0 0
VecScatterEnd 1371 1.0 1.2430e+006808.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 12 1.0 2.0955e-03958.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 2.2506e-02 4.3 2.23e+06 0.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 6 0 0 0 0 6 19368
MatMult 1065 1.0 1.1689e+001737.8 4.26e+08 0.0 5.4e+06 8.1e+02 0.0e+00 0 48 62 49 0 0 48 62 49 0 71034
MatMultAdd 132 1.0 8.6626e-021090.7 1.08e+07 0.0 3.7e+05 1.9e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 24436
MatMultTranspose 132 1.0 5.6369e-013353.5 1.08e+07 0.0 3.7e+05 1.9e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 3755
MatSolve 33 0.0 5.7641e-05 0.0 9.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 16
MatSOR 924 1.0 7.8933e-015774.2 3.08e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 74391
MatCholFctrSym 3 1.0 1.0960e-02545.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 6.4946e-032309.9 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 12 1.0 3.4631e-0230.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 36 1.0 2.6028e-02184.6 5.48e+06 0.0 6.1e+04 7.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 41516
MatResidual 132 1.0 1.8136e-011415.7 5.00e+07 0.0 6.7e+05 7.7e+02 0.0e+00 0 6 8 6 0 0 6 8 6 0 53866
MatAssemblyBegin 231 1.0 1.0487e+0118.5 0.00e+00 0.0 1.1e+05 1.9e+04 0.0e+00 3 0 1 23 0 3 0 1 23 0 0
MatAssemblyEnd 231 1.0 4.6673e-01 1.6 0.00e+00 0.0 5.3e+05 2.0e+02 5.0e+02 0 0 6 1 24 0 0 6 1 24 0
MatGetRow 202635 0.0 2.5265e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 1.2548e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 12 1.0 5.2781e-02 1.0 0.00e+00 0.0 5.4e+03 4.8e+02 1.9e+02 0 0 0 0 9 0 0 0 0 9 0
MatGetOrdering 3 0.0 1.7776e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 12 1.0 4.3461e-02 1.3 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
MatZeroEntries 12 1.0 1.7174e-03325.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 24 1.3 6.3856e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 12 1.0 2.7936e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 12 1.0 3.3930e-01 1.1 4.54e+06 0.0 3.5e+05 5.5e+02 1.5e+02 0 1 4 2 7 0 1 4 2 7 2618
MatMatMultSym 12 1.0 2.5782e-01 1.0 0.00e+00 0.0 2.9e+05 5.1e+02 1.4e+02 0 0 3 2 7 0 0 3 2 7 0
MatMatMultNum 12 1.0 5.1971e-02 1.1 4.54e+06 0.0 6.1e+04 7.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 17089
MatPtAP 12 1.0 1.0058e+00 1.0 6.57e+07 0.0 6.6e+05 3.9e+03 1.8e+02 0 7 8 29 9 0 7 8 29 9 12715
MatPtAPSymbolic 12 1.0 5.2416e-01 1.0 0.00e+00 0.0 3.5e+05 4.0e+03 8.4e+01 0 0 4 16 4 0 0 4 16 4 0
MatPtAPNumeric 12 1.0 4.7255e-01 1.0 6.57e+07 0.0 3.2e+05 3.8e+03 9.6e+01 0 7 4 13 5 0 7 4 13 5 27062
MatGetLocalMat 36 1.0 1.3884e-0237.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 36 1.0 5.1478e-0293.5 0.00e+00 0.0 4.3e+05 2.7e+03 0.0e+00 0 0 5 13 0 0 0 5 13 0 0
KSPGMRESOrthog 120 1.0 1.0809e-01 7.9 1.49e+07 0.0 0.0e+00 0.0e+00 1.2e+02 0 2 0 0 6 0 2 0 0 6 26883
KSPSetUp 45 1.0 1.8355e-02 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 1.5817e+00 1.0 7.46e+08 0.0 5.6e+06 7.3e+02 4.9e+02 1 84 64 45 23 1 84 64 45 24 91555
PCGAMGGraph_AGG 12 1.0 3.4556e-01 1.0 4.54e+06 0.0 1.8e+05 5.2e+02 1.4e+02 0 1 2 1 7 0 1 2 1 7 2570
PCGAMGCoarse_AGG 12 1.0 4.4724e-02 1.2 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
PCGAMGProl_AGG 12 1.0 3.5267e-01 1.0 0.00e+00 0.0 1.7e+05 6.4e+02 1.9e+02 0 0 2 1 9 0 0 2 1 9 0
PCGAMGPOpt_AGG 12 1.0 5.1901e-01 1.0 6.89e+07 0.0 9.6e+05 6.9e+02 5.0e+02 0 8 11 7 24 0 8 11 7 24 26218
GAMG: createProl 12 1.0 1.2634e+00 1.0 7.34e+07 0.0 2.4e+06 5.4e+02 9.5e+02 1 8 28 15 46 1 8 28 15 46 11473
Graph 24 1.0 3.3748e-01 1.0 4.54e+06 0.0 1.8e+05 5.2e+02 1.4e+02 0 1 2 1 7 0 1 2 1 7 2632
MIS/Agg 12 1.0 4.3569e-02 1.3 0.00e+00 0.0 1.1e+06 4.1e+02 1.2e+02 0 0 13 5 6 0 0 13 5 6 0
SA: col data 12 1.0 1.2331e-02 1.1 0.00e+00 0.0 1.2e+05 7.7e+02 4.8e+01 0 0 1 1 2 0 0 1 1 2 0
SA: frmProl0 12 1.0 3.3684e-01 1.0 0.00e+00 0.0 4.7e+04 2.9e+02 9.6e+01 0 0 1 0 5 0 0 1 0 5 0
SA: smooth 12 1.0 3.7167e-01 1.1 5.48e+06 0.0 3.5e+05 5.5e+02 1.7e+02 0 1 4 2 8 0 1 4 2 8 2907
GAMG: partLevel 12 1.0 1.0910e+00 1.0 6.57e+07 0.0 6.7e+05 3.9e+03 4.9e+02 0 7 8 29 24 0 7 8 29 24 11721
repartition 6 1.0 8.1465e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 6 1.0 3.8278e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 6 1.0 4.3095e-02 1.0 0.00e+00 0.0 2.9e+03 8.1e+02 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
Move P 6 1.0 1.6401e-02 1.1 0.00e+00 0.0 2.5e+03 9.0e+01 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
PCSetUp 6 1.0 2.4101e+00 1.0 1.38e+08 0.0 3.1e+06 1.3e+03 1.5e+03 1 16 36 44 73 1 16 36 44 73 11321
PCSetUpOnBlocks 33 1.0 1.9296e-02 5.9 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 33 1.0 1.4915e+00 5.2 6.98e+08 0.0 5.4e+06 6.9e+02 3.8e+02 1 79 62 42 19 1 79 62 42 19 90795
SFSetGraph 12 1.0 3.8510e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 1.7290e-02 3.4 0.00e+00 0.0 9.1e+04 5.2e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 144 1.0 8.3527e-0354.1 0.00e+00 0.0 1.0e+06 4.0e+02 0.0e+00 0 0 12 5 0 0 0 12 5 0 0
SFBcastEnd 144 1.0 1.6454e-02500.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 573 573 1139304 0.
Matrix 348 348 6730200 0.
Matrix Coarsen 12 12 8208 0.
Index Set 222 222 228240 0.
Vec Scatter 81 81 113448 0.
Krylov Solver 45 45 415944 0.
Preconditioner 33 33 35448 0.
Viewer 5 4 3584 0.
PetscRandom 24 24 16656 0.
Star Forest Graph 12 12 11328 0.
========================================================================================================================
Average time to get PetscTime(): 4.40516e-08
Average time for MPI_Barrier(): 1.38046e-05
Average time for zero size MPI_Send(): 1.46144e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n6_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 8
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:14:21 CET 2018
Ending script at Wed Nov 7 01:14:21 CET 2018
-------------- next part --------------
KSP Object: 2097 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 2097 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=5 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 2097 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 2097 MPI processes
type: bjacobi
number of blocks = 2097
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=36, cols=36
package used to perform factorization: petsc
total: nonzeros=666, allocated nonzeros=666
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=36, cols=36
total: nonzeros=1296, allocated nonzeros=1296
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 8 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=36, cols=36
total: nonzeros=1296, allocated nonzeros=1296
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 8 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.167617, max = 1.84379
eigenvalues estimate via cg min 0.106378, max 1.67617
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
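Reading the translation [a b; c d] above as used_min = a*cg_min + b*cg_max and used_max = c*cg_min + d*cg_max (my reading of the notation, consistent with the printed numbers), one recovers the bounds actually used:
    0*0.106378 + 0.1*1.67617 = 0.167617
    0*0.106378 + 1.1*1.67617 = 1.84379 (to the printed precision)
i.e. the Chebyshev bounds on this level are derived solely from the largest CG eigenvalue estimate.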
KSP Object: (mg_levels_1_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=2304, cols=2304
total: nonzeros=598220, allocated nonzeros=598220
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.14326, max = 1.57586
eigenvalues estimate via cg min 0.0397147, max 1.4326
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=112580, cols=112580
total: nonzeros=23420598, allocated nonzeros=23420598
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.136096, max = 1.49705
eigenvalues estimate via cg min 0.0338371, max 1.36096
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=2597459, cols=2597459
total: nonzeros=202116477, allocated nonzeros=202116477
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 2097 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.335857, max = 3.69443
eigenvalues estimate via cg min 0.0542715, max 3.35857
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 2097 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 2097 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=32552439, cols=32552439
total: nonzeros=920267663, allocated nonzeros=19531463400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 2097 MPI processes
type: mpiaij
rows=32552439, cols=32552439
total: nonzeros=920267663, allocated nonzeros=19531463400
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s07r2b02 with 2097 processors, by upc26229 Wed Nov 7 01:15:12 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 2.458e+02 1.00000 2.458e+02
Objects: 1.355e+03 1.00222 1.352e+03
Flop: 9.818e+08 0.00000 6.789e+08 1.424e+12
Flop/sec: 3.994e+06 0.00000 2.762e+06 5.791e+09
MPI Messages: 1.103e+05 10027.81818 4.452e+04 9.336e+07
MPI Message Lengths: 5.692e+07 1293601.40909 8.203e+02 7.658e+10
MPI Reductions: 2.216e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.4583e+02 100.0% 1.4237e+12 100.0% 9.336e+07 100.0% 8.203e+02 100.0% 2.203e+03 99.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 12 1.0 4.6126e-02 2.1 0.00e+00 0.0 2.9e+05 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 102 1.0 1.1057e+0119.1 0.00e+00 0.0 1.1e+06 1.5e+04 0.0e+00 3 0 1 22 0 3 0 1 22 0 0
VecMDot 120 1.0 1.3730e-01 4.0 7.37e+06 0.0 0.0e+00 0.0e+00 1.2e+02 0 1 0 0 5 0 1 0 0 5 84752
VecTDot 324 1.0 1.8522e+0019.8 5.77e+06 0.0 0.0e+00 0.0e+00 3.2e+02 0 1 0 0 15 0 1 0 0 15 4929
VecNorm 303 1.0 3.3777e-01 3.6 4.55e+06 0.0 0.0e+00 0.0e+00 3.0e+02 0 1 0 0 14 0 1 0 0 14 21298
VecScale 132 1.0 6.1834e-0351.4 7.37e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 188203
VecCopy 186 1.0 2.5507e-03101.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 696 1.0 4.9682e-0364.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 324 1.0 1.8679e-02143.7 5.77e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 488834
VecAYPX 1293 1.0 1.8702e-02166.9 1.06e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 895518
VecAXPBYCZ 576 1.0 1.2857e-02233.9 1.61e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1974899
VecMAXPY 132 1.0 8.0905e-03119.3 8.71e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1699935
VecAssemblyBegin 33 1.0 2.7789e-02 2.2 0.00e+00 0.0 1.3e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 33 1.0 5.7797e-03196.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 132 1.0 3.0804e-03165.9 7.37e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 377790
VecScatterBegin 1470 1.0 9.9138e-02609.8 0.00e+00 0.0 6.5e+07 6.3e+02 0.0e+00 0 0 69 53 0 0 0 69 53 0 0
VecScatterEnd 1470 1.0 1.6594e+0012314.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 12 1.0 2.2327e-031311.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 132 1.0 5.1942e-02 1.7 2.21e+06 0.0 0.0e+00 0.0e+00 1.3e+02 0 0 0 0 6 0 0 0 0 6 67213
MatMult 1140 1.0 1.5379e+002703.7 4.83e+08 0.0 5.4e+07 7.0e+02 0.0e+00 0 48 58 50 0 0 48 58 50 0 447478
MatMultAdd 144 1.0 3.0789e-014466.1 1.18e+07 0.0 4.2e+06 1.5e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 59866
MatMultTranspose 144 1.0 7.5095e-015720.6 1.18e+07 0.0 4.2e+06 1.5e+02 0.0e+00 0 1 5 1 0 0 1 5 1 0 24545
MatSolve 36 0.0 1.6145e-04 0.0 9.20e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 570
MatSOR 996 1.0 9.8081e-018037.0 3.40e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 497713
MatCholFctrSym 3 1.0 1.0764e-02556.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 7.9658e-032816.4 1.08e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 12 1.0 3.8779e-0251.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 36 1.0 3.0256e-02564.4 5.74e+06 0.0 5.7e+05 6.7e+02 0.0e+00 0 1 1 1 0 0 1 1 1 0 278106
MatResidual 144 1.0 2.4749e-012126.4 5.77e+07 0.0 6.9e+06 6.7e+02 0.0e+00 0 6 7 6 0 0 6 7 6 0 333518
MatAssemblyBegin 231 1.0 1.1035e+0115.1 0.00e+00 0.0 9.7e+05 1.7e+04 0.0e+00 3 0 1 22 0 3 0 1 22 0 0
MatAssemblyEnd 231 1.0 5.6362e-01 1.5 0.00e+00 0.0 5.5e+06 1.6e+02 5.0e+02 0 0 6 1 23 0 0 6 1 23 0
MatGetRow 201060 0.0 2.9343e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 4.4128e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 12 1.0 1.0294e-01 1.1 0.00e+00 0.0 2.6e+05 1.3e+02 1.9e+02 0 0 0 0 9 0 0 0 0 9 0
MatGetOrdering 3 0.0 1.7719e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 12 1.0 1.3539e-01 1.2 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
MatZeroEntries 12 1.0 1.8496e-03427.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 24 1.3 6.3806e-02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 12 1.0 3.7202e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 12 1.0 7.9738e-01 1.5 4.80e+06 0.0 3.3e+06 4.9e+02 1.5e+02 0 0 3 2 7 0 0 3 2 7 8626
MatMatMultSym 12 1.0 4.7848e-01 1.0 0.00e+00 0.0 2.7e+06 4.5e+02 1.4e+02 0 0 3 2 6 0 0 3 2 7 0
MatMatMultNum 12 1.0 7.1849e-02 1.1 4.80e+06 0.0 5.7e+05 6.7e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 95734
MatPtAP 12 1.0 1.3269e+00 1.0 6.99e+07 0.0 6.5e+06 3.3e+03 1.9e+02 1 7 7 28 8 1 7 7 28 8 75292
MatPtAPSymbolic 12 1.0 7.0335e-01 1.0 0.00e+00 0.0 3.2e+06 3.6e+03 8.4e+01 0 0 3 15 4 0 0 3 15 4 0
MatPtAPNumeric 12 1.0 6.0478e-01 1.0 6.99e+07 0.0 3.3e+06 3.0e+03 9.6e+01 0 7 4 13 4 0 7 4 13 4 165193
MatGetLocalMat 36 1.0 1.9234e-0255.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 36 1.0 1.0496e-01201.1 0.00e+00 0.0 4.0e+06 2.4e+03 0.0e+00 0 0 4 13 0 0 0 4 13 0 0
KSPGMRESOrthog 120 1.0 1.4184e-01 3.7 1.47e+07 0.0 0.0e+00 0.0e+00 1.2e+02 0 2 0 0 5 0 2 0 0 5 164085
KSPSetUp 45 1.0 7.5013e-02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 2.1762e+00 1.0 8.36e+08 0.0 5.7e+07 6.3e+02 5.0e+02 1 85 61 47 22 1 85 61 47 22 556264
PCGAMGGraph_AGG 12 1.0 4.0398e-01 1.0 4.80e+06 0.0 1.7e+06 4.5e+02 1.4e+02 0 0 2 1 6 0 0 2 1 7 17026
PCGAMGCoarse_AGG 12 1.0 1.3612e-01 1.1 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
PCGAMGProl_AGG 12 1.0 4.2615e-01 1.0 0.00e+00 0.0 1.5e+06 5.7e+02 1.9e+02 0 0 2 1 9 0 0 2 1 9 0
PCGAMGPOpt_AGG 12 1.0 1.0610e+00 1.0 7.12e+07 0.0 9.0e+06 6.1e+02 5.0e+02 0 7 10 7 22 0 7 10 7 23 100280
GAMG: createProl 12 1.0 2.0248e+00 1.0 7.60e+07 0.0 2.9e+07 4.2e+02 1.1e+03 1 8 31 16 49 1 8 31 16 49 55945
Graph 24 1.0 3.9569e-01 1.0 4.80e+06 0.0 1.7e+06 4.5e+02 1.4e+02 0 0 2 1 6 0 0 2 1 7 17383
MIS/Agg 12 1.0 1.3554e-01 1.2 0.00e+00 0.0 1.7e+07 3.0e+02 2.5e+02 0 0 18 7 11 0 0 18 7 11 0
SA: col data 12 1.0 2.5961e-02 1.1 0.00e+00 0.0 1.1e+06 6.7e+02 4.8e+01 0 0 1 1 2 0 0 1 1 2 0
SA: frmProl0 12 1.0 3.9272e-01 1.0 0.00e+00 0.0 3.9e+05 2.7e+02 9.6e+01 0 0 0 0 4 0 0 0 0 4 0
SA: smooth 12 1.0 8.4174e-01 1.4 5.74e+06 0.0 3.3e+06 4.9e+02 1.7e+02 0 1 3 2 8 0 1 3 2 8 9996
GAMG: partLevel 12 1.0 1.5243e+00 1.0 6.99e+07 0.0 6.8e+06 3.2e+03 4.9e+02 1 7 7 28 22 1 7 7 28 22 65544
repartition 6 1.0 4.4372e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 6 1.0 2.8916e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0
Move A 6 1.0 6.3890e-02 1.1 0.00e+00 0.0 1.2e+05 2.7e+02 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
Move P 6 1.0 4.9374e-02 1.2 0.00e+00 0.0 1.5e+05 1.6e+01 1.0e+02 0 0 0 0 5 0 0 0 0 5 0
PCSetUp 6 1.0 3.6222e+00 1.0 1.46e+08 0.0 3.6e+07 9.5e+02 1.6e+03 1 15 38 44 74 1 15 38 44 75 58854
PCSetUpOnBlocks 36 1.0 2.1539e-02117.9 1.08e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 36 1.0 1.9656e+00 5.3 7.80e+08 0.0 5.5e+07 5.9e+02 3.8e+02 1 79 59 43 17 1 79 59 43 17 575586
SFSetGraph 12 1.0 5.8748e-06 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 5.2335e-02 2.0 0.00e+00 0.0 8.6e+05 4.5e+02 0.0e+00 0 0 1 1 0 0 0 1 1 0 0
SFBcastBegin 273 1.0 2.4409e-02186.6 0.00e+00 0.0 1.6e+07 2.9e+02 0.0e+00 0 0 17 6 0 0 0 17 6 0 0
SFBcastEnd 273 1.0 4.6278e-02920.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 573 573 1019832 0.
Matrix 348 348 1351536 0.
Matrix Coarsen 12 12 8208 0.
Index Set 222 222 418080 0.
Vec Scatter 81 81 113400 0.
Krylov Solver 45 45 415944 0.
Preconditioner 33 33 35448 0.
Viewer 5 4 3584 0.
PetscRandom 24 24 16656 0.
Star Forest Graph 12 12 11328 0.
========================================================================================================================
Average time to get PetscTime(): 4.2282e-08
Average time for MPI_Barrier(): 1.84676e-05
Average time for zero size MPI_Send(): 1.59141e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n7_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 9
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:15:13 CET 2018
Ending script at Wed Nov 7 01:15:13 CET 2018
-------------- next part --------------
Linear solve converged due to CONVERGED_RTOL iterations 12
KSP Object: 16777 MPI processes
type: cg
maximum iterations=500, initial guess is zero
tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 16777 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=6 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 0
Number smoothing steps 1
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 16777 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 16777 MPI processes
type: bjacobi
number of blocks = 16777
Local solve is same for all blocks, in the following KSP and PC objects:
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: cholesky
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: nd
factor fill ratio given 5., needed 1.
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqsbaij
rows=4, cols=4
package used to perform factorization: petsc
total: nonzeros=10, allocated nonzeros=10
total number of mallocs used during MatSetValues calls =0
block size is 1
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 1 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=4, cols=4
total: nonzeros=16, allocated nonzeros=16
total number of mallocs used during MatSetValues calls =0
using I-node (on process 0) routines: found 1 nodes, limit used is 5
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0999843, max = 1.09983
eigenvalues estimate via cg min 0.575611, max 0.999843
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=269, cols=269
total: nonzeros=46217, allocated nonzeros=46217
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.18053, max = 1.98584
eigenvalues estimate via cg min 0.0637775, max 1.8053
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=18451, cols=18451
total: nonzeros=5470355, allocated nonzeros=5470355
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.156694, max = 1.72364
eigenvalues estimate via cg min 0.0434381, max 1.56694
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=908791, cols=908791
total: nonzeros=191134331, allocated nonzeros=191134331
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 4 -------------------------------
KSP Object: (mg_levels_4_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.13616, max = 1.49776
eigenvalues estimate via cg min 0.0335059, max 1.3616
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_4_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_4_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=20910556, cols=20910556
total: nonzeros=1618051660, allocated nonzeros=1618051660
total number of mallocs used during MatSetValues calls =0
using scalable MatPtAP() implementation
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 5 -------------------------------
KSP Object: (mg_levels_5_) 16777 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.336389, max = 3.70028
eigenvalues estimate via cg min 0.0534122, max 3.36389
eigenvalues estimated using cg with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_5_esteig_) 16777 MPI processes
type: cg
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_5_) 16777 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=260421387, cols=260421387
total: nonzeros=7197955643, allocated nonzeros=156252832200
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 16777 MPI processes
type: mpiaij
rows=260421387, cols=260421387
total: nonzeros=7197955643, allocated nonzeros=156252832200
total number of mallocs used during MatSetValues calls =0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/scratch/upc26/upc26229/build_rel_fempar_cell_agg_ompi/FEMPAR/bin/par_test_h_adaptive_poisson_unfitted on a arch-linux2-c-opt named s02r2b25 with 16777 processors, by upc26229 Wed Nov 7 01:31:35 2018
Using Petsc Release Version 3.9.0, Apr, 07, 2018
Max Max/Min Avg Total
Time (sec): 3.137e+02 1.00000 3.137e+02
Objects: 1.745e+03 1.00172 1.742e+03
Flop: 9.833e+08 0.00000 6.678e+08 1.120e+13
Flop/sec: 3.134e+06 0.00000 2.129e+06 3.571e+10
MPI Messages: 2.180e+05 19813.90909 5.157e+04 8.652e+08
MPI Message Lengths: 6.226e+07 1414906.81818 7.366e+02 6.374e+11
MPI Reductions: 3.011e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.1371e+02 100.0% 1.1204e+13 100.0% 8.652e+08 100.0% 7.366e+02 100.0% 2.998e+03 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 15 1.0 1.2635e-01 2.4 0.00e+00 0.0 2.3e+06 8.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
BuildTwoSidedF 123 1.0 1.4200e+01 7.9 0.00e+00 0.0 9.0e+06 1.5e+04 0.0e+00 4 0 1 21 0 4 0 1 21 0 0
VecMDot 150 1.0 4.6170e-01 1.7 7.48e+06 0.0 0.0e+00 0.0e+00 1.5e+02 0 1 0 0 5 0 1 0 0 5 201728
VecTDot 384 1.0 3.1098e+00 4.0 5.80e+06 0.0 0.0e+00 0.0e+00 3.8e+02 0 1 0 0 13 0 1 0 0 13 23494
VecNorm 369 1.0 1.1910e+00 1.5 4.59e+06 0.0 0.0e+00 0.0e+00 3.7e+02 0 1 0 0 12 0 1 0 0 12 48340
VecScale 165 1.0 1.2617e-02212.0 7.49e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 738275
VecCopy 231 1.0 1.0534e-02327.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 864 1.0 1.9701e-02189.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 387 1.0 4.0720e-02309.4 5.80e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1794326
VecAYPX 1608 1.0 2.3452e-02168.1 1.07e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 5715622
VecAXPBYCZ 720 1.0 1.8540e-02276.4 1.63e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 10961725
VecMAXPY 165 1.0 1.5714e-02398.8 8.85e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 7005508
VecAssemblyBegin 42 1.0 4.3601e-01 1.5 0.00e+00 0.0 1.0e+06 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 42 1.0 5.7755e-03134.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 165 1.0 2.8011e-03138.9 7.49e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3325353
VecScatterBegin 1842 1.0 1.0872e-01510.7 0.00e+00 0.0 5.1e+08 6.4e+02 0.0e+00 0 0 59 51 0 0 0 59 51 0 0
VecScatterEnd 1842 1.0 2.2453e+0013155.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 15 1.0 3.1115e-031343.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 165 1.0 5.2679e-01 1.3 2.25e+06 0.0 0.0e+00 0.0e+00 1.6e+02 0 0 0 0 5 0 0 0 0 6 53045
MatMult 1416 1.0 1.9601e+002744.6 4.82e+08 0.0 4.3e+08 7.2e+02 0.0e+00 0 48 50 48 0 0 48 50 48 0 2757975
MatMultAdd 180 1.0 7.8791e-019183.2 1.33e+07 0.0 3.3e+07 1.6e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 186808
MatMultTranspose 180 1.0 1.0899e+006594.6 1.33e+07 0.0 3.3e+07 1.6e+02 0.0e+00 0 1 4 1 0 0 1 4 1 0 135049
MatSolve 36 0.0 4.2692e-05 0.0 1.01e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 24
MatSOR 1245 1.0 1.0514e+007119.8 3.46e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 34 0 0 0 0 34 0 0 0 3644239
MatCholFctrSym 3 1.0 1.4223e-02755.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCholFctrNum 3 1.0 8.4996e-033106.3 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 15 1.0 4.5698e-0228.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 45 1.0 1.0334e-011406.4 5.72e+06 0.0 4.5e+06 6.9e+02 0.0e+00 0 1 1 0 0 0 1 1 0 0 641987
MatResidual 180 1.0 3.4803e-012420.0 5.74e+07 0.0 5.4e+07 6.9e+02 0.0e+00 0 6 6 6 0 0 6 6 6 0 1864527
MatAssemblyBegin 306 1.0 1.3789e+01 5.9 0.00e+00 0.0 8.0e+06 1.7e+04 0.0e+00 4 0 1 21 0 4 0 1 21 0 0
MatAssemblyEnd 306 1.0 1.3870e+01 1.0 0.00e+00 0.0 4.6e+07 1.5e+02 6.5e+02 4 0 5 1 22 4 0 5 1 22 0
MatGetRow 204147 0.0 3.1561e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 0.0 1.3432e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 18 1.0 7.7521e+00 1.0 0.00e+00 0.0 1.4e+06 2.2e+02 2.8e+02 2 0 0 0 9 2 0 0 0 9 0
MatGetOrdering 3 0.0 1.4079e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 15 1.0 1.0908e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
MatZeroEntries 15 1.0 2.3229e-03397.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 27 1.3 3.1142e-01 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 1 0 0 0 0 1 0
MatAXPY 15 1.0 2.4644e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMatMult 15 1.0 6.6292e+00 1.0 4.79e+06 0.0 2.6e+07 4.9e+02 1.9e+02 2 0 3 2 6 2 0 3 2 6 8157
MatMatMultSym 15 1.0 6.2354e+00 1.0 0.00e+00 0.0 2.1e+07 4.5e+02 1.8e+02 2 0 2 2 6 2 0 2 2 6 0
MatMatMultNum 15 1.0 1.1974e-01 1.3 4.79e+06 0.0 4.5e+06 6.9e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 451599
MatPtAP 15 1.0 7.7822e+00 1.0 7.71e+07 0.0 5.5e+07 3.2e+03 2.3e+02 2 7 6 27 8 2 7 6 27 8 101415
MatPtAPSymbolic 15 1.0 4.5700e+00 1.0 0.00e+00 0.0 2.5e+07 3.7e+03 1.0e+02 1 0 3 15 3 1 0 3 15 4 0
MatPtAPNumeric 15 1.0 3.3127e+00 1.0 7.71e+07 0.0 2.9e+07 2.8e+03 1.2e+02 1 7 3 13 4 1 7 3 13 4 238246
MatGetLocalMat 45 1.0 1.9456e-0244.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 45 1.0 1.0016e-01150.0 0.00e+00 0.0 3.2e+07 2.5e+03 0.0e+00 0 0 4 12 0 0 0 4 12 0 0
KSPGMRESOrthog 150 1.0 4.6669e-01 1.7 1.50e+07 0.0 0.0e+00 0.0e+00 1.5e+02 0 2 0 0 5 0 2 0 0 5 399162
KSPSetUp 54 1.0 1.4842e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 3 1.0 3.9999e+00 1.1 8.39e+08 0.0 4.5e+08 6.4e+02 5.9e+02 1 85 52 45 20 1 85 52 45 20 2380078
PCGAMGGraph_AGG 15 1.0 6.2019e+00 1.0 4.79e+06 0.0 1.4e+07 4.6e+02 1.8e+02 2 0 2 1 6 2 0 2 1 6 8719
PCGAMGCoarse_AGG 15 1.0 1.0930e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
PCGAMGProl_AGG 15 1.0 7.1075e+00 1.0 0.00e+00 0.0 1.2e+07 5.8e+02 2.4e+02 2 0 1 1 8 2 0 1 1 8 0
PCGAMGPOpt_AGG 15 1.0 1.1478e+01 1.0 7.11e+07 0.0 7.1e+07 6.2e+02 6.2e+02 4 8 8 7 21 4 8 8 7 21 73252
GAMG: createProl 15 1.0 2.5825e+01 1.0 7.59e+07 0.0 3.5e+08 3.4e+02 1.6e+03 8 8 41 19 52 8 8 41 19 53 34652
Graph 30 1.0 6.1862e+00 1.0 4.79e+06 0.0 1.4e+07 4.6e+02 1.8e+02 2 0 2 1 6 2 0 2 1 6 8741
MIS/Agg 15 1.0 1.0910e+00 1.1 0.00e+00 0.0 2.6e+08 2.4e+02 5.3e+02 0 0 30 10 18 0 0 30 10 18 0
SA: col data 15 1.0 2.3531e+00 1.0 0.00e+00 0.0 9.0e+06 6.9e+02 6.0e+01 1 0 1 1 2 1 0 1 1 2 0
SA: frmProl0 15 1.0 2.8294e+00 1.0 0.00e+00 0.0 3.2e+06 2.7e+02 1.2e+02 1 0 0 0 4 1 0 0 0 4 0
SA: smooth 15 1.0 7.7540e+00 1.0 5.72e+06 0.0 2.6e+07 4.9e+02 2.2e+02 2 1 3 2 7 2 1 3 2 7 8556
GAMG: partLevel 15 1.0 1.9884e+01 1.0 7.71e+07 0.0 5.6e+07 3.1e+03 6.8e+02 6 7 6 28 23 6 7 6 28 23 39691
repartition 9 1.0 1.2793e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.4e+01 0 0 0 0 2 0 0 0 0 2 0
Invert-Sort 9 1.0 1.6471e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01 1 0 0 0 1 1 0 0 0 1 0
Move A 9 1.0 3.9110e+00 1.0 0.00e+00 0.0 5.1e+05 5.6e+02 1.5e+02 1 0 0 0 5 1 0 0 0 5 0
Move P 9 1.0 3.9752e+00 1.0 0.00e+00 0.0 8.8e+05 2.1e+01 1.5e+02 1 0 0 0 5 1 0 0 0 5 0
PCSetUp 6 1.0 4.7888e+01 1.0 1.51e+08 0.0 4.1e+08 7.2e+02 2.3e+03 15 15 47 46 78 15 15 47 46 78 35168
PCSetUpOnBlocks 36 1.0 2.2114e-0248.6 1.20e+01 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 36 1.0 3.2792e+00 2.8 7.84e+08 0.0 4.4e+08 6.1e+02 4.8e+02 1 79 50 41 16 1 79 50 41 16 2713702
SFSetGraph 15 1.0 1.1425e-05 9.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 15 1.0 1.4317e-01 1.9 0.00e+00 0.0 6.8e+06 4.6e+02 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFBcastBegin 564 1.0 3.8685e-02183.9 0.00e+00 0.0 2.5e+08 2.4e+02 0.0e+00 0 0 29 9 0 0 0 29 9 0 0
SFBcastEnd 564 1.0 3.0283e-013088.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 735 735 1523448 0.
Matrix 444 444 11007960 0.
Matrix Coarsen 15 15 10260 0.
Index Set 303 303 2069856 0.
Vec Scatter 105 105 147192 0.
Krylov Solver 54 54 516960 0.
Preconditioner 39 39 41352 0.
Viewer 5 4 3584 0.
PetscRandom 30 30 20820 0.
Star Forest Graph 15 15 14160 0.
========================================================================================================================
Average time to get PetscTime(): 4.25614e-08
Average time for MPI_Barrier(): 0.00116431
Average time for zero size MPI_Send(): 1.79813e-06
#PETSc Option Table entries:
--prefix run_a0b0c0d0e0f0g0h0i0_n8_l3
-aggrmeth alla_serial
-beta 10.0
-betaest .true.
-check .false.
-datadt data_distribution_fully_assembled
-dm 3
-dom -1.0
-in_space .true.
-ksp_converged_reason
-ksp_max_it 500
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.0e-6
-ksp_type cg
-ksp_view
-l 1
-levelset popcorn
-levelsettol 1.0e-6
-log_view
-lsdom 0.0
-maxl 10
-mg_coarse_sub_pc_factor_mat_ordering_type nd
-mg_coarse_sub_pc_type cholesky
-mg_levels_esteig_ksp_type cg
-no_signal_handler
-nruns 3
-order 1
-pc_gamg_agg_nsmooths 1
-pc_gamg_process_eq_limit 50
-pc_gamg_square_graph 0
-pc_gamg_type agg
-pc_type gamg
-petscrc /gpfs/scratch/upc26/upc26229/par_cell_aggr_poisson/paper/weak_scal_ompi/2nd-w-scal/petscrc-0
-tt 1
-uagg .true.
-wratio 10
-wsolution .false.
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/apps/INTEL/2017.4/mkl --with-debugging=0 --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
-----------------------------------------
Libraries compiled on 2018-06-04 18:55:32 on login1
Machine characteristics: Linux-4.4.103-92.56-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/include -I/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -L/gpfs/scratch/upc26/upc26229/petsc_cell_agg_openmpi/release/petsc-3.9.0/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/apps/INTEL/2017.4/mkl/lib/intel64 -L/apps/INTEL/2017.4/mkl/lib/intel64 -Wl,-rpath,/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -L/usr/mpi/intel/openmpi-1.10.4-hfi/lib64 -Wl,-rpath,/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -L/gpfs/apps/MN4/INTEL/2018.0.128/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
Ending run at Wed Nov 7 01:31:37 CET 2018
Ending script at Wed Nov 7 01:31:37 CET 2018