[petsc-dev] Seeming performance regression with GAMG
Barry Smith
bsmith at mcs.anl.gov
Mon Apr 27 13:36:40 CDT 2015
Lawrence,
The git bisect command is a useful way to see exactly what commit caused a regression such as this. In this case I think Toby pretty much found it for you.
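For reference, the workflow is roughly (using your two commits as the
endpoints):
  $ git bisect start
  $ git bisect bad e4b003c      # the slow commit
  $ git bisect good 30ab49e4    # the fast commit
  (rebuild and rerun the test at each step, then mark it with git
  bisect good or git bisect bad until git reports the first bad commit)
  $ git bisect reset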
Barry
> On Apr 27, 2015, at 12:58 PM, Tobin Isaac <tisaac at ices.utexas.edu> wrote:
>
> On Mon, Apr 27, 2015 at 04:06:30PM +0100, Lawrence Mitchell wrote:
>> Dear all,
>>
>> we recently noticed a slowdown when using GAMG that I'm trying to
>> track down in a little more detail. I'm solving an Hdiv-L2
>> "Helmholtz" pressure correction using a Schur complement. I
>> precondition the Schur complement with 'selfp', which morally looks
>> like a normal Helmholtz operator (except in the DG space). The domain
>> is very anisotropic (a thin atmospheric shell), so trying Toby's
>> column-based coarsening plugin is on the horizon, but I haven't got
>> round to it yet.
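>>
>> Concretely, the sort of options I mean look roughly like the
>> following (just a sketch: the fieldsplit prefixes depend on how the
>> fields are named in my setup):
>>
>>   -pc_type fieldsplit -pc_fieldsplit_type schur
>>   -pc_fieldsplit_schur_precondition selfp
>>   -fieldsplit_1_ksp_type cg -fieldsplit_1_pc_type gamg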
>>
>> I don't have a good feel for exactly when things got worse, but here
>> are two data points:
>>
>> A recentish master (e4b003c) and master from 26th Feb (30ab49e4). I
>> notice in the former that MatPtAP takes significantly longer (full
>> logs below); perhaps the coarsening is different? As a point of
>> comparison, PCSetUp for Hypre takes ballpark half a second on the
>> same operator.
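>>
>> (For reference, the Hypre comparison is a run along the lines of:
>>
>>   ./ex6-e4b003c -f helmholtz-sphere.dat -ksp_type cg -ksp_max_it 2 \
>>     -ksp_monitor -pc_type hypre -log_summary
>>
>> i.e. BoomerAMG with default options.)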
>>
>> I test with KSP ex6 (with a constant RHS); the exact command lines
>> are shown with the logs below.
>>
>> Any ideas?
>
> While there may be other changes that have affected your performance,
> I see two things in your logs:
>
> - The coarse matrix is much smaller (3 rows vs. 592). The default
> coarse equation limit was recently changed from 800 to 50. You can
> recover the old behavior with `-pc_gamg_coarse_eq_limit 800` (see
> the example invocation below).
> - GAMG now uses the square of the adjacency graph only on the finest
> level. This means that matrices on the coarser levels will be
> larger and have more entries, which probably explains the extra PtAP
> time. Maybe Mark can explain the decision to make this change.
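>
> For example, appending that option to your first run would look
> something like:
>
>   ./ex6-e4b003c -f helmholtz-sphere.dat -ksp_type cg \
>     -ksp_convergence_test skip -ksp_max_it 2 -ksp_monitor -table \
>     -pc_type gamg -pc_gamg_coarse_eq_limit 800 -log_summary -ksp_view
>
> (If your build also has an integer -pc_gamg_square_graph option, that
> should let you choose how many levels use the squared graph, but
> check the -help output to be sure.)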
>
> Cheers,
> Toby
>
>>
>> Cheers,
>>
>> Lawrence
>>
>> $ ./ex6-e4b003c -f helmholtz-sphere.dat -ksp_type cg
>> -ksp_convergence_test skip -ksp_max_it 2 -ksp_monitor -table
>> -pc_type gamg -log_summary -ksp_view
>> 0 KSP Residual norm 3.676132751311e-11
>> 1 KSP Residual norm 1.764616084171e-14
>> 2 KSP Residual norm 9.253867842133e-14
>> KSP Object: 1 MPI processes
>> type: cg
>> maximum iterations=2, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=5 cycles=v
>> Cycles per PCApply=1
>> Using Galerkin computed coarse grid matrices
>> GAMG specific options
>> Threshold for dropping small values from graph 0
>> AGG specific options
>> Symmetric graph false
>> Coarse grid solver -- level -------------------------------
>> KSP Object: (mg_coarse_) 1 MPI processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=1, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (mg_coarse_) 1 MPI processes
>> type: bjacobi
>> block Jacobi: number of blocks = 1
>> Local solve is same for all blocks, in the following KSP and
>> PC objects:
>> KSP Object: (mg_coarse_sub_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (mg_coarse_sub_) 1 MPI processes
>> type: lu
>> LU: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> using diagonal shift on blocks to prevent zero pivot
>> [INBLOCKS]
>> matrix ordering: nd
>> factor fill ratio given 5, needed 1
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> package used to perform factorization: petsc
>> total: nonzeros=9, allocated nonzeros=9
>> total number of mallocs used during MatSetValues
>> calls =0
>> using I-node routines: found 1 nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> total: nonzeros=9, allocated nonzeros=9
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 1 nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=3, cols=3
>> total: nonzeros=9, allocated nonzeros=9
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 1 nodes, limit used is 5
>> Down solver (pre-smoother) on level 1 -------------------------------
>> KSP Object: (mg_levels_1_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.0999929, max = 1.09992
>> Chebyshev: eigenvalues estimated using gmres with translations
>> [0 0.1; 0 1.1]
>> KSP Object: (mg_levels_1_esteig_) 1 MPI processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_1_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=93, cols=93
>> total: nonzeros=8649, allocated nonzeros=8649
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 19 nodes, limit used is 5
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 2 -------------------------------
>> KSP Object: (mg_levels_2_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.0998389, max = 1.09823
>> Chebyshev: eigenvalues estimated using gmres with translations
>> [0 0.1; 0 1.1]
>> KSP Object: (mg_levels_2_esteig_) 1 MPI processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_2_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=2991, cols=2991
>> total: nonzeros=8.94608e+06, allocated nonzeros=8.94608e+06
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 599 nodes, limit used is 5
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 3 -------------------------------
>> KSP Object: (mg_levels_3_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.0998975, max = 1.09887
>> Chebyshev: eigenvalues estimated using gmres with translations
>> [0 0.1; 0 1.1]
>> KSP Object: (mg_levels_3_esteig_) 1 MPI processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_3_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=35419, cols=35419
>> total: nonzeros=1.55936e+07, allocated nonzeros=1.55936e+07
>> total number of mallocs used during MatSetValues calls =1
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 4 -------------------------------
>> KSP Object: (mg_levels_4_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.1, max = 1.1
>> Chebyshev: eigenvalues estimated using gmres with translations
>> [0 0.1; 0 1.1]
>> KSP Object: (mg_levels_4_esteig_) 1 MPI processes
>> type: gmres
>> GMRES: restart=30, using Classical (unmodified)
>> Gram-Schmidt Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>> maximum iterations=10
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_4_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=327680, cols=327680
>> total: nonzeros=3.25828e+06, allocated nonzeros=3.25828e+06
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=327680, cols=327680
>> total: nonzeros=3.25828e+06, allocated nonzeros=3.25828e+06
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> helmholt 2 9e+03 gamg
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
>> -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> ./ex6-master on a arch-linux2-c-opt named yam.doc.ic.ac.uk with 1
>> processor, by lmitche1 Mon Apr 27 16:03:36 2015
>> Using Petsc Development GIT revision: v3.5.3-2602-ga9b180a GIT Date:
>> 2015-04-07 20:34:49 -0500
>>
>> Max Max/Min Avg Total
>> Time (sec): 1.072e+02 1.00000 1.072e+02
>> Objects: 2.620e+02 1.00000 2.620e+02
>> Flops: 4.582e+10 1.00000 4.582e+10 4.582e+10
>> Flops/sec: 4.275e+08 1.00000 4.275e+08 4.275e+08
>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length
>> N --> 2N flops
>> and VecAXPY() for complex vectors of
>> length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>> Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 1.0898e-01 0.1% 7.4996e+06 0.0% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>> 1: mystage 1: 1.0466e+02 97.6% 4.2348e+10 92.4% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>> 2: mystage 2: 2.4395e+00 2.3% 3.4689e+09 7.6% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all processors
>> Mess: number of messages sent
>> Avg. len: average message length (bytes)
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in
>> this phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>> over all processors)
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>> Event Count Time (sec) Flops
>> --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg
>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> ThreadCommRunKer 2 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMult 1 1.0 4.4990e-03 1.0 6.19e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 4 83 0 0 0 1376
>> MatAssemblyBegin 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 1 1.0 1.0468e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 10 0 0 0 0 0
>> MatLoad 1 1.0 9.3672e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 86 0 0 0 0 0
>> VecNorm 1 1.0 9.1791e-05 1.0 6.55e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 9 0 0 0 7140
>> VecSet 5 1.0 3.1860e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
>> VecAXPY 1 1.0 3.9697e-04 1.0 6.55e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 9 0 0 0 1651
>>
>> --- Event Stage 1: mystage 1
>>
>> MatMult 40 1.0 3.5990e-01 1.0 5.52e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1535
>> MatConvert 4 1.0 1.2038e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatScale 12 1.0 8.2839e-02 1.0 7.21e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 870
>> MatAssemblyBegin 31 1.0 1.4067e-05 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 31 1.0 1.0995e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRow 1464732 1.0 8.5680e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatCoarsen 4 1.0 2.4768e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAXPY 4 1.0 8.1362e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMatMult 4 1.0 6.0399e-01 1.0 6.39e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 106
>> MatMatMultSym 4 1.0 4.4536e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMatMultNum 4 1.0 1.5859e-01 1.0 6.39e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 403
>> MatPtAP 4 1.0 1.0255e+02 1.0 4.15e+10 1.0 0.0e+00
>> 0.0e+00 0.0e+00 96 91 0 0 0 98 98 0 0 0 405
>> MatPtAPSymbolic 4 1.0 6.1707e+01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 58 0 0 0 0 59 0 0 0 0 0
>> MatPtAPNumeric 4 1.0 4.0840e+01 1.0 4.15e+10 1.0 0.0e+00
>> 0.0e+00 0.0e+00 38 91 0 0 0 39 98 0 0 0 1017
>> MatTrnMatMult 1 1.0 1.9648e-01 1.0 2.34e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 119
>> MatTrnMatMultSym 1 1.0 1.3640e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatTrnMatMultNum 1 1.0 6.0079e-02 1.0 2.34e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 389
>> MatGetSymTrans 5 1.0 8.4374e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecMDot 40 1.0 1.5124e-02 1.0 4.03e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2663
>> VecNorm 44 1.0 1.1821e-03 1.0 8.06e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6815
>> VecScale 44 1.0 1.4737e-03 1.0 4.03e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2733
>> VecCopy 4 1.0 3.2711e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 143 1.0 2.7236e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 4 1.0 3.9601e-04 1.0 7.32e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1849
>> VecMAXPY 44 1.0 1.9676e-02 1.0 4.76e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2419
>> VecAssemblyBegin 4 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAssemblyEnd 4 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecPointwiseMult 44 1.0 7.9026e-03 1.0 4.03e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 510
>> VecSetRandom 4 1.0 3.6092e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecNormalize 44 1.0 2.6934e-03 1.0 1.21e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4486
>> KSPGMRESOrthog 40 1.0 3.1765e-02 1.0 8.06e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2536
>> KSPSetUp 10 1.0 1.1930e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCGAMGGraph_AGG 4 1.0 5.7353e-01 1.0 5.56e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 97
>> PCGAMGCoarse_AGG 4 1.0 2.5846e-01 1.0 2.34e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 90
>> PCGAMGProl_AGG 4 1.0 4.9806e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCGAMGPOpt_AGG 4 1.0 1.2128e+00 1.0 7.38e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 608
>> GAMG: createProl 4 1.0 2.0973e+00 1.0 8.17e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 389
>> Graph 8 1.0 5.7211e-01 1.0 5.56e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 97
>> MIS/Agg 4 1.0 2.4861e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SA: col data 4 1.0 7.6509e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SA: frmProl0 4 1.0 4.6539e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> SA: smooth 4 1.0 1.2128e+00 1.0 7.38e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 608
>> GAMG: partLevel 4 1.0 1.0255e+02 1.0 4.15e+10 1.0 0.0e+00
>> 0.0e+00 0.0e+00 96 91 0 0 0 98 98 0 0 0 405
>> PCSetUp 1 1.0 1.0465e+02 1.0 4.23e+10 1.0 0.0e+00
>> 0.0e+00 0.0e+00 98 92 0 0 0 100100 0 0 0 405
>>
>> --- Event Stage 2: mystage 2
>>
>> MatMult 121 1.0 1.0144e+00 1.0 1.61e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 4 0 0 0 42 47 0 0 0 1592
>> MatMultAdd 12 1.0 3.1757e-02 1.0 4.95e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 1558
>> MatMultTranspose 12 1.0 3.7137e-02 1.0 4.95e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 2 1 0 0 0 1332
>> MatSolve 6 1.0 9.2983e-06 1.0 9.00e+01 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 10
>> MatSOR 116 1.0 1.2805e+00 1.0 1.61e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 4 0 0 0 52 47 0 0 0 1260
>> MatLUFactorSym 1 1.0 1.5020e-05 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatLUFactorNum 1 1.0 5.9605e-06 1.0 1.60e+01 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3
>> MatResidual 12 1.0 1.0432e-01 1.0 1.67e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 4 5 0 0 0 1599
>> MatGetRowIJ 1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetOrdering 1 1.0 4.2915e-05 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatView 8 1.0 6.7115e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecMDot 43 1.0 1.3515e-02 1.0 4.03e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 2980
>> VecTDot 4 1.0 1.1048e-03 1.0 2.62e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2373
>> VecNorm 53 1.0 1.4720e-03 1.0 1.00e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6809
>> VecScale 50 1.0 1.4396e-03 1.0 4.03e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2798
>> VecCopy 21 1.0 3.6387e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 108 1.0 1.0684e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 15 1.0 2.1303e-03 1.0 4.09e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1918
>> VecAYPX 97 1.0 1.1820e-02 1.0 1.16e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 985
>> VecAXPBYCZ 48 1.0 8.2519e-03 1.0 2.20e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 2663
>> VecMAXPY 50 1.0 1.6957e-02 1.0 4.76e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 2807
>> VecNormalize 50 1.0 2.6865e-03 1.0 1.21e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4498
>> KSPGMRESOrthog 43 1.0 2.7892e-02 1.0 8.06e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 2 0 0 0 2888
>> KSPSetUp 5 1.0 2.9690e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 1 1.0 2.4374e+00 1.0 3.47e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 8 0 0 0 100100 0 0 0 1423
>> PCSetUp 1 1.0 9.3937e-05 1.0 1.60e+01 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCSetUpOnBlocks 3 1.0 9.6798e-05 1.0 1.60e+01 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCApply 3 1.0 2.4240e+00 1.0 3.45e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 8 0 0 0 99 99 0 0 0 1423
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory Descendants'
>> Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> Viewer 2 2 1520 0
>> Matrix 1 10 54702412 0
>> Vector 3 97 72154440 0
>> Krylov Solver 0 11 146840 0
>> Preconditioner 0 7 7332 0
>> Index Set 0 3 2400 0
>>
>> --- Event Stage 1: mystage 1
>>
>> Viewer 1 0 0 0
>> Matrix 22 14 691989068 0
>> Matrix Coarsen 4 4 2576 0
>> Vector 125 91 67210200 0
>> Krylov Solver 15 4 120864 0
>> Preconditioner 15 8 7520 0
>> Index Set 4 4 3168 0
>> PetscRandom 4 4 2560 0
>>
>> --- Event Stage 2: mystage 2
>>
>> Matrix 1 0 0 0
>> Vector 60 0 0 0
>> Index Set 5 2 1592 0
>> ========================================================================================================================
>> Average time to get PetscTime(): 0
>> #PETSc Option Table entries:
>> -f helmholtz-sphere.dat
>> -ksp_convergence_test skip
>> -ksp_max_it 2
>> -ksp_monitor
>> -ksp_type cg
>> -ksp_view
>> -log_summary
>> -matload_block_size 1
>> -pc_type gamg
>> -table
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: --download-chaco=1 --download-ctetgen=1
>> --download-exodusii=1 --download-hdf5=1 --download-hypre=1
>> --download-metis=1 --download-ml=1 --download-mumps=1
>> --download-netcdf=1 --download-parmetis=1 --download-ptscotch=1
>> --download-scalapack=1 --download-superlu=1 --download-superlu_dist=1
>> --download-triangle=1 --with-c2html=0 --with-debugging=0
>> --with-make-np=32 --with-openmp=0 --with-pthreadclasses=0
>> --with-shared-libraries=1 --with-threadcomm=0 PETSC_ARCH=arch-linux2-c-opt
>> -----------------------------------------
>> Libraries compiled on Wed Apr 8 10:00:43 2015 on yam.doc.ic.ac.uk
>> Machine characteristics:
>> Linux-3.13.0-45-generic-x86_64-with-Ubuntu-14.04-trusty
>> Using PETSc directory: /data/lmitche1/src/deps/petsc
>> Using PETSc arch: arch-linux2-c-opt
>> -----------------------------------------
>>
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings
>> -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable
>> -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS}
>> ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths:
>> -I/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/include
>> -I/data/lmitche1/src/deps/petsc/include
>> -I/data/lmitche1/src/deps/petsc/include
>> -I/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/include
>> -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries:
>> -Wl,-rpath,/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/lib
>> -L/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/lib -lpetsc
>> -Wl,-rpath,/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/lib
>> -L/data/lmitche1/src/deps/petsc/arch-linux2-c-opt/lib -lcmumps
>> -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lsuperlu_4.3
>> -lsuperlu_dist_4.0 -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib
>> -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx
>> -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lexoIIv2for -lexodus
>> -llapack -lblas -lparmetis -ltriangle -lnetcdf -lmetis -lchaco
>> -lctetgen -lX11 -lptesmumps -lptscotch -lptscotcherr -lscotch
>> -lscotcherr -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lssl
>> -lcrypto -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm
>> -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lz
>> -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl
>> -lmpi -lhwloc -lgcc_s -lpthread -ldl
>> -----------------------------------------
>>
>>
>>
>> $ ./ex6-30ab49e4
>> 0 KSP Residual norm 3.679528502747e-11
>> 1 KSP Residual norm 1.410011347346e-14
>> 2 KSP Residual norm 2.871653636831e-14
>> KSP Object: 1 MPI processes
>> type: cg
>> maximum iterations=2, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>> Cycles per PCApply=1
>> Using Galerkin computed coarse grid matrices
>> Coarse grid solver -- level -------------------------------
>> KSP Object: (mg_coarse_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (mg_coarse_) 1 MPI processes
>> type: bjacobi
>> block Jacobi: number of blocks = 1
>> Local solve is same for all blocks, in the following KSP and
>> PC objects:
>> KSP Object: (mg_coarse_sub_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (mg_coarse_sub_) 1 MPI processes
>> type: lu
>> LU: out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> using diagonal shift on blocks to prevent zero pivot
>> [INBLOCKS]
>> matrix ordering: nd
>> factor fill ratio given 5, needed 1
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=592, cols=592
>> package used to perform factorization: petsc
>> total: nonzeros=350464, allocated nonzeros=350464
>> total number of mallocs used during MatSetValues
>> calls =0
>> using I-node routines: found 119 nodes, limit used
>> is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=592, cols=592
>> total: nonzeros=350464, allocated nonzeros=350464
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 119 nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=592, cols=592
>> total: nonzeros=350464, allocated nonzeros=350464
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 119 nodes, limit used is 5
>> Down solver (pre-smoother) on level 1 -------------------------------
>> KSP Object: (mg_levels_1_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.0871826, max = 1.83084
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_1_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=35419, cols=35419
>> total: nonzeros=1.55936e+07, allocated nonzeros=1.55936e+07
>> total number of mallocs used during MatSetValues calls =1
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> Down solver (pre-smoother) on level 2 -------------------------------
>> KSP Object: (mg_levels_2_) 1 MPI processes
>> type: chebyshev
>> Chebyshev: eigenvalue estimates: min = 0.099472, max = 2.08891
>> maximum iterations=2
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> left preconditioning
>> using nonzero initial guess
>> using NONE norm type for convergence test
>> PC Object: (mg_levels_2_) 1 MPI processes
>> type: sor
>> SOR: type = local_symmetric, iterations = 1, local iterations
>> = 1, omega = 1
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=327680, cols=327680
>> total: nonzeros=3.25828e+06, allocated nonzeros=3.25828e+06
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=327680, cols=327680
>> total: nonzeros=3.25828e+06, allocated nonzeros=3.25828e+06
>> total number of mallocs used during MatSetValues calls =0
>> not using I-node routines
>> Number of iterations = 2
>> Residual norm = 8368.22
>> ************************************************************************************************************************
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
>> -fCourier9' to print this document ***
>> ************************************************************************************************************************
>>
>> ---------------------------------------------- PETSc Performance
>> Summary: ----------------------------------------------
>>
>> ./ex6-1ddf9fe on a test named yam.doc.ic.ac.uk with 1 processor, by
>> lmitche1 Mon Apr 27 16:02:36 2015
>> Using Petsc Release Version 3.5.2, unknown
>>
>> Max Max/Min Avg Total
>> Time (sec): 2.828e+01 1.00000 2.828e+01
>> Objects: 1.150e+02 1.00000 1.150e+02
>> Flops: 1.006e+10 1.00000 1.006e+10 1.006e+10
>> Flops/sec: 3.559e+08 1.00000 3.559e+08 3.559e+08
>> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.00000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type
>> (multiply/divide/add/subtract)
>> e.g., VecAXPY() for real vectors of length
>> N --> 2N flops
>> and VecAXPY() for complex vectors of
>> length N --> 8N flops
>>
>> Summary of Stages: ----- Time ------ ----- Flops ----- ---
>> Messages --- -- Message Lengths -- -- Reductions --
>> Avg %Total Avg %Total counts
>> %Total Avg %Total counts %Total
>> 0: Main Stage: 9.9010e-02 0.4% 7.4996e+06 0.1% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>> 1: mystage 1: 2.6509e+01 93.7% 8.4700e+09 84.2% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>> 2: mystage 2: 1.6704e+00 5.9% 1.5861e+09 15.8% 0.000e+00
>> 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>>
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on
>> interpreting output.
>> Phase summary info:
>> Count: number of times phase was executed
>> Time and Flops: Max - maximum over all processors
>> Ratio - ratio of maximum to minimum over all processors
>> Mess: number of messages sent
>> Avg. len: average message length (bytes)
>> Reduct: number of global reductions
>> Global: entire computation
>> Stage: stages of a computation. Set stages with PetscLogStagePush()
>> and PetscLogStagePop().
>> %T - percent time in this phase %F - percent flops in
>> this phase
>> %M - percent messages in this phase %L - percent message
>> lengths in this phase
>> %R - percent reductions in this phase
>> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
>> over all processors)
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>> Event Count Time (sec) Flops
>> --- Global --- --- Stage --- Total
>> Max Ratio Max Ratio Max Ratio Mess Avg
>> len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> ThreadCommRunKer 2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMult 1 1.0 7.2370e-03 1.0 6.19e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 7 83 0 0 0 855
>> MatAssemblyBegin 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 1 1.0 1.0748e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 11 0 0 0 0 0
>> MatLoad 1 1.0 8.5824e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 87 0 0 0 0 0
>> VecNorm 1 1.0 9.2983e-05 1.0 6.55e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 9 0 0 0 7048
>> VecSet 5 1.0 2.9252e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
>> VecAXPY 1 1.0 4.6611e-04 1.0 6.55e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 9 0 0 0 1406
>>
>> --- Event Stage 1: mystage 1
>>
>> MatMult 20 1.0 2.8699e-01 1.0 3.73e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1301
>> MatConvert 2 1.0 8.3888e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatScale 6 1.0 7.1905e-02 1.0 4.85e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 675
>> MatAssemblyBegin 20 1.0 1.9789e-05 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAssemblyEnd 20 1.0 1.1295e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetRow 1452396 1.0 9.5270e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatCoarsen 2 1.0 3.0676e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatAXPY 2 1.0 8.2162e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatMatMult 2 1.0 4.2625e-01 1.0 4.31e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 0 0 0 0 2 1 0 0 0 101
>> MatMatMultSym 2 1.0 3.0257e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>> MatMatMultNum 2 1.0 1.2364e-01 1.0 4.31e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 349
>> MatPtAP 2 1.0 2.3871e+01 1.0 7.82e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 84 78 0 0 0 90 92 0 0 0 328
>> MatPtAPSymbolic 2 1.0 1.4329e+01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 51 0 0 0 0 54 0 0 0 0 0
>> MatPtAPNumeric 2 1.0 9.5422e+00 1.0 7.82e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 34 78 0 0 0 36 92 0 0 0 819
>> MatTrnMatMult 2 1.0 9.7712e-01 1.0 8.24e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 3 1 0 0 0 4 1 0 0 0 84
>> MatTrnMatMultSym 2 1.0 5.0258e-01 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>> MatTrnMatMultNum 2 1.0 4.7454e-01 1.0 8.24e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 174
>> MatGetSymTrans 4 1.0 6.2370e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecMDot 20 1.0 1.7304e-02 1.0 3.99e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2308
>> VecNorm 22 1.0 1.3692e-03 1.0 7.99e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5834
>> VecScale 22 1.0 1.8549e-03 1.0 3.99e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2153
>> VecCopy 2 1.0 5.0211e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecSet 77 1.0 3.6245e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 2 1.0 3.7718e-04 1.0 7.26e+05 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1925
>> VecMAXPY 22 1.0 2.2252e-02 1.0 4.72e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 2121
>> VecAssemblyBegin 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAssemblyEnd 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecPointwiseMult 22 1.0 9.2957e-03 1.0 3.99e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 430
>> VecSetRandom 2 1.0 8.8599e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecNormalize 22 1.0 3.2570e-03 1.0 1.20e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3679
>> KSPGMRESOrthog 20 1.0 3.6396e-02 1.0 7.99e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2195
>> KSPSetUp 6 1.0 1.4364e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCGAMGgraph_AGG 2 1.0 4.8670e-01 1.0 3.77e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 77
>> PCGAMGcoarse_AGG 2 1.0 1.0664e+00 1.0 8.24e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 4 1 0 0 0 4 1 0 0 0 77
>> PCGAMGProl_AGG 2 1.0 6.4827e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> PCGAMGPOpt_AGG 2 1.0 9.9913e-01 1.0 5.31e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 4 5 0 0 0 4 6 0 0 0 532
>> PCSetUp 1 1.0 2.6505e+01 1.0 8.47e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 94 84 0 0 0 100100 0 0 0 320
>>
>> --- Event Stage 2: mystage 2
>>
>> MatMult 38 1.0 5.6846e-01 1.0 6.85e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 2 7 0 0 0 34 43 0 0 0 1204
>> MatMultAdd 6 1.0 2.7303e-02 1.0 3.25e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 2 2 0 0 0 1189
>> MatMultTranspose 6 1.0 3.2745e-02 1.0 3.25e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 2 2 0 0 0 991
>> MatSolve 3 1.0 1.6339e-03 1.0 2.10e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1286
>> MatSOR 36 1.0 9.4071e-01 1.0 6.79e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 3 7 0 0 0 56 43 0 0 0 722
>> MatLUFactorSym 1 1.0 5.9440e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatLUFactorNum 1 1.0 4.0792e-02 1.0 1.15e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 2 7 0 0 0 2820
>> MatResidual 6 1.0 9.2385e-02 1.0 1.13e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 6 7 0 0 0 1224
>> MatGetRowIJ 1 1.0 1.4091e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatGetOrdering 1 1.0 2.3508e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> MatView 6 1.0 5.8508e-04 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecTDot 4 1.0 1.4160e-03 1.0 2.62e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1851
>> VecNorm 3 1.0 3.5286e-04 1.0 1.97e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5572
>> VecCopy 8 1.0 1.1313e-02 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
>> VecSet 22 1.0 4.3163e-03 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> VecAXPY 4 1.0 1.6727e-03 1.0 2.62e+06 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1567
>> VecAYPX 49 1.0 1.7958e-02 1.0 1.15e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 643
>> VecAXPBYCZ 24 1.0 1.3117e-02 1.0 2.18e+07 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 1661
>> KSPSetUp 2 1.0 9.2983e-06 1.0 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> KSPSolve 1 1.0 1.6690e+00 1.0 1.59e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 6 16 0 0 0 100100 0 0 0 950
>> PCSetUp 1 1.0 4.7009e-02 1.0 1.15e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 3 7 0 0 0 2447
>> PCSetUpOnBlocks 3 1.0 4.7014e-02 1.0 1.15e+08 1.0 0.0e+00
>> 0.0e+00 0.0e+00 0 1 0 0 0 3 7 0 0 0 2447
>> PCApply 3 1.0 1.6409e+00 1.0 1.57e+09 1.0 0.0e+00
>> 0.0e+00 0.0e+00 6 16 0 0 0 98 99 0 0 0 954
>> -
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Memory usage is given in bytes:
>>
>> Object Type Creations Destructions Memory Descendants'
>> Mem.
>> Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> Viewer 1 1 760 0
>> Matrix 1 6 58874852 0
>> Vector 3 20 27954224 0
>> Krylov Solver 0 5 23360 0
>> Preconditioner 0 5 5332 0
>> Index Set 0 3 7112 0
>>
>> --- Event Stage 1: mystage 1
>>
>> Viewer 1 0 0 0
>> Matrix 14 10 477163156 0
>> Matrix Coarsen 2 2 1288 0
>> Vector 69 52 66638640 0
>> Krylov Solver 7 2 60432 0
>> Preconditioner 7 2 2096 0
>> Index Set 2 2 1584 0
>> PetscRandom 2 2 1280 0
>>
>> --- Event Stage 2: mystage 2
>>
>> Matrix 1 0 0 0
>> Index Set 5 2 2536 0
>> ========================================================================================================================
>> Average time to get PetscTime(): 0
>> #PETSc Option Table entries:
>> -f helmholtz-sphere.dat
>> -ksp_convergence_test skip
>> -ksp_max_it 2
>> -ksp_monitor
>> -ksp_type cg
>> -ksp_view
>> -log_summary
>> -matload_block_size 1
>> -pc_type gamg
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure options: PETSC_ARCH=test --with-debugging=0
>> -----------------------------------------
>> Libraries compiled on Mon Apr 27 10:49:24 2015 on yam.doc.ic.ac.uk
>> Machine characteristics:
>> Linux-3.13.0-45-generic-x86_64-with-Ubuntu-14.04-trusty
>> Using PETSc directory: /data/lmitche1/src/deps/petsc
>> Using PETSc arch: test
>> -----------------------------------------
>>
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings
>> -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable
>> -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS}
>> ${FFLAGS}
>> -----------------------------------------
>>
>> Using include paths: -I/data/lmitche1/src/deps/petsc/test/include
>> -I/data/lmitche1/src/deps/petsc/include
>> -I/data/lmitche1/src/deps/petsc/include
>> -I/data/lmitche1/src/deps/petsc/test/include
>> -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using Fortran linker: mpif90
>> Using libraries: -Wl,-rpath,/data/lmitche1/src/deps/petsc/test/lib
>> -L/data/lmitche1/src/deps/petsc/test/lib -lpetsc -llapack -lblas -lX11
>> -lssl -lcrypto -lpthread -lm -Wl,-rpath,/usr/lib/openmpi/lib
>> -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90
>> -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx
>> -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib
>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -L/usr/lib/gcc/x86_64-linux-gnu/4.8
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu
>> -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu
>> -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl
>> -lmpi -lhwloc -lgcc_s -lpthread -ldl
>> -----------------------------------------
>>