[petsc-users] KSP changes for successive solver

Fri Jul 24 13:35:13 CDT 2015

Hi Mark and Barry,

I am sorry for my late reply: it was a busy week!
I run a test case for a larger problem with  as many levels (i.e. 5) of
MG I could and  GAMG as PC at the coarse level. I attached the output of
info ( after grep for "gmag"),  ksp_view and log_summary.
The solve takes about 2 seconds on 8192 cores, which is way too much.
The number of iterations to convergence is 24.
I hope there is a way to speed it up.

Thanks,
Michele

On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> 
> 
> 
> 
> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <mrosso at uci.edu> wrote:
> 
>         Barry,
>         
>         thank you very much for the detailed answer.  I tried what you
>         suggested and it works.
>         So far I tried on a small system but the final goal is to use
>         it for very large runs.  How does  PCGAMG compares to PCMG  as
>         far as performances and scalability are concerned?
>         Also, could you help me to tune the GAMG part ( my current
>         setup is in the attached ksp_view.txt file )? 
>         
> 
> 
> I am going to add this to the document today but you can run with
> -info.  This is very noisy so you might want to do the next step at
> run time.  Then grep on GAMG.  This will be about 20 lines.  Send that
> to us and we can go from there.
> 
> 
> Mark
> 
> 
>  
>         
>         I also tried to use superlu_dist for the LU decomposition on
>         mg_coarse_mg_sub_
>         -mg_coarse_mg_coarse_sub_pc_type lu
>         -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package
>         superlu_dist
>         
>         but I got an error:
>         
>         ****** Error in MC64A/AD. INFO(1) = -2 
>         ****** Error in MC64A/AD. INFO(1) = -2
>         ****** Error in MC64A/AD. INFO(1) = -2
>         ****** Error in MC64A/AD. INFO(1) = -2
>         ****** Error in MC64A/AD. INFO(1) = -2
>         ****** Error in MC64A/AD. INFO(1) = -2
>         ****** Error in MC64A/AD. INFO(1) = -2
>         symbfact() error returns 0
>         symbfact() error returns 0
>         symbfact() error returns 0
>         symbfact() error returns 0
>         symbfact() error returns 0
>         symbfact() error returns 0
>         symbfact() error returns 0
>         
>         
>         Thank you,
>         Michele
>         
>         
>         On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote: 
>         
>         > 
>         > > On Jul 16, 2015, at 5:42 PM, Michele Rosso <mrosso at uci.edu> wrote:
>         > > 
>         > > Barry,
>         > > 
>         > > thanks for your reply. So if I want it fixed, I will have to use the master branch, correct?
>         > 
>         >   Yes, or edit mg.c and remove the offending lines of code (easy enough). 
>         > > 
>         > > On a side note, what I am trying to achieve is to be able to use how many levels of MG I want, despite the limitation imposed by the local number of grid nodes.
>         > 
>         >    I assume you are talking about with DMDA? There is no generic limitation for PETSc's multigrid, it is only with the way the DMDA code figures out the interpolation that causes a restriction.
>         > 
>         > > So far I am using a borrowed code that implements a PC that creates a sub communicator and perform MG on it.
>         > > While reading the documentation I found out that PCMGSetLevels takes in an optional array of communicators. How does this work?
>         > 
>         >    It doesn't work. It was an idea that never got pursued.
>         > 
>         > > Can I can simply define my matrix and rhs on the fine grid as I would do normally ( I do not use kspsetoperators and kspsetrhs ) and KSP would take care of it by using the correct communicator for each level?
>         > 
>         >    No.
>         > 
>         >    You can use the PCMG geometric multigrid with DMDA for as many levels as it works and then use PCGAMG as the coarse grid solver. PCGAMG automatically uses fewer processes for the coarse level matrices and vectors. You could do this all from the command line without writing code. 
>         > 
>         >    For example if your code uses a DMDA and calls KSPSetDM() use for example -da_refine 3 -pc_type mg -pc_mg_galerkin -mg_coarse_pc_type gamg  -ksp_view 
>         > 
>         > 
>         > 
>         >   Barry
>         > 
>         > 
>         > > 
>         > > Thanks,
>         > > Michele
>         > > 
>         > > 
>         > > 
>         > > 
>         > > On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
>         > >>    Michel,
>         > >> 
>         > >>     This is a very annoying feature that has been fixed in master 
>         > >> http://www.mcs.anl.gov/petsc/developers/index.html
>         > >>   I would like to have changed it in maint but Jed would have a shit-fit :-) since it changes behavior.
>         > >> 
>         > >>   Barry
>         > >> 
>         > >> 
>         > >> > On Jul 16, 2015, at 4:53 PM, Michele Rosso <mrosso at uci.edu> wrote:
>         > >> > 
>         > >> > Hi,
>         > >> > 
>         > >> > I am performing a series of solves inside a loop. The matrix for each solve changes but not enough to justify a rebuilt of the PC at each solve.
>         > >> > Therefore I am using  KSPSetReusePreconditioner to avoid rebuilding unless necessary. The solver is CG + MG with a custom  PC at the coarse level.
>         > >> > If KSP is not updated each time, everything works as it is supposed to. 
>         > >> > When instead I allow the default PETSc  behavior, i.e. updating PC every time the matrix changes, the coarse level KSP , initially set to PREONLY, is changed into GMRES 
>         > >> > after the first solve. I am not sure where the problem lies (my PC or PETSc), so I would like to have your opinion on this.
>         > >> > I attached the ksp_view for the 2 successive solve and the options stack.
>         > >> > 
>         > >> > Thanks for your help,
>         > >> > Michel
>         > >> > 
>         > >> > 
>         > >> > 
>         > >> > <ksp_view.txt><petsc_options.txt>
>         > >> 
>         > >> 
>         > >> 
>         > > 
>         > 
>         
>         
>         
> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150724/6fba76c2/attachment-0001.html>
-------------- next part --------------
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row (ave)=7, np=8192
[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 4 nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00 min=1.105137e-01 PC=jacobi
[135] MatAssemblyEnd_SeqA[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple aggregation
[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00 min=1.226917e-03 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[94] PetscCommDuplicate(): Using[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00 min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
      type: gamg
          GAMG specific options
[0] PCSetUp_GAMG(): level 0) N=8192, n data rows=1, n data cols=1, nnz/row (ave)=7, np=8192
[125] MatAssemb[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 4 nnz ave. (N=8192)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 1005 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.876420e+00 min=1.105137e-01 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 0 with simple aggregation
[268] MatAssemblyEn[0] PCSetUp_GAMG(): 1) N=1005, n data cols=1, nnz/row (ave)=27, 16 active pes
[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 20.5645 nnz ave. (N=1005)
[0] PCGAMGProlongator_AGG(): New grid 103 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.461408e+00 min=1.226917e-03 PC=jacobi
[233] Pe[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 8 with simple aggregation
[0] PCSetUp_GAMG(): 2) N=103, n data cols=1, nnz/row (ave)=55, 2 active pes
[0] PCGAMGFilterGraph(): 	 100% nnz after filtering, with threshold 0, 55.5049 nnz ave. (N=103)
[0] PCGAMGProlongator_AGG(): New grid 6 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.697064e+00 min=2.669349e-04 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 6 with simple aggregation
[0] PCSetUp_GAMG(): 3) N=6, n data cols=1, nnz/row (ave)=6, 1 active pes
[0] PCSetUp_GAMG(): 4 levels, grid complexity = 1.60036
      type: gamg
          GAMG specific options
      type: gamg
    GAMG specific options
      type: gamg
          GAMG specific options
-------------- next part --------------
KSP Object: 8192 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=1e-09, absolute=1e-50, divergence=10000
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8192 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8192 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8192 MPI processes
      type: gamg
        MG: type is MULTIPLICATIVE, levels=4 cycles=v
          Cycles per PCApply=1
          Using Galerkin computed coarse grid matrices
          GAMG specific options
            Threshold for dropping small values from graph 0
            AGG specific options
              Symmetric graph false
      Coarse grid solver -- level -------------------------------
        KSP Object:        (mg_coarse_mg_coarse_)         8192 MPI processes
          type: preonly
          maximum iterations=1, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_mg_coarse_)         8192 MPI processes
          type: bjacobi
            block Jacobi: number of blocks = 8192
            Local solve is same for all blocks, in the following KSP and PC objects:
          KSP Object:          (mg_coarse_mg_coarse_sub_)           1 MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object:          (mg_coarse_mg_coarse_sub_)           1 MPI processes
            type: lu
              LU: out-of-place factorization
              tolerance for zero pivot 2.22045e-14
              using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
              matrix ordering: nd
              factor fill ratio given 5, needed 1
                Factored matrix follows:
                  Mat Object:                   1 MPI processes
                    type: seqaij
                    rows=6, cols=6
                    package used to perform factorization: petsc
                    total: nonzeros=36, allocated nonzeros=36
                    total number of mallocs used during MatSetValues calls =0
                      using I-node routines: found 2 nodes, limit used is 5
            linear system matrix = precond matrix:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=6, cols=6
              total: nonzeros=36, allocated nonzeros=36
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 2 nodes, limit used is 5
          linear system matrix = precond matrix:
          Mat Object:           8192 MPI processes
            type: mpiaij
            rows=6, cols=6
            total: nonzeros=36, allocated nonzeros=36
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 2 nodes, limit used is 5
      Down solver (pre-smoother) on level 1 -------------------------------
        KSP Object:        (mg_coarse_mg_levels_1_)         8192 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.0995252, max = 1.09478
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object:            (mg_coarse_mg_levels_1_esteig_)             8192 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_mg_levels_1_)         8192 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object:           8192 MPI processes
            type: mpiaij
            rows=103, cols=103
            total: nonzeros=5717, allocated nonzeros=5717
            total number of mallocs used during MatSetValues calls =0
              not using I-node (on process 0) routines
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 2 -------------------------------
        KSP Object:        (mg_coarse_mg_levels_2_)         8192 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.15748, max = 1.73228
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object:            (mg_coarse_mg_levels_2_esteig_)             8192 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_mg_levels_2_)         8192 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object:           8192 MPI processes
            type: mpiaij
            rows=1005, cols=1005
            total: nonzeros=27137, allocated nonzeros=27137
            total number of mallocs used during MatSetValues calls =0
              not using I-node (on process 0) routines
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 3 -------------------------------
        KSP Object:        (mg_coarse_mg_levels_3_)         8192 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.191092, max = 2.10202
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object:            (mg_coarse_mg_levels_3_esteig_)             8192 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_mg_levels_3_)         8192 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object:           8192 MPI processes
            type: mpiaij
            rows=8192, cols=8192
            total: nonzeros=54784, allocated nonzeros=54784
            total number of mallocs used during MatSetValues calls =0
              not using I-node (on process 0) routines
      Up solver (post-smoother) same as down solver (pre-smoother)
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=8192, cols=8192
        total: nonzeros=54784, allocated nonzeros=54784
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=65536, cols=65536
        total: nonzeros=448512, allocated nonzeros=448512
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=524288, cols=524288
        total: nonzeros=3.62906e+06, allocated nonzeros=3.62906e+06
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=4194304, cols=4194304
        total: nonzeros=2.91963e+07, allocated nonzeros=2.91963e+07
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8192 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8192 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       8192 MPI processes
        type: mpiaij
        rows=33554432, cols=33554432
        total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
        total number of mallocs used during MatSetValues calls =0
          has attached null space
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   8192 MPI processes
    type: mpiaij
    rows=33554432, cols=33554432
    total: nonzeros=2.34226e+08, allocated nonzeros=2.34226e+08
    total number of mallocs used during MatSetValues calls =0
      has attached null space
-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/u/sciteam/mrosso/mrosso-repo/build/bin/test_droplet_box.exe on a gnu-opt-32idx named p…þÿÿ with 8192 processors, by mrosso Fri Jul 24 13:09:23 2015
Using Petsc Development GIT revision: v3.6-233-g4936542  GIT Date: 2015-07-17 10:15:47 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           1.130e+02      1.00023   1.130e+02
Objects:              1.587e+03      1.00253   1.583e+03
Flops:                8.042e+07      1.28093   6.371e+07  5.219e+11
Flops/sec:            7.115e+05      1.28065   5.639e+05  4.619e+09
MPI Messages:         1.267e+05     13.76755   1.879e+04  1.539e+08
MPI Message Lengths:  8.176e+06      2.12933   3.881e+02  5.972e+10
MPI Reductions:       2.493e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.1300e+02 100.0%  5.2195e+11 100.0%  1.539e+08 100.0%  3.881e+02      100.0%  2.492e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot              120 1.0 2.4560e-01 1.6 7.24e+04329.0 0.0e+00 0.0e+00 1.2e+02  0  0  0  0  5   0  0  0  0  5     9
VecTDot              194 1.0 2.6155e-01 1.3 1.59e+06 1.0 0.0e+00 0.0e+00 1.9e+02  0  2  0  0  8   0  2  0  0  8 49771
VecNorm              236 1.0 4.8733e-01 1.4 8.67e+05 1.0 0.0e+00 0.0e+00 2.4e+02  0  1  0  0  9   0  1  0  0  9 14323
VecScale            1009 1.0 1.1008e-03 1.5 1.63e+05 1.8 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 1156928
VecCopy              405 1.0 3.2604e-03 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              2648 1.0 9.1252e-03 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              594 1.0 1.5715e-02 3.8 4.77e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0 2485378
VecAYPX             3103 1.0 8.7631e-03 2.4 2.58e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 2263359
VecAXPBYCZ          1164 1.0 3.5439e-0313.2 3.22e+05166.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5091
VecMAXPY             132 1.0 3.7217e-04 6.8 8.63e+04166.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12994
VecAssemblyBegin      36 1.0 3.7324e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01  0  0  0  0  4   0  0  0  0  4     0
VecAssemblyEnd        36 1.0 1.5278e-0364.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      66 1.0 2.8777e-0426.8 3.65e+03166.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   711
VecScatterBegin     4914 1.0 3.9190e-0128.1 0.00e+00 0.0 1.1e+08 5.3e+02 0.0e+00  0  0 72 99  0   0  0 72 99  0     0
VecScatterEnd       4914 1.0 8.6240e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
VecSetRandom           6 1.0 1.0859e-022168.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         132 1.0 3.4323e-01 1.5 2.19e+04166.0 0.0e+00 0.0e+00 1.3e+02  0  0  0  0  5   0  0  0  0  5     4
MatMult             2645 1.0 3.6311e+0032.2 3.45e+07 1.3 6.5e+07 7.6e+02 0.0e+00  1 42 42 83  0   1 42 42 83  0 60126
MatMultAdd           679 1.0 4.8583e+0031.6 1.08e+06 1.2 1.6e+06 1.4e+01 0.0e+00  4  1  1  0  0   4  1  1  0  0  1532
MatMultTranspose     683 1.0 4.2303e+00667.0 1.09e+06 1.2 1.6e+06 1.4e+01 0.0e+00  0  1  1  0  0   0  1  1  0  0  1778
MatSolve              97 0.0 2.9469e-04 0.0 6.40e+03 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    22
MatSOR              2782 1.0 3.6662e+0035.7 3.29e+07 1.3 4.1e+07 2.2e+02 0.0e+00  1 40 26 15  0   1 40 26 15  0 56576
MatLUFactorSym         2 1.0 1.5128e-02358.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 1.0 6.1989e-0513.0 2.58e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
MatConvert             6 1.0 9.1314e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale              18 1.0 1.3737e-02219.9 3.16e+041579.8 9.3e+04 8.6e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    36
MatResidual          679 1.0 3.0610e-0110.0 7.51e+06 1.2 2.3e+07 5.5e+02 0.0e+00  0 10 15 21  0   0 10 15 21  0 169592
MatAssemblyBegin     119 1.0 1.7048e+01 2.8 0.00e+00 0.0 4.3e+04 7.2e+00 1.3e+02 10  0  0  0  5  10  0  0  0  5     0
MatAssemblyEnd       119 1.0 4.1777e+01 1.4 0.00e+00 0.0 1.7e+06 4.1e+01 3.8e+02 31  0  1  0 15  31  0  1  0 15     0
MatGetRow           1328166.0 3.9291e-0454.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 0.0 1.5020e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrix       12 1.0 2.5183e+01 1.0 0.00e+00 0.0 7.6e+04 1.6e+01 1.9e+02 22  0  0  0  8  22  0  0  0  8     0
MatGetOrdering         2 0.0 9.5510e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             6 1.0 1.5275e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00 2.4e+02  1  0 25  0 10   1  0 25  0 10     0
MatView               60 1.2 1.9943e+0026.6 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+01  1  0  0  0  2   1  0  0  0  2     0
MatAXPY                6 1.0 4.5854e+00326.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatMatMult             6 1.0 1.7226e+01 1.3 2.80e+041749.0 4.8e+05 5.9e+00 9.6e+01 12  0  0  0  4  12  0  0  0  4     0
MatMatMultSym          6 1.0 1.3093e+01 1.0 0.00e+00 0.0 3.9e+05 5.3e+00 8.4e+01 12  0  0  0  3  12  0  0  0  3     0
MatMatMultNum          6 1.0 4.1413e+0087.5 2.80e+041749.0 9.3e+04 8.6e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatPtAP               14 1.0 2.3092e+01 1.2 3.94e+05 2.0 1.7e+06 2.4e+02 1.8e+02 20  0  1  1  7  20  0  1  1  7    73
MatPtAPSymbolic       10 1.0 1.6246e+01 1.7 0.00e+00 0.0 1.0e+06 2.6e+02 7.0e+01 12  0  1  0  3  12  0  1  0  3     0
MatPtAPNumeric        14 1.0 9.1005e+00 1.3 3.94e+05 2.0 7.2e+05 2.1e+02 1.1e+02  8  0  0  0  4   8  0  0  0  4   185
MatTrnMatMult          2 1.0 5.6152e+00 1.0 3.64e+02 2.9 1.1e+06 1.2e+01 3.8e+01  5  0  1  0  2   5  0  1  0  2     0
MatTrnMatMultSym       2 1.0 5.5943e+00 1.0 0.00e+00 0.0 1.0e+06 7.6e+00 3.4e+01  5  0  1  0  1   5  0  1  0  1     0
MatTrnMatMultNum       2 1.0 2.8538e-02 4.6 3.64e+02 2.9 9.3e+04 5.4e+01 4.0e+00  0  0  0  0  0   0  0  0  0  0    96
MatGetLocalMat        30 1.0 4.4808e+001435.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         26 1.0 4.5314e+00292.0 0.00e+00 0.0 1.4e+06 2.8e+02 0.0e+00  0  0  1  1  0   0  0  1  1  0     0
MatGetSymTrans        20 1.0 3.3071e-0347.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             6 1.0 7.4315e-0443.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFBcastBegin         252 1.0 1.3895e+0014.8 0.00e+00 0.0 3.8e+07 4.0e+00 0.0e+00  1  0 25  0  0   1  0 25  0  0     0
SFBcastEnd           252 1.0 7.1653e-0218.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog       120 1.0 2.4587e-01 1.6 1.45e+05220.3 0.0e+00 0.0e+00 1.2e+02  0  0  0  0  5   0  0  0  0  5    26
KSPSetUp              36 1.0 2.2627e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.6e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve               4 1.0 1.0763e+02 1.0 8.04e+07 1.3 1.5e+08 3.9e+02 2.4e+03 95100100100 96  95100100100 96  4848
PCGAMGGraph_AGG        6 1.0 4.2442e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00 7.2e+01  4  0  0  0  3   4  0  0  0  3     0
PCGAMGCoarse_AGG       6 1.0 7.2416e+00 1.0 3.64e+02 2.9 4.1e+07 4.4e+00 3.1e+02  6  0 26  0 12   6  0 26  0 12     0
PCGAMGProl_AGG         6 1.0 1.0400e+01 1.0 0.00e+00 0.0 7.9e+05 8.0e+00 1.4e+02  9  0  1  0  6   9  0  1  0  6     0
PCGAMGPOpt_AGG         6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00 3.0e+02 17  0  1  0 12  17  0  1  0 12     0
GAMG: createProl       6 1.0 4.4975e+01 1.1 4.31e+05565.4 4.3e+07 4.6e+00 8.3e+02 36  0 28  0 33  36  0 28  0 33     0
  Graph               12 1.0 4.2435e+00 1.0 2.80e+041749.0 2.8e+05 5.7e+00 7.2e+01  4  0  0  0  3   4  0  0  0  3     0
  MIS/Agg              6 1.0 1.5276e+00 6.0 0.00e+00 0.0 3.8e+07 4.0e+00 2.4e+02  1  0 25  0 10   1  0 25  0 10     0
  SA: col data         6 1.0 6.9602e+00 1.6 0.00e+00 0.0 7.2e+05 8.2e+00 6.0e+01  6  0  0  0  2   6  0  0  0  2     0
  SA: frmProl0         6 1.0 3.3989e+00 1.0 0.00e+00 0.0 7.2e+04 5.9e+00 6.0e+01  3  0  0  0  2   3  0  0  0  2     0
  SA: smooth           6 1.0 2.3133e+01 1.2 4.03e+05647.5 1.4e+06 7.7e+00 3.0e+02 17  0  1  0 12  17  0  1  0 12     0
GAMG: partLevel        6 1.0 4.3421e+01 1.1 1.97e+053512.3 6.9e+05 2.0e+01 4.1e+02 38  0  0  0 16  38  0  0  0 16     0
  repartition          6 1.0 3.0290e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.6e+01  0  0  0  0  1   0  0  0  0  1     0
  Invert-Sort          6 1.0 2.2100e+00 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  2  0  0  0  1   2  0  0  0  1     0
  Move A               6 1.0 1.3769e+01 1.0 0.00e+00 0.0 1.1e+04 8.4e+01 1.0e+02 12  0  0  0  4  12  0  0  0  4     0
  Move P               6 1.0 1.1463e+01 1.0 0.00e+00 0.0 6.5e+04 5.5e+00 1.0e+02 10  0  0  0  4  10  0  0  0  4     0
PCSetUp                6 1.0 9.5437e+01 1.0 8.96e+05 3.3 4.5e+07 1.4e+01 1.5e+03 84  0 29  1 59  84  0 29  1 59    24
PCSetUpOnBlocks       97 1.0 1.7256e-0221.5 2.58e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply               97 1.0 1.1589e+01 1.0 6.95e+07 1.3 1.0e+08 4.8e+02 5.1e+02 10 84 67 83 21  10 84 67 83 21 37682
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector  1032           1032      3077992     0
      Vector Scatter    64             63        71936     0
              Matrix   211            211      2308880     0
      Matrix Coarsen     6              6         3720     0
   Matrix Null Space     1              1          584     0
    Distributed Mesh     5              4        19808     0
Star Forest Bipartite Graph    16             14        11760     0
     Discrete System     5              4         3360     0
           Index Set   180            180       169588     0
   IS L to G Mapping     5              4         6020     0
       Krylov Solver    22             22       374160     0
     DMKSP interface     4              4         2560     0
      Preconditioner    22             22        20924     0
         PetscRandom     6              6         3696     0
              Viewer     8              6         4512     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 4.57764e-05
Average time for zero size MPI_Send(): 1.04982e-05
#PETSc Option Table entries:
-finput input.txt
-info
-ksp_initial_guess_nonzero yes
-ksp_norm_type unpreconditioned
-ksp_rtol 1e-9
-ksp_type cg
-ksp_view
-log_summary log_gamg.txt
-mg_coarse_ksp_type preonly
-mg_coarse_pc_type gamg
-mg_levels_ksp_type richardson
-pc_dmdarepart_log
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=16384 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=4 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-sdot-returns-double=0 --known-snrm2-returns-double=0 --with-batch="1 " --known-mpi-shared="0 " --known-mpi-shared-libraries=0 --known-memcmp-ok  --with-blas-lapack-lib=/opt/acml/5.3.1/gfortran64/lib/libacml.a --COPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --FOPTFLAGS="-Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native" --CXXOPTFLAGS="-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native" --with-x="0 " --with-debugging=0 --with-clib-autodetect="0 " --with-cxxlib-autodetect="0 " --with-fortranlib-autodetect="0 " --with-shared-libraries="0 " --with-mpi-compilers="1 " --with-cc="cc " --with-cxx="CC " --with-fc="ftn " --download-hypre=1 --download-blacs="1 " --download-scalapack="1 " --download-superlu_dist="1 " --download-metis="1 " --download-parmetis="1 " PETSC_ARCH=gnu-opt-32idx
-----------------------------------------
Libraries compiled on Sat Jul 18 19:48:51 2015 on h2ologin1 
Machine characteristics: Linux-3.0.101-0.46-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /u/sciteam/mrosso/LIBS/petsc
Using PETSc arch: gnu-opt-32idx
-----------------------------------------

Using C compiler: cc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native -mtune=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 -march=native -mtune=native   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/include -I/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lpetsc -Wl,-rpath,/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -L/u/sciteam/mrosso/LIBS/petsc/gnu-opt-32idx/lib -lsuperlu_dist_4.0 -lHYPRE -lscalapack -Wl,-rpath,/opt/acml/5.3.1/gfortran64/lib -L/opt/acml/5.3.1/gfortran64/lib -lacml -lparmetis -lmetis -lssl -lcrypto -ldl 
-----------------------------------------