[petsc-users] GAMG speed

Michele Rosso mrosso at uci.edu
Tue Aug 13 21:57:36 CDT 2013


Hi Jed,

I attached the output for both runs you suggested. At the beginning
of each file I included the options I used.

On a side note, I tried to run with a grid of 256^3 (exactly as before)
but with fewer levels, i.e. 3 instead of 4 or 5.
The system kills the run with an Out Of Memory condition. This is
really odd, since the only option I changed is -pc_mg_levels. I cannot
send you any output because there is none. Do you have any guess where
the problem comes from? (The option set of the failing run is sketched
below.)
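
For reference, the options for the failing run should look like the
block below; it is the same option set as in the attached runs, only
with -pc_mg_levels changed (launcher and application-specific options
omitted):

  -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg
  -pc_mg_galerkin -pc_mg_levels 3 -mg_levels_ksp_type richardson
  -mg_levels_ksp_max_it 1
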
Thanks,

Michele

On 08/13/2013 07:23 PM, Jed Brown wrote:
> Michele Rosso <mrosso at uci.edu> writes:
>> The matrix arises from discretization of the Poisson equation in
>> incompressible flow calculations.
> Can you try the two runs below and send -log_summary?
>
>    -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1
>
>
>    -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -pc_mg_type full
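
(For context on where those option strings take effect: the sketch
below is a minimal C driver, assuming the application assembles the
Poisson matrix itself and lets the PETSc options database configure
the solver. It is not the actual hit source, just an illustration of
the call that consumes -pc_type mg, -pc_mg_galerkin, -pc_mg_levels and
the -mg_levels_* options.)

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    KSP            ksp;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    /* ... assemble the Poisson matrix and right-hand side here and
       attach them with KSPSetOperators() before solving ... */

    /* -pc_type mg, -pc_mg_galerkin, -pc_mg_levels, -mg_levels_ksp_type,
       -mg_levels_ksp_max_it (and -pc_mg_type full for the second run)
       are all picked up from the options database by this call. */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

    /* ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);  (with b, x created) */

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }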

-------------- next part --------------
  -log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg 
  -pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson 
  -mg_levels_ksp_max_it 1



  0 KSP Residual norm 3.653965664551e-05 
  1 KSP Residual norm 1.910638846094e-06 
  2 KSP Residual norm 8.690440116045e-08 
  3 KSP Residual norm 3.732213639394e-09 
  4 KSP Residual norm 1.964855338020e-10 
Linear solve converged due to CONVERGED_RTOL iterations 4
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=0.0001, absolute=1e-50, divergence=10000
  left preconditioning
  has attached null space
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 8 PCs follows
      KSP Object:      (mg_coarse_redundant_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_redundant_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot
          matrix ordering: nd
          factor fill ratio given 5, needed 8.69546
            Factored matrix follows:
              Matrix Object:               1 MPI processes
                type: seqaij
                rows=512, cols=512
                package used to perform factorization: petsc
                total: nonzeros=120206, allocated nonzeros=120206
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Matrix Object:         1 MPI processes
          type: seqaij
          rows=512, cols=512
          total: nonzeros=13824, allocated nonzeros=13824
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=512, cols=512
        total: nonzeros=13824, allocated nonzeros=13824
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 32 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=4096, cols=4096
        total: nonzeros=110592, allocated nonzeros=110592
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=32768, cols=32768
        total: nonzeros=884736, allocated nonzeros=884736
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=262144, cols=262144
        total: nonzeros=7077888, allocated nonzeros=7077888
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=2097152, cols=2097152
        total: nonzeros=14680064, allocated nonzeros=14680064
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Matrix Object:   8 MPI processes
    type: mpiaij
    rows=2097152, cols=2097152
    total: nonzeros=14680064, allocated nonzeros=14680064
    total number of mallocs used during MatSetValues calls =0


************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./hit on a arch-cray-xt5-pkgs-opt named nid13790 with 8 processors, by Unknown Tue Aug 13 22:37:31 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.048e+00      1.00012   4.048e+00
Objects:              2.490e+02      1.00000   2.490e+02
Flops:                2.663e+08      1.00000   2.663e+08  2.130e+09
Flops/sec:            6.579e+07      1.00012   6.579e+07  5.263e+08
MPI Messages:         6.820e+02      1.00000   6.820e+02  5.456e+03
MPI Message Lengths:  8.245e+06      1.00000   1.209e+04  6.596e+07
MPI Reductions:       4.580e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.0480e+00 100.0%  2.1305e+09 100.0%  5.456e+03 100.0%  1.209e+04      100.0%  4.570e+02  99.8% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot               12 1.0 2.9428e-02 1.2 6.29e+06 1.0 0.0e+00 0.0e+00 1.2e+01  1  2  0  0  3   1  2  0  0  3  1710
VecNorm                9 1.0 1.0796e-02 1.2 4.72e+06 1.0 0.0e+00 0.0e+00 9.0e+00  0  2  0  0  2   0  2  0  0  2  3497
VecScale              24 1.0 2.4652e-04 1.1 1.99e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6442
VecCopy                3 1.0 5.0740e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               116 1.0 1.4349e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               12 1.0 2.8027e-02 1.0 6.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1796
VecAYPX               29 1.0 3.0655e-02 1.4 4.16e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1085
VecScatterBegin      123 1.0 3.5391e-02 1.1 0.00e+00 0.0 3.5e+03 1.2e+04 0.0e+00  1  0 65 66  0   1  0 65 66  0     0
VecScatterEnd        123 1.0 2.5395e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               31 1.0 2.3556e-01 1.0 5.62e+07 1.0 1.0e+03 2.3e+04 0.0e+00  6 21 19 36  0   6 21 19 36  0  1908
MatMultAdd            24 1.0 5.9044e-02 1.0 1.21e+07 1.0 5.8e+02 2.8e+03 0.0e+00  1  5 11  2  0   1  5 11  2  0  1644
MatMultTranspose      28 1.0 7.4601e-02 1.1 1.42e+07 1.0 6.7e+02 2.8e+03 0.0e+00  2  5 12  3  0   2  5 12  3  0  1518
MatSolve               6 1.0 3.8311e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  3006
MatSOR                48 1.0 5.8050e-01 1.0 1.01e+08 1.0 8.6e+02 1.5e+04 4.8e+01 14 38 16 19 10  14 38 16 19 11  1390
MatLUFactorSym         1 1.0 3.0620e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatLUFactorNum         1 1.0 2.4665e-02 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  6329
MatAssemblyBegin      20 1.0 2.4351e-02 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01  0  0  0  0  5   0  0  0  0  5     0
MatAssemblyEnd        20 1.0 1.3176e-01 1.0 0.00e+00 0.0 5.6e+02 2.1e+03 7.2e+01  3  0 10  2 16   3  0 10  2 16     0
MatGetRowIJ            1 1.0 1.1516e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.1008e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               16 1.3 1.0209e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  3     0
MatPtAP                4 1.0 6.4001e-01 1.0 4.06e+07 1.0 1.1e+03 1.7e+04 1.0e+02 16 15 21 30 22  16 15 21 30 22   507
MatPtAPSymbolic        4 1.0 3.7003e-01 1.0 0.00e+00 0.0 7.2e+02 2.0e+04 6.0e+01  9  0 13 22 13   9  0 13 22 13     0
MatPtAPNumeric         4 1.0 2.7004e-01 1.0 4.06e+07 1.0 4.2e+02 1.2e+04 4.0e+01  7 15  8  8  9   7 15  8  8  9  1202
MatGetRedundant        1 1.0 7.9393e-04 1.0 0.00e+00 0.0 1.7e+02 7.1e+03 4.0e+00  0  0  3  2  1   0  0  3  2  1     0
MatGetLocalMat         4 1.0 3.9521e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  1  0  0  0  2   1  0  0  0  2     0
MatGetBrAoCol          4 1.0 1.7719e-02 1.0 0.00e+00 0.0 4.3e+02 2.7e+04 8.0e+00  0  0  8 18  2   0  0  8 18  2     0
MatGetSymTrans         8 1.0 1.3007e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               7 1.0 1.3097e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  0  0  0  0  5   0  0  0  0  5     0
KSPSolve               2 1.0 1.0450e+00 1.0 2.04e+08 1.0 3.4e+03 1.2e+04 7.5e+01 26 77 62 60 16  26 77 62 60 16  1563
PCSetUp                1 1.0 8.6248e-01 1.0 6.21e+07 1.0 1.9e+03 1.1e+04 3.2e+02 21 23 35 32 69  21 23 35 32 69   576
PCApply                6 1.0 8.4384e-01 1.0 1.61e+08 1.0 3.2e+03 9.0e+03 4.8e+01 21 60 59 44 10  21 60 59 44 11  1523
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     1              1          564     0
              Vector    99             99     47537368     0
      Vector Scatter    21             21        22092     0
              Matrix    37             37     75834272     0
   Matrix Null Space     1              1          596     0
    Distributed Mesh     5              5      2740736     0
     Bipartite Graph    10             10         7920     0
           Index Set    50             50      1546832     0
   IS L to G Mapping     5              5      1361108     0
       Krylov Solver     7              7         8616     0
     DMKSP interface     3              3         1944     0
      Preconditioner     7              7         6672     0
              Viewer     3              2         1456     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.43187e-06
Average time for zero size MPI_Send(): 2.38419e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 31 22:48:06 2013
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --COPTFLAGS="-fastsse -Mipa=fast -mp" --CXXOPTFLAGS="-fastsse -Mipa=fast -mp" --FOPTFLAGS="-fastsse -Mipa=fast -mp" --with-blas-lapack-lib="-L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv" --with-shared-libraries=0 --with-x=0 --with-batch --known-mpi-shared-libraries=0 PETSC_ARCH=arch-cray-xt5-pkgs-opt
-----------------------------------------
Libraries compiled on Wed Jul 31 22:48:06 2013 on krakenpf1 
Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /nics/c/home/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: arch-cray-xt5-pkgs-opt
-----------------------------------------

Using C compiler: cc  -fastsse -Mipa=fast -mp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -fastsse -Mipa=fast -mp   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include -I/opt/acml/4.4.0/pgi64/include -I/opt/xt-libsci/11.0.04/pgi/109/istanbul/include -I/opt/fftw/3.3.0.0/x86_64/include -I/usr/include/alps
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -L/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -lpetsc -L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv -lpthread -ldl 
-----------------------------------------

#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.

-------------- next part --------------

-log_summary -ksp_monitor -ksp_view -ksp_converged_reason -pc_type mg 
-pc_mg_galerkin -pc_mg_levels 5 -mg_levels_ksp_type richardson 
-mg_levels_ksp_max_it 1 -pc_mg_type full

  0 KSP Residual norm 3.654533581988e-05 
  1 KSP Residual norm 8.730776244351e-07 
  2 KSP Residual norm 3.474626061661e-08 
  3 KSP Residual norm 1.813665557493e-09 
Linear solve converged due to CONVERGED_RTOL iterations 3
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=10000
  tolerances:  relative=0.0001, absolute=1e-50, divergence=10000
  left preconditioning
  has attached null space
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: mg
    MG: type is FULL, levels=5 cycles=v
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     8 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     8 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 8 PCs follows
      KSP Object:      (mg_coarse_redundant_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_redundant_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot
          matrix ordering: nd
          factor fill ratio given 5, needed 8.69546
            Factored matrix follows:
              Matrix Object:               1 MPI processes
                type: seqaij
                rows=512, cols=512
                package used to perform factorization: petsc
                total: nonzeros=120206, allocated nonzeros=120206
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Matrix Object:         1 MPI processes
          type: seqaij
          rows=512, cols=512
          total: nonzeros=13824, allocated nonzeros=13824
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=512, cols=512
        total: nonzeros=13824, allocated nonzeros=13824
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 32 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=4096, cols=4096
        total: nonzeros=110592, allocated nonzeros=110592
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=32768, cols=32768
        total: nonzeros=884736, allocated nonzeros=884736
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=262144, cols=262144
        total: nonzeros=7077888, allocated nonzeros=7077888
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     8 MPI processes
      type: richardson
        Richardson: damping factor=1
      maximum iterations=1
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Matrix Object:       8 MPI processes
        type: mpiaij
        rows=2097152, cols=2097152
        total: nonzeros=14680064, allocated nonzeros=14680064
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Matrix Object:   8 MPI processes
    type: mpiaij
    rows=2097152, cols=2097152
    total: nonzeros=14680064, allocated nonzeros=14680064

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./hit on a arch-cray-xt5-pkgs-opt named nid14615 with 8 processors, by Unknown Tue Aug 13 22:44:16 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.261e+00      1.00012   4.261e+00
Objects:              2.950e+02      1.00000   2.950e+02
Flops:                3.322e+08      1.00000   3.322e+08  2.658e+09
Flops/sec:            7.797e+07      1.00012   7.796e+07  6.237e+08
MPI Messages:         1.442e+03      1.00000   1.442e+03  1.154e+04
MPI Message Lengths:  1.018e+07      1.00000   7.057e+03  8.141e+07
MPI Reductions:       5.460e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.2609e+00 100.0%  2.6575e+09 100.0%  1.154e+04 100.0%  7.057e+03      100.0%  5.450e+02  99.8% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecTDot               10 1.0 2.4743e-02 1.1 5.24e+06 1.0 0.0e+00 0.0e+00 1.0e+01  1  2  0  0  2   1  2  0  0  2  1695
VecNorm                8 1.0 9.9294e-03 1.3 4.19e+06 1.0 0.0e+00 0.0e+00 8.0e+00  0  1  0  0  1   0  1  0  0  1  3379
VecScale              70 1.0 4.9663e-04 1.1 3.86e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6222
VecCopy                3 1.0 5.0108e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               271 1.0 1.0437e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               10 1.0 2.3400e-02 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1792
VecAYPX               54 1.0 2.5038e-02 1.5 3.55e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1133
VecScatterBegin      324 1.0 4.1335e-02 1.1 0.00e+00 0.0 9.6e+03 6.1e+03 0.0e+00  1  0 83 72  0   1  0 83 72  0     0
VecScatterEnd        324 1.0 4.4111e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMult               76 1.0 2.8557e-01 1.1 6.73e+07 1.0 2.5e+03 9.8e+03 0.0e+00  6 20 22 31  0   6 20 22 31  0  1884
MatMultAdd            50 1.0 5.5734e-02 1.0 1.15e+07 1.0 1.2e+03 1.5e+03 0.0e+00  1  3 10  2  0   1  3 10  2  0  1657
MatMultTranspose      74 1.0 1.2116e-01 1.2 2.37e+07 1.0 1.8e+03 1.9e+03 0.0e+00  3  7 15  4  0   3  7 15  4  0  1563
MatSolve              25 1.0 1.3877e-02 1.0 6.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  3458
MatSOR               100 1.0 7.1429e-01 1.1 1.45e+08 1.0 2.6e+03 9.4e+03 1.4e+02 16 44 23 30 26  16 44 23 30 26  1628
MatLUFactorSym         1 1.0 3.0639e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatLUFactorNum         1 1.0 2.4523e-02 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0   1  6  0  0  0  6366
MatAssemblyBegin      20 1.0 3.1168e-02 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01  0  0  0  0  4   0  0  0  0  4     0
MatAssemblyEnd        20 1.0 1.3784e-01 1.1 0.00e+00 0.0 5.6e+02 2.1e+03 7.2e+01  3  0  5  1 13   3  0  5  1 13     0
MatGetRowIJ            1 1.0 1.1015e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.0793e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               16 1.3 1.0140e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  2   0  0  0  0  2     0
MatPtAP                4 1.0 6.4115e-01 1.0 4.06e+07 1.0 1.1e+03 1.7e+04 1.0e+02 15 12 10 24 18  15 12 10 24 18   506
MatPtAPSymbolic        4 1.0 3.7106e-01 1.0 0.00e+00 0.0 7.2e+02 2.0e+04 6.0e+01  9  0  6 18 11   9  0  6 18 11     0
MatPtAPNumeric         4 1.0 2.7011e-01 1.0 4.06e+07 1.0 4.2e+02 1.2e+04 4.0e+01  6 12  4  6  7   6 12  4  6  7  1202
MatGetRedundant        1 1.0 8.1611e-04 1.0 0.00e+00 0.0 1.7e+02 7.1e+03 4.0e+00  0  0  1  1  1   0  0  1  1  1     0
MatGetLocalMat         4 1.0 3.9911e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  1  0  0  0  1   1  0  0  0  1     0
MatGetBrAoCol          4 1.0 1.7765e-02 1.0 0.00e+00 0.0 4.3e+02 2.7e+04 8.0e+00  0  0  4 14  1   0  0  4 14  1     0
MatGetSymTrans         8 1.0 1.3194e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               7 1.0 1.4666e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  0  0  0  0  4   0  0  0  0  4     0
KSPSolve               2 1.0 1.2287e+00 1.0 2.70e+08 1.0 9.5e+03 5.8e+03 1.6e+02 29 81 82 68 30  29 81 82 68 30  1758
PCSetUp                1 1.0 8.6414e-01 1.0 6.21e+07 1.0 1.9e+03 1.1e+04 3.2e+02 20 19 17 26 58  20 19 17 26 58   575
PCApply                5 1.0 1.0571e+00 1.0 2.33e+08 1.0 9.3e+03 4.9e+03 1.4e+02 24 70 81 56 26  24 70 81 56 26  1764
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     1              1          564     0
              Vector   145            145     58892872     0
      Vector Scatter    21             21        22092     0
              Matrix    37             37     75834272     0
   Matrix Null Space     1              1          596     0
    Distributed Mesh     5              5      2740736     0
     Bipartite Graph    10             10         7920     0
           Index Set    50             50      1546832     0
   IS L to G Mapping     5              5      1361108     0
       Krylov Solver     7              7         8616     0
     DMKSP interface     3              3         1944     0
      Preconditioner     7              7         6672     0
              Viewer     3              2         1456     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 6.58035e-06
Average time for zero size MPI_Send(): 4.02331e-06
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_type full
-pc_type mg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 31 22:48:06 2013
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=0 --known-mpi-c-double-complex=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 --with-debugging=0 --COPTFLAGS="-fastsse -Mipa=fast -mp" --CXXOPTFLAGS="-fastsse -Mipa=fast -mp" --FOPTFLAGS="-fastsse -Mipa=fast -mp" --with-blas-lapack-lib="-L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv" --with-shared-libraries=0 --with-x=0 --with-batch --known-mpi-shared-libraries=0 PETSC_ARCH=arch-cray-xt5-pkgs-opt
-----------------------------------------
Libraries compiled on Wed Jul 31 22:48:06 2013 on krakenpf1 
Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /nics/c/home/mrosso/LIBS/petsc-3.4.2
Using PETSc arch: arch-cray-xt5-pkgs-opt
-----------------------------------------

Using C compiler: cc  -fastsse -Mipa=fast -mp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -fastsse -Mipa=fast -mp   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/include -I/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/include -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include -I/opt/acml/4.4.0/pgi64/include -I/opt/xt-libsci/11.0.04/pgi/109/istanbul/include -I/opt/fftw/3.3.0.0/x86_64/include -I/usr/include/alps
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -L/nics/c/home/mrosso/LIBS/petsc-3.4.2/arch-cray-xt5-pkgs-opt/lib -lpetsc -L/opt/acml/4.4.0/pgi64/lib -lacml -lacml_mv -lpthread -ldl 
-----------------------------------------

#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor
-ksp_view
-log_summary
-mg_levels_ksp_max_it 1
-mg_levels_ksp_type richardson
-options_left
-pc_mg_galerkin
-pc_mg_levels 5
-pc_mg_type full
-pc_type mg
#End of PETSc Option Table entries
There are no unused options.
Application 6640063 resources: utime ~45s, stime ~2s

