[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
Manuel Valera
mvalera at mail.sdsu.edu
Sun Jan 8 16:41:37 CST 2017
Ok, I just did the streams and log_summary tests. I'm attaching the output
for each run, with NPMAX=4 and NPMAX=32, plus -log_summary runs with
-pc_type hypre and without it, on 1 and 2 cores, all of this with
debugging turned off.
The matrix is 200,000x200,000, from the full curvilinear 3D mesh of the
non-hydrostatic pressure solver.
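For reference, the 2-core hypre run corresponds roughly to a command line like the following
(options as listed in the option table of the attached log; the exact invocation may differ):

  mpiexec -n 2 ./ucmsMR -pc_type hypre \
      -pc_hypre_boomeramg_nodal_coarsen 1 \
      -pc_hypre_boomeramg_vec_interp_variant 1 \
      -matload_block_size 1 -log_summary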
Thanks a lot for your insight,
Manuel
On Sun, Jan 8, 2017 at 9:48 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>     We need to see the -log_summary output with hypre on 1 and 2 processes (with
> debugging turned off). We also need to see the output from
>
> make stream NPMAX=4
>
> run in the PETSc directory.
>
>
>
> > On Jan 7, 2017, at 7:38 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> >
> > Ok great, I tried those command-line args and this is the result:
> >
> > When I use -pc_type gamg:
> >
> > [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> > [1]PETSC ERROR: Petsc has generated inconsistent data
> > [1]PETSC ERROR: Have un-symmetric graph (apparently). Use
> '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold
> -1.0' if the matrix is structurally symmetric.
> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> > [1]PETSC ERROR: Petsc Release Version 3.7.4, unknown
> > [1]PETSC ERROR: ./ucmsMR on a arch-linux2-c-debug named ocean by valera
> Sat Jan 7 17:35:05 2017
> > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5
> --download-netcdf --download-hypre --download-metis --download-parmetis
> --download-trillinos
> > [1]PETSC ERROR: #1 smoothAggs() line 462 in /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > [1]PETSC ERROR: #4 PCSetUp() line 968 in /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > application called MPI_Abort(comm=0x84000002, 77) - process 1
> >
> >
> > When I use -pc_type gamg and -pc_gamg_sym_graph true:
> >
> > ------------------------------------------------------------
> ------------
> > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
> Exception,probably divide by zero
> > [0]PETSC ERROR: Try option -start_in_debugger or
> -on_error_attach_debugger
> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/
> documentation/faq.html#valgrind
> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> OS X to find memory corruption errors
> > [1]PETSC ERROR: ------------------------------
> ------------------------------------------
> > [1]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> > [1]PETSC ERROR: INSTEAD the line number of the start of the
> function
> > [1]PETSC ERROR: is given.
> > [1]PETSC ERROR: [1] LAPACKgesvd line 42 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/impls/gmres/gmreig.c
> > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues_GMRES line 24
> /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues line 51
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > [1]PETSC ERROR: [1] PCGAMGOptProlongator_AGG line 1187
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: [1] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/gamg.c
> > [1]PETSC ERROR: [1] PCSetUp line 930 /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [1]PETSC ERROR: [1] KSPSetUp line 305 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > [0] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/agg.c
> > [0]PETSC ERROR: [0] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/gamg.c
> > [0]PETSC ERROR: [0] PCSetUp line 930 /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [0]PETSC ERROR: [0] KSPSetUp line 305 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> >
> > When I use -pc_type hypre, it actually shows something different in
> -ksp_view:
> >
> > KSP Object: 2 MPI processes
> > type: gcr
> > GCR: restart = 30
> > GCR: restarts performed = 37
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> > PC Object: 2 MPI processes
> > type: hypre
> > HYPRE BoomerAMG preconditioning
> > HYPRE BoomerAMG: Cycle type V
> > HYPRE BoomerAMG: Maximum number of levels 25
> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> > HYPRE BoomerAMG: Threshold for strong coupling 0.25
> > HYPRE BoomerAMG: Interpolation truncation factor 0.
> > HYPRE BoomerAMG: Interpolation: max elements per row 0
> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> > HYPRE BoomerAMG: Maximum row sums 0.9
> > HYPRE BoomerAMG: Sweeps down 1
> > HYPRE BoomerAMG: Sweeps up 1
> > HYPRE BoomerAMG: Sweeps on coarse 1
> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> > HYPRE BoomerAMG: Relax weight (all) 1.
> > HYPRE BoomerAMG: Outer relax weight (all) 1.
> > HYPRE BoomerAMG: Using CF-relaxation
> > HYPRE BoomerAMG: Not using more complex smoothers.
> > HYPRE BoomerAMG: Measure type local
> > HYPRE BoomerAMG: Coarsen type Falgout
> > HYPRE BoomerAMG: Interpolation type classical
> > HYPRE BoomerAMG: Using nodal coarsening (with
> HYPRE_BOOMERAMGSetNodal() 1
> > HYPRE BoomerAMG: HYPRE_BoomerAMGSetInterpVecVariant() 1
> > linear system matrix = precond matrix:
> > Mat Object: 2 MPI processes
> > type: mpiaij
> > rows=200000, cols=200000
> > total: nonzeros=3373340, allocated nonzeros=3373340
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > But the timing is still terrible.
> >
> >
> >
> >
> > On Sat, Jan 7, 2017 at 5:28 PM, Jed Brown <jed at jedbrown.org> wrote:
> > Manuel Valera <mvalera at mail.sdsu.edu> writes:
> >
> > > Awesome Matt and Jed,
> > >
> > > GCR is used because the matrix is not invertible and because this was
> > > the algorithm the previous library used.
> > >
> > > The preconditioner I'm aiming to use is multigrid. I thought I configured
> > > the hypre BoomerAMG solver for this, but I agree that it doesn't show in
> > > the log anywhere; how can I be sure it is being used? I sent the -ksp_view
> > > log earlier in this thread.
> >
> > Did you run with -pc_type hypre?
> >
> > > I had a problem with the matrix block sizes, so I couldn't make the PETSc
> > > native multigrid solver work.
> >
> > What block sizes? If the only variable is pressure, the block size
> > would be 1 (default).
> >
> > > This is a non-hydrostatic pressure solver; it is an elliptic problem, so
> > > multigrid is a must.
> >
> > Yes, multigrid should work well.
> >
>
>
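For context, a minimal PETSc (C) sketch of a solver set up the way the quoted -ksp_view output
reports it (GCR with restart 30, right preconditioning, unpreconditioned norm, hypre BoomerAMG,
and the listed tolerances) might look like the following. The matrix file name, the constant
null space (the logs do list a Matrix Null Space object, but its content is not shown), and the
placeholder right-hand side are assumptions for illustration, and error checking is omitted for
brevity; KSPSetFromOptions() still lets the -pc_hypre_boomeramg_* command-line options take effect.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat          A;
    Vec          x, b;
    KSP          ksp;
    PC           pc;
    MatNullSpace nullsp;
    PetscViewer  viewer;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Load the pressure matrix from a PETSc binary file (file name is a placeholder) */
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "pressure_matrix.bin", FILE_MODE_READ, &viewer);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetType(A, MATAIJ);
    MatLoad(A, viewer);
    PetscViewerDestroy(&viewer);

    /* The matrix is singular; a constant null space (typical for a pure-Neumann
       pressure problem) is assumed here */
    MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);
    MatSetNullSpace(A, nullsp);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);                  /* placeholder right-hand side */
    MatNullSpaceRemove(nullsp, b);   /* keep the placeholder RHS consistent */

    /* GCR(30) with right preconditioning and hypre BoomerAMG, as reported by -ksp_view */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPGCR);
    KSPGCRSetRestart(ksp, 30);
    KSPSetPCSide(ksp, PC_RIGHT);
    KSPSetNormType(ksp, KSP_NORM_UNPRECONDITIONED);
    KSPSetTolerances(ksp, 1e-14, 1e-50, 1e4, 10000);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCHYPRE);
    PCHYPRESetType(pc, "boomeramg");
    KSPSetFromOptions(ksp);          /* picks up -pc_hypre_boomeramg_* options */

    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    MatNullSpaceDestroy(&nullsp);
    MatDestroy(&A);
    VecDestroy(&x);
    VecDestroy(&b);
    PetscFinalize();
    return 0;
  }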
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 1 processor, by valera Sun Jan 8 14:24:49 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 3.386e+01 1.00000 3.386e+01
Objects: 8.100e+01 1.00000 8.100e+01
Flops: 3.820e+10 1.00000 3.820e+10 3.820e+10
Flops/sec: 1.128e+09 1.00000 1.128e+09 1.128e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.3859e+01 100.0% 3.8199e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 1462 1.0 8.0159e-01 1.0 1.17e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 1459
VecMDot 1411 1.0 2.3061e+00 1.0 8.38e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 22 0 0 0 7 22 0 0 0 3633
VecNorm 1337 1.0 2.3786e-01 1.0 5.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2248
VecScale 2924 1.0 3.3265e-01 1.0 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1758
VecSet 1538 1.0 2.1733e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 2924 1.0 5.2265e-01 1.0 1.17e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2238
VecAYPX 5 1.0 1.2798e-03 1.0 1.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 781
VecMAXPY 2822 1.0 4.9800e+00 1.0 1.68e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 44 0 0 0 15 44 0 0 0 3365
VecAssemblyBegin 7 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 7 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 5 1.0 1.6944e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 1467 1.0 4.9397e+00 1.0 9.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 25 0 0 0 15 25 0 0 0 1944
MatConvert 2 1.0 6.1032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 1.5736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 2.3842e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 6.9497e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 1.2220e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 5.7235e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 2.6259e+01 1.0 3.82e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78100 0 0 0 78100 0 0 0 1455
PCSetUp 2 1.0 3.8668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
PCApply 1463 1.0 1.2196e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 36 0 0 0 0 36 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 69 6 9609168 0.
Vector Scatter 1 1 656 0.
Matrix 3 1 803112 0.
Matrix Null Space 1 1 592 0.
Viewer 3 1 816 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
Index Set 1 1 776 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 1 processor, by valera Sun Jan 8 14:33:19 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 9.016e+00 1.00000 9.016e+00
Objects: 8.300e+01 1.00000 8.300e+01
Flops: 5.021e+09 1.00000 5.021e+09 5.021e+09
Flops/sec: 5.569e+08 1.00000 5.569e+08 5.569e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 9.0160e+00 100.0% 5.0209e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 155 1.0 8.4427e-02 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1469
VecMDot 145 1.0 2.3651e-01 1.0 8.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 17 0 0 0 3 17 0 0 0 3679
VecNorm 30 1.0 5.3427e-03 1.0 1.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2246
VecScale 310 1.0 3.4916e-02 1.0 6.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1776
VecSet 74 1.0 4.6141e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 310 1.0 5.4786e-02 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 2263
VecAYPX 5 1.0 1.1373e-03 1.0 1.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 879
VecMAXPY 290 1.0 5.1323e-01 1.0 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 35 0 0 0 6 35 0 0 0 3390
VecAssemblyBegin 7 1.0 2.3842e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 7 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 5 1.0 1.0016e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 160 1.0 5.3934e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 21 0 0 0 6 21 0 0 0 1942
MatSolve 155 1.0 6.4364e-01 1.0 1.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 20 0 0 0 7 20 0 0 0 1577
MatLUFactorNum 1 1.0 3.0440e-02 1.0 2.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 843
MatILUFactorSym 1 1.0 1.4438e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 1.7564e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 1.3664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2245e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 6.2274e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatView 1 1.0 1.1229e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
KSPSetUp 1 1.0 4.2639e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 2.1151e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 23 99 0 0 0 23 99 0 0 0 2362
PCSetUp 2 1.0 2.8751e-01 1.0 2.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 89
PCApply 156 1.0 6.9180e-01 1.0 1.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 20 0 0 0 8 20 0 0 0 1467
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 67 4 6406112 0.
Vector Scatter 1 1 656 0.
Matrix 4 1 803112 0.
Matrix Null Space 1 1 592 0.
Viewer 3 1 816 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
Index Set 4 1 776 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera Sun Jan 8 14:27:52 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 2.558e+01 1.01638 2.537e+01
Objects: 8.700e+01 1.00000 8.700e+01
Flops: 2.296e+10 1.00000 2.296e+10 4.592e+10
Flops/sec: 9.123e+08 1.01638 9.050e+08 1.810e+09
MPI Messages: 1.768e+03 1.00000 1.768e+03 3.535e+03
MPI Message Lengths: 4.961e+07 1.00000 2.807e+04 9.922e+07
MPI Reductions: 5.153e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.5372e+01 100.0% 4.5918e+10 100.0% 3.535e+03 100.0% 2.807e+04 100.0% 5.152e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 1759 1.0 7.5034e-01 1.1 7.04e+08 1.0 0.0e+00 0.0e+00 1.8e+03 3 3 0 0 34 3 3 0 0 34 1875
VecMDot 1698 1.0 2.1292e+00 1.2 5.03e+09 1.0 0.0e+00 0.0e+00 1.7e+03 8 22 0 0 33 8 22 0 0 33 4727
VecNorm 1634 1.0 2.0453e-01 1.1 3.27e+08 1.0 0.0e+00 0.0e+00 1.6e+03 1 1 0 0 32 1 1 0 0 32 3196
VecScale 3518 1.0 2.1820e-01 1.0 3.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 3225
VecSet 1769 1.0 1.5442e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 3518 1.0 3.4378e-01 1.0 7.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 4093
VecAYPX 5 1.0 1.2600e-03 2.1 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 794
VecMAXPY 3396 1.0 3.4249e+00 1.0 1.01e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 44 0 0 0 13 44 0 0 0 5878
VecAssemblyBegin 12 1.7 2.9197e-03 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 12 1.7 1.1683e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 1774 1.0 2.3620e+0037.3 0.00e+00 0.0 3.5e+03 2.2e+04 1.0e+01 5 0100 79 0 5 0100 79 0 0
VecScatterEnd 1764 1.0 8.7893e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 1764 1.0 3.4805e+00 1.0 5.77e+09 1.0 3.5e+03 2.0e+04 0.0e+00 14 25100 71 0 14 25100 71 0 3318
MatConvert 2 1.0 2.9602e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 6.1677e-02384.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 8.9786e-03 1.0 0.00e+00 0.0 4.0e+00 5.0e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 4 1.0 2.6226e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 1.7685e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06 1.3e+01 1 0 0 21 0 1 0 0 21 0 0
KSPSetUp 1 1.0 1.9739e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 1.8353e+01 1.0 2.30e+10 1.0 3.5e+03 2.0e+04 5.1e+03 72100100 71 99 72100100 71 99 2502
PCSetUp 2 1.0 6.5126e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 3 0 0 0 0 3 0 0 0 0 0
PCApply 1760 1.0 8.2935e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 32 0 0 0 0 32 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 72 8 7212944 0.
Vector Scatter 3 2 1312 0.
Matrix 3 0 0 0.
Viewer 2 0 0 0.
Index Set 4 4 13104 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 1.99318e-05
Average time for zero size MPI_Send(): 9.41753e-06
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera Sun Jan 8 14:32:12 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 1.241e+01 1.03508 1.220e+01
Objects: 9.300e+01 1.00000 9.300e+01
Flops: 8.662e+09 1.00000 8.662e+09 1.732e+10
Flops/sec: 7.222e+08 1.03508 7.100e+08 1.420e+09
MPI Messages: 5.535e+02 1.00000 5.535e+02 1.107e+03
MPI Message Lengths: 2.533e+07 1.00000 4.576e+04 5.066e+07
MPI Reductions: 1.548e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.2204e+01 100.0% 1.7325e+10 100.0% 1.107e+03 100.0% 4.576e+04 100.0% 1.547e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 545 1.0 4.1106e-01 1.4 2.18e+08 1.0 0.0e+00 0.0e+00 5.4e+02 3 3 0 0 35 3 3 0 0 35 1061
VecMDot 525 1.0 9.6894e-01 1.5 1.48e+09 1.0 0.0e+00 0.0e+00 5.2e+02 7 17 0 0 34 7 17 0 0 34 3061
VecNorm 420 1.0 8.5726e-02 1.4 8.40e+07 1.0 0.0e+00 0.0e+00 4.2e+02 1 1 0 0 27 1 1 0 0 27 1960
VecScale 1090 1.0 8.5441e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2551
VecSet 555 1.0 5.4937e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1090 1.0 1.2735e-01 1.1 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 3424
VecAYPX 5 1.0 1.4422e-03 2.4 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 693
VecMAXPY 1050 1.0 1.1876e+00 1.1 2.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 9 34 0 0 0 9 34 0 0 0 4994
VecAssemblyBegin 12 1.7 3.0236e-03 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 12 1.7 1.1206e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 560 1.0 2.3305e+0099.8 0.00e+00 0.0 1.1e+03 2.7e+04 1.0e+01 10 0 99 59 1 10 0 99 59 1 0
VecScatterEnd 550 1.0 5.8907e-02 5.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 550 1.0 1.2700e+00 1.1 1.80e+09 1.0 1.1e+03 2.0e+04 0.0e+00 10 21 99 43 0 10 21 99 43 0 2835
MatSolve 545 1.0 1.4914e+00 1.2 1.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 20 0 0 0 11 20 0 0 0 2375
MatLUFactorNum 1 1.0 3.4873e-02 2.3 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 728
MatILUFactorSym 1 1.0 1.3488e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 1.7024e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 6.1171e-02344.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 9.1953e-03 1.0 0.00e+00 0.0 4.0e+00 5.0e+03 8.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatGetRowIJ 3 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.6165e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 1.7585e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06 1.3e+01 1 0 1 41 1 1 0 1 41 1 0
KSPSetUp 2 1.0 1.9348e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 5.2705e+00 1.0 8.66e+09 1.0 1.1e+03 2.0e+04 1.5e+03 43100 99 43 96 43100 99 43 96 3287
PCSetUp 3 1.0 6.7521e-01 1.0 1.27e+07 1.0 0.0e+00 0.0e+00 4.0e+00 5 0 0 0 0 5 0 0 0 0 38
PCSetUpOnBlocks 5 1.0 5.0066e-02 2.2 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 507
PCApply 546 1.0 1.5895e+00 1.2 1.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 20 0 0 0 12 20 0 0 0 2229
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 72 6 5609648 0.
Vector Scatter 3 2 1312 0.
Matrix 4 0 0 0.
Viewer 2 0 0 0.
Index Set 7 4 13104 0.
Krylov Solver 2 0 0 0.
Preconditioner 3 1 1384 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 1.95503e-05
Average time for zero size MPI_Send(): 1.03712e-05
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
[valera at ocean petsc]$ make stream NPMAX=4
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug stream
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec ' using 'NPMAX=4'
Number of MPI processes 1 Processor names ocean
Triad: 5998.4683 Rate (MB/s)
Number of MPI processes 2 Processor names ocean ocean
Triad: 23010.7259 Rate (MB/s)
Number of MPI processes 3 Processor names ocean ocean ocean
Triad: 6295.2156 Rate (MB/s)
Number of MPI processes 4 Processor names ocean ocean ocean ocean
Triad: 7019.8170 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 3.84
3 1.05
4 1.17
Estimation of possible speedup of MPI programs based on Streams benchmark.
It appears you have 1 node(s)
Unable to open matplotlib to plot speedup
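For context, the Triad rate reported above is a measure of sustainable memory bandwidth: the
benchmark times the classic STREAM triad kernel, which in C is essentially the loop below
(the array names, scalar, and length follow the STREAM benchmark's own conventions):

  /* STREAM triad: two loads and one store per iteration, memory-bandwidth bound */
  for (j = 0; j < N; j++)
    a[j] = b[j] + scalar * c[j];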
-------------- next part --------------
[valera at ocean petsc]$ make PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec ' using 'NPMAX=32'
Number of MPI processes 1 Processor names ocean
Triad: 11830.2146 Rate (MB/s)
Number of MPI processes 2 Processor names ocean ocean
Triad: 23111.7734 Rate (MB/s)
Number of MPI processes 3 Processor names ocean ocean ocean
Triad: 6692.7679 Rate (MB/s)
Number of MPI processes 4 Processor names ocean ocean ocean ocean
Triad: 7043.7175 Rate (MB/s)
Number of MPI processes 5 Processor names ocean ocean ocean ocean ocean
Triad: 33053.3434 Rate (MB/s)
Number of MPI processes 6 Processor names ocean ocean ocean ocean ocean ocean
Triad: 33129.8788 Rate (MB/s)
Number of MPI processes 7 Processor names ocean ocean ocean ocean ocean ocean ocean
Triad: 32379.8370 Rate (MB/s)
Number of MPI processes 8 Processor names ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31644.3971 Rate (MB/s)
Number of MPI processes 9 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30214.8803 Rate (MB/s)
Number of MPI processes 10 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31700.6859 Rate (MB/s)
Number of MPI processes 11 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 32369.1251 Rate (MB/s)
Number of MPI processes 12 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7677.4204 Rate (MB/s)
Number of MPI processes 13 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33298.0308 Rate (MB/s)
Number of MPI processes 14 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33220.6717 Rate (MB/s)
Number of MPI processes 15 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7334.5064 Rate (MB/s)
Number of MPI processes 16 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7463.6337 Rate (MB/s)
Number of MPI processes 17 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 14108.2617 Rate (MB/s)
Number of MPI processes 18 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 29450.3077 Rate (MB/s)
Number of MPI processes 19 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 8997.3655 Rate (MB/s)
Number of MPI processes 20 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 9334.5314 Rate (MB/s)
Number of MPI processes 21 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30470.9315 Rate (MB/s)
Number of MPI processes 22 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 19822.1616 Rate (MB/s)
Number of MPI processes 23 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 32290.0731 Rate (MB/s)
Number of MPI processes 24 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 11822.5303 Rate (MB/s)
Number of MPI processes 25 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 11310.1869 Rate (MB/s)
Number of MPI processes 26 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33472.8900 Rate (MB/s)
Number of MPI processes 27 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30328.8841 Rate (MB/s)
Number of MPI processes 28 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31779.9057 Rate (MB/s)
Number of MPI processes 29 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 28275.6109 Rate (MB/s)
Number of MPI processes 30 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 16085.2815 Rate (MB/s)
Number of MPI processes 31 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 13092.9246 Rate (MB/s)
Number of MPI processes 32 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 18967.6909 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 1.95
3 0.57
4 0.6
5 2.79
6 2.8
7 2.74
8 2.67
9 2.55
10 2.68
11 2.74
12 0.65
13 2.81
14 2.81
15 0.62
16 0.63
17 1.19
18 2.49
19 0.76
20 0.79
21 2.58
22 1.68
23 2.73
24 1.0
25 0.96
26 2.83
27 2.56
28 2.69
29 2.39
30 1.36
31 1.11
32 1.6
Estimation of possible speedup of MPI programs based on Streams benchmark.
It appears you have 1 node(s)
Unable to open matplotlib to plot speedup