[petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
Manuel Valera
mvalera at mail.sdsu.edu
Sun Jan 8 16:41:37 CST 2017
Ok, I just did the streams and log_summary tests. I'm attaching the output
for each run, with NPMAX=4 and NPMAX=32, plus -log_summary runs with
-pc_type hypre and without it, on 1 and 2 cores, all of this with
debugging turned off.
The matrix is 200,000x200,000, from the full curvilinear 3D mesh of the
non-hydrostatic pressure solver.
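For reference, the 2-core hypre run corresponds roughly to a command line like the following
(options as listed in the option table of the attached log; the exact invocation may differ):

  mpiexec -n 2 ./ucmsMR -pc_type hypre \
      -pc_hypre_boomeramg_nodal_coarsen 1 \
      -pc_hypre_boomeramg_vec_interp_variant 1 \
      -matload_block_size 1 -log_summary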
Thanks a lot for your insight,
Manuel
On Sun, Jan 8, 2017 at 9:48 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>     We need to see the -log_summary output with hypre on 1 and 2 processes (with
> debugging turned off). We also need to see the output from
>
> make stream NPMAX=4
>
> run in the PETSc directory.
>
>
>
> > On Jan 7, 2017, at 7:38 PM, Manuel Valera <mvalera at mail.sdsu.edu> wrote:
> >
> > Ok great, I tried those command-line args and this is the result:
> >
> > When I use -pc_type gamg:
> >
> > [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> > [1]PETSC ERROR: Petsc has generated inconsistent data
> > [1]PETSC ERROR: Have un-symmetric graph (apparently). Use
> '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold
> -1.0' if the matrix is structurally symmetric.
> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> > [1]PETSC ERROR: Petsc Release Version 3.7.4, unknown
> > [1]PETSC ERROR: ./ucmsMR on a arch-linux2-c-debug named ocean by valera
> Sat Jan 7 17:35:05 2017
> > [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5
> --download-netcdf --download-hypre --download-metis --download-parmetis
> --download-trillinos
> > [1]PETSC ERROR: #1 smoothAggs() line 462 in /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/gamg.c
> > [1]PETSC ERROR: #4 PCSetUp() line 968 in /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [1]PETSC ERROR: #5 KSPSetUp() line 390 in /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > application called MPI_Abort(comm=0x84000002, 77) - process 1
> >
> >
> > When I use -pc_type gamg and -pc_gamg_sym_graph true:
> >
> > ------------------------------------------------------------
> ------------
> > [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point
> Exception,probably divide by zero
> > [0]PETSC ERROR: Try option -start_in_debugger or
> -on_error_attach_debugger
> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/
> documentation/faq.html#valgrind
> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> OS X to find memory corruption errors
> > [1]PETSC ERROR: ------------------------------
> ------------------------------------------
> > [1]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> > [1]PETSC ERROR: INSTEAD the line number of the start of the
> function
> > [1]PETSC ERROR: is given.
> > [1]PETSC ERROR: [1] LAPACKgesvd line 42 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/impls/gmres/gmreig.c
> > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues_GMRES line 24
> /usr/dataC/home/valera/petsc/src/ksp/ksp/impls/gmres/gmreig.c
> > [1]PETSC ERROR: [1] KSPComputeExtremeSingularValues line 51
> /usr/dataC/home/valera/petsc/src/ksp/ksp/interface/itfunc.c
> > [1]PETSC ERROR: [1] PCGAMGOptProlongator_AGG line 1187
> /usr/dataC/home/valera/petsc/src/ksp/pc/impls/gamg/agg.c
> > [1]PETSC ERROR: [1] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/gamg.c
> > [1]PETSC ERROR: [1] PCSetUp line 930 /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [1]PETSC ERROR: [1] KSPSetUp line 305 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > [0] PCGAMGOptProlongator_AGG line 1187 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/agg.c
> > [0]PETSC ERROR: [0] PCSetUp_GAMG line 472 /usr/dataC/home/valera/petsc/
> src/ksp/pc/impls/gamg/gamg.c
> > [0]PETSC ERROR: [0] PCSetUp line 930 /usr/dataC/home/valera/petsc/
> src/ksp/pc/interface/precon.c
> > [0]PETSC ERROR: [0] KSPSetUp line 305 /usr/dataC/home/valera/petsc/
> src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> >
> > When I use -pc_type hypre, it actually shows something different in
> -ksp_view:
> >
> > KSP Object: 2 MPI processes
> > type: gcr
> > GCR: restart = 30
> > GCR: restarts performed = 37
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-14, absolute=1e-50, divergence=10000.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> > PC Object: 2 MPI processes
> > type: hypre
> > HYPRE BoomerAMG preconditioning
> > HYPRE BoomerAMG: Cycle type V
> > HYPRE BoomerAMG: Maximum number of levels 25
> > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
> > HYPRE BoomerAMG: Threshold for strong coupling 0.25
> > HYPRE BoomerAMG: Interpolation truncation factor 0.
> > HYPRE BoomerAMG: Interpolation: max elements per row 0
> > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> > HYPRE BoomerAMG: Maximum row sums 0.9
> > HYPRE BoomerAMG: Sweeps down 1
> > HYPRE BoomerAMG: Sweeps up 1
> > HYPRE BoomerAMG: Sweeps on coarse 1
> > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> > HYPRE BoomerAMG: Relax weight (all) 1.
> > HYPRE BoomerAMG: Outer relax weight (all) 1.
> > HYPRE BoomerAMG: Using CF-relaxation
> > HYPRE BoomerAMG: Not using more complex smoothers.
> > HYPRE BoomerAMG: Measure type local
> > HYPRE BoomerAMG: Coarsen type Falgout
> > HYPRE BoomerAMG: Interpolation type classical
> > HYPRE BoomerAMG: Using nodal coarsening (with
> HYPRE_BOOMERAMGSetNodal() 1
> > HYPRE BoomerAMG: HYPRE_BoomerAMGSetInterpVecVariant() 1
> > linear system matrix = precond matrix:
> > Mat Object: 2 MPI processes
> > type: mpiaij
> > rows=200000, cols=200000
> > total: nonzeros=3373340, allocated nonzeros=3373340
> > total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >
> >
> > But the timing is still terrible.
> >
> >
> >
> >
> > On Sat, Jan 7, 2017 at 5:28 PM, Jed Brown <jed at jedbrown.org> wrote:
> > Manuel Valera <mvalera at mail.sdsu.edu> writes:
> >
> > > Awesome Matt and Jed,
> > >
> > > GCR is used because the matrix is not invertible and because this was
> > > the algorithm the previous library used.
> > >
> > > The preconditioner I'm aiming to use is multigrid. I thought I configured
> > > the hypre BoomerAMG solver for this, but I agree that it doesn't show in
> > > the log anywhere; how can I be sure it is being used? I sent the -ksp_view
> > > log earlier in this thread.
> >
> > Did you run with -pc_type hypre?
> >
> > > I had a problem with the matrix block sizes, so I couldn't make the PETSc
> > > native multigrid solver work.
> >
> > What block sizes? If the only variable is pressure, the block size
> > would be 1 (default).
> >
> > > This is a non-hydrostatic pressure solver; it is an elliptic problem, so
> > > multigrid is a must.
> >
> > Yes, multigrid should work well.
> >
>
>
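For context, a minimal PETSc (C) sketch of a solver set up the way the quoted -ksp_view output
reports it (GCR with restart 30, right preconditioning, unpreconditioned norm, hypre BoomerAMG,
and the listed tolerances) might look like the following. The matrix file name, the constant
null space (the logs do list a Matrix Null Space object, but its content is not shown), and the
placeholder right-hand side are assumptions for illustration, and error checking is omitted for
brevity; KSPSetFromOptions() still lets the -pc_hypre_boomeramg_* command-line options take effect.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat          A;
    Vec          x, b;
    KSP          ksp;
    PC           pc;
    MatNullSpace nullsp;
    PetscViewer  viewer;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Load the pressure matrix from a PETSc binary file (file name is a placeholder) */
    PetscViewerBinaryOpen(PETSC_COMM_WORLD, "pressure_matrix.bin", FILE_MODE_READ, &viewer);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetType(A, MATAIJ);
    MatLoad(A, viewer);
    PetscViewerDestroy(&viewer);

    /* The matrix is singular; a constant null space (typical for a pure-Neumann
       pressure problem) is assumed here */
    MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullsp);
    MatSetNullSpace(A, nullsp);

    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);                  /* placeholder right-hand side */
    MatNullSpaceRemove(nullsp, b);   /* keep the placeholder RHS consistent */

    /* GCR(30) with right preconditioning and hypre BoomerAMG, as reported by -ksp_view */
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPGCR);
    KSPGCRSetRestart(ksp, 30);
    KSPSetPCSide(ksp, PC_RIGHT);
    KSPSetNormType(ksp, KSP_NORM_UNPRECONDITIONED);
    KSPSetTolerances(ksp, 1e-14, 1e-50, 1e4, 10000);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCHYPRE);
    PCHYPRESetType(pc, "boomeramg");
    KSPSetFromOptions(ksp);          /* picks up -pc_hypre_boomeramg_* options */

    KSPSolve(ksp, b, x);

    KSPDestroy(&ksp);
    MatNullSpaceDestroy(&nullsp);
    MatDestroy(&A);
    VecDestroy(&x);
    VecDestroy(&b);
    PetscFinalize();
    return 0;
  }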
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 1 processor, by valera Sun Jan 8 14:24:49 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 3.386e+01 1.00000 3.386e+01
Objects: 8.100e+01 1.00000 8.100e+01
Flops: 3.820e+10 1.00000 3.820e+10 3.820e+10
Flops/sec: 1.128e+09 1.00000 1.128e+09 1.128e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.3859e+01 100.0% 3.8199e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 1462 1.0 8.0159e-01 1.0 1.17e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 1459
VecMDot 1411 1.0 2.3061e+00 1.0 8.38e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 22 0 0 0 7 22 0 0 0 3633
VecNorm 1337 1.0 2.3786e-01 1.0 5.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2248
VecScale 2924 1.0 3.3265e-01 1.0 5.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1758
VecSet 1538 1.0 2.1733e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 2924 1.0 5.2265e-01 1.0 1.17e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 2238
VecAYPX 5 1.0 1.2798e-03 1.0 1.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 781
VecMAXPY 2822 1.0 4.9800e+00 1.0 1.68e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 44 0 0 0 15 44 0 0 0 3365
VecAssemblyBegin 7 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 7 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 5 1.0 1.6944e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 1467 1.0 4.9397e+00 1.0 9.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 25 0 0 0 15 25 0 0 0 1944
MatConvert 2 1.0 6.1032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 1.5736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 2.3842e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 6.9497e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 1.2220e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 1 1.0 5.7235e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 2.6259e+01 1.0 3.82e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78100 0 0 0 78100 0 0 0 1455
PCSetUp 2 1.0 3.8668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
PCApply 1463 1.0 1.2196e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 36 0 0 0 0 36 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 69 6 9609168 0.
Vector Scatter 1 1 656 0.
Matrix 3 1 803112 0.
Matrix Null Space 1 1 592 0.
Viewer 3 1 816 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
Index Set 1 1 776 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 1 processor, by valera Sun Jan 8 14:33:19 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 9.016e+00 1.00000 9.016e+00
Objects: 8.300e+01 1.00000 8.300e+01
Flops: 5.021e+09 1.00000 5.021e+09 5.021e+09
Flops/sec: 5.569e+08 1.00000 5.569e+08 5.569e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 9.0160e+00 100.0% 5.0209e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 155 1.0 8.4427e-02 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1469
VecMDot 145 1.0 2.3651e-01 1.0 8.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 17 0 0 0 3 17 0 0 0 3679
VecNorm 30 1.0 5.3427e-03 1.0 1.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2246
VecScale 310 1.0 3.4916e-02 1.0 6.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1776
VecSet 74 1.0 4.6141e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 310 1.0 5.4786e-02 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 2263
VecAYPX 5 1.0 1.1373e-03 1.0 1.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 879
VecMAXPY 290 1.0 5.1323e-01 1.0 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 35 0 0 0 6 35 0 0 0 3390
VecAssemblyBegin 7 1.0 2.3842e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 7 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 5 1.0 1.0016e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 160 1.0 5.3934e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 21 0 0 0 6 21 0 0 0 1942
MatSolve 155 1.0 6.4364e-01 1.0 1.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 20 0 0 0 7 20 0 0 0 1577
MatLUFactorNum 1 1.0 3.0440e-02 1.0 2.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 843
MatILUFactorSym 1 1.0 1.4438e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 1.7564e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 1.3664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2245e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 6.2274e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatView 1 1.0 1.1229e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
KSPSetUp 1 1.0 4.2639e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 2.1151e+00 1.0 5.00e+09 1.0 0.0e+00 0.0e+00 0.0e+00 23 99 0 0 0 23 99 0 0 0 2362
PCSetUp 2 1.0 2.8751e-01 1.0 2.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 3 1 0 0 0 89
PCApply 156 1.0 6.9180e-01 1.0 1.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 20 0 0 0 8 20 0 0 0 1467
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 67 4 6406112 0.
Vector Scatter 1 1 656 0.
Matrix 4 1 803112 0.
Matrix Null Space 1 1 592 0.
Viewer 3 1 816 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
Index Set 4 1 776 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera Sun Jan 8 14:27:52 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 2.558e+01 1.01638 2.537e+01
Objects: 8.700e+01 1.00000 8.700e+01
Flops: 2.296e+10 1.00000 2.296e+10 4.592e+10
Flops/sec: 9.123e+08 1.01638 9.050e+08 1.810e+09
MPI Messages: 1.768e+03 1.00000 1.768e+03 3.535e+03
MPI Message Lengths: 4.961e+07 1.00000 2.807e+04 9.922e+07
MPI Reductions: 5.153e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.5372e+01 100.0% 4.5918e+10 100.0% 3.535e+03 100.0% 2.807e+04 100.0% 5.152e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 1759 1.0 7.5034e-01 1.1 7.04e+08 1.0 0.0e+00 0.0e+00 1.8e+03 3 3 0 0 34 3 3 0 0 34 1875
VecMDot 1698 1.0 2.1292e+00 1.2 5.03e+09 1.0 0.0e+00 0.0e+00 1.7e+03 8 22 0 0 33 8 22 0 0 33 4727
VecNorm 1634 1.0 2.0453e-01 1.1 3.27e+08 1.0 0.0e+00 0.0e+00 1.6e+03 1 1 0 0 32 1 1 0 0 32 3196
VecScale 3518 1.0 2.1820e-01 1.0 3.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 3225
VecSet 1769 1.0 1.5442e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 3518 1.0 3.4378e-01 1.0 7.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 4093
VecAYPX 5 1.0 1.2600e-03 2.1 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 794
VecMAXPY 3396 1.0 3.4249e+00 1.0 1.01e+10 1.0 0.0e+00 0.0e+00 0.0e+00 13 44 0 0 0 13 44 0 0 0 5878
VecAssemblyBegin 12 1.7 2.9197e-03 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 12 1.7 1.1683e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 1774 1.0 2.3620e+0037.3 0.00e+00 0.0 3.5e+03 2.2e+04 1.0e+01 5 0100 79 0 5 0100 79 0 0
VecScatterEnd 1764 1.0 8.7893e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 1764 1.0 3.4805e+00 1.0 5.77e+09 1.0 3.5e+03 2.0e+04 0.0e+00 14 25100 71 0 14 25100 71 0 3318
MatConvert 2 1.0 2.9602e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 6.1677e-02384.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 8.9786e-03 1.0 0.00e+00 0.0 4.0e+00 5.0e+03 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 4 1.0 2.6226e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 1.7685e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06 1.3e+01 1 0 0 21 0 1 0 0 21 0 0
KSPSetUp 1 1.0 1.9739e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 1.8353e+01 1.0 2.30e+10 1.0 3.5e+03 2.0e+04 5.1e+03 72100100 71 99 72100100 71 99 2502
PCSetUp 2 1.0 6.5126e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 3 0 0 0 0 3 0 0 0 0 0
PCApply 1760 1.0 8.2935e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 32 0 0 0 0 32 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 72 8 7212944 0.
Vector Scatter 3 2 1312 0.
Matrix 3 0 0 0.
Viewer 2 0 0 0.
Index Set 4 4 13104 0.
Krylov Solver 1 0 0 0.
Preconditioner 2 1 1384 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 1.99318e-05
Average time for zero size MPI_Send(): 9.41753e-06
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-------------- next part --------------
WARNING: -log_summary is being deprecated; switch to -log_view
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ucmsMR on a arch-linux2-c-debug named ocean with 2 processors, by valera Sun Jan 8 14:32:12 2017
Using Petsc Release Version 3.7.4, unknown
Max Max/Min Avg Total
Time (sec): 1.241e+01 1.03508 1.220e+01
Objects: 9.300e+01 1.00000 9.300e+01
Flops: 8.662e+09 1.00000 8.662e+09 1.732e+10
Flops/sec: 7.222e+08 1.03508 7.100e+08 1.420e+09
MPI Messages: 5.535e+02 1.00000 5.535e+02 1.107e+03
MPI Message Lengths: 2.533e+07 1.00000 4.576e+04 5.066e+07
MPI Reductions: 1.548e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.2204e+01 100.0% 1.7325e+10 100.0% 1.107e+03 100.0% 4.576e+04 100.0% 1.547e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDotNorm2 545 1.0 4.1106e-01 1.4 2.18e+08 1.0 0.0e+00 0.0e+00 5.4e+02 3 3 0 0 35 3 3 0 0 35 1061
VecMDot 525 1.0 9.6894e-01 1.5 1.48e+09 1.0 0.0e+00 0.0e+00 5.2e+02 7 17 0 0 34 7 17 0 0 34 3061
VecNorm 420 1.0 8.5726e-02 1.4 8.40e+07 1.0 0.0e+00 0.0e+00 4.2e+02 1 1 0 0 27 1 1 0 0 27 1960
VecScale 1090 1.0 8.5441e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2551
VecSet 555 1.0 5.4937e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1090 1.0 1.2735e-01 1.1 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 3424
VecAYPX 5 1.0 1.4422e-03 2.4 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 693
VecMAXPY 1050 1.0 1.1876e+00 1.1 2.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 9 34 0 0 0 9 34 0 0 0 4994
VecAssemblyBegin 12 1.7 3.0236e-03 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.1e+01 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 12 1.7 1.1206e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 560 1.0 2.3305e+0099.8 0.00e+00 0.0 1.1e+03 2.7e+04 1.0e+01 10 0 99 59 1 10 0 99 59 1 0
VecScatterEnd 550 1.0 5.8907e-02 5.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 550 1.0 1.2700e+00 1.1 1.80e+09 1.0 1.1e+03 2.0e+04 0.0e+00 10 21 99 43 0 10 21 99 43 0 2835
MatSolve 545 1.0 1.4914e+00 1.2 1.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 20 0 0 0 11 20 0 0 0 2375
MatLUFactorNum 1 1.0 3.4873e-02 2.3 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 728
MatILUFactorSym 1 1.0 1.3488e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 1.7024e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 6.1171e-02344.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 9.1953e-03 1.0 0.00e+00 0.0 4.0e+00 5.0e+03 8.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatGetRowIJ 3 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.6165e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLoad 1 1.0 1.7585e-01 1.0 0.00e+00 0.0 7.0e+00 3.0e+06 1.3e+01 1 0 1 41 1 1 0 1 41 1 0
KSPSetUp 2 1.0 1.9348e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 5 1.0 5.2705e+00 1.0 8.66e+09 1.0 1.1e+03 2.0e+04 1.5e+03 43100 99 43 96 43100 99 43 96 3287
PCSetUp 3 1.0 6.7521e-01 1.0 1.27e+07 1.0 0.0e+00 0.0e+00 4.0e+00 5 0 0 0 0 5 0 0 0 0 38
PCSetUpOnBlocks 5 1.0 5.0066e-02 2.2 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 507
PCApply 546 1.0 1.5895e+00 1.2 1.77e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 20 0 0 0 12 20 0 0 0 2229
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 72 6 5609648 0.
Vector Scatter 3 2 1312 0.
Matrix 4 0 0 0.
Viewer 2 0 0 0.
Index Set 7 4 13104 0.
Krylov Solver 2 0 0 0.
Preconditioner 3 1 1384 0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 1.95503e-05
Average time for zero size MPI_Send(): 1.03712e-05
#PETSc Option Table entries:
-log_summary
-matload_block_size 1
-pc_hypre_boomeramg_nodal_coarsen 1
-pc_hypre_boomeramg_vec_interp_variant 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich --download-hdf5 --download-netcdf --download-hypre --download-metis --download-parmetis --download-trillinos --with-debugging=no
-----------------------------------------
Libraries compiled on Sun Jan 8 14:06:45 2017 on ocean
Machine characteristics: Linux-3.10.0-327.36.3.el7.x86_64-x86_64-with-centos-7.2.1511-Core
Using PETSc directory: /home/valera/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/valera/petsc/arch-linux2-c-debug/include -I/home/valera/petsc/include -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include
-----------------------------------------
Using C linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/valera/petsc/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lmpicxx -lstdc++ -lflapack -lfblas -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lpthread -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -L/home/valera/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -ldl -Wl,-rpath,/home/valera/petsc/arch-linux2-c-debug/lib -lmpi -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
[valera at ocean petsc]$ make stream NPMAX=4
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug stream
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec ' using 'NPMAX=4'
Number of MPI processes 1 Processor names ocean
Triad: 5998.4683 Rate (MB/s)
Number of MPI processes 2 Processor names ocean ocean
Triad: 23010.7259 Rate (MB/s)
Number of MPI processes 3 Processor names ocean ocean ocean
Triad: 6295.2156 Rate (MB/s)
Number of MPI processes 4 Processor names ocean ocean ocean ocean
Triad: 7019.8170 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 3.84
3 1.05
4 1.17
Estimation of possible speedup of MPI programs based on Streams benchmark.
It appears you have 1 node(s)
Unable to open matplotlib to plot speedup
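For context, the Triad rate reported above is a measure of sustainable memory bandwidth: the
benchmark times the classic STREAM triad kernel, which in C is essentially the loop below
(the array names, scalar, and length follow the STREAM benchmark's own conventions):

  /* STREAM triad: two loads and one store per iteration, memory-bandwidth bound */
  for (j = 0; j < N; j++)
    a[j] = b[j] + scalar * c[j];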
-------------- next part --------------
[valera at ocean petsc]$ make PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/valera/petsc PETSC_ARCH=arch-linux2-c-debug streams
/home/valera/petsc/arch-linux2-c-debug/bin/mpicc -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/valera/petsc/include -I/home/valera/petsc/arch-linux2-c-debug/include `pwd`/MPIVersion.c
Running streams with '/home/valera/petsc/arch-linux2-c-debug/bin/mpiexec ' using 'NPMAX=32'
Number of MPI processes 1 Processor names ocean
Triad: 11830.2146 Rate (MB/s)
Number of MPI processes 2 Processor names ocean ocean
Triad: 23111.7734 Rate (MB/s)
Number of MPI processes 3 Processor names ocean ocean ocean
Triad: 6692.7679 Rate (MB/s)
Number of MPI processes 4 Processor names ocean ocean ocean ocean
Triad: 7043.7175 Rate (MB/s)
Number of MPI processes 5 Processor names ocean ocean ocean ocean ocean
Triad: 33053.3434 Rate (MB/s)
Number of MPI processes 6 Processor names ocean ocean ocean ocean ocean ocean
Triad: 33129.8788 Rate (MB/s)
Number of MPI processes 7 Processor names ocean ocean ocean ocean ocean ocean ocean
Triad: 32379.8370 Rate (MB/s)
Number of MPI processes 8 Processor names ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31644.3971 Rate (MB/s)
Number of MPI processes 9 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30214.8803 Rate (MB/s)
Number of MPI processes 10 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31700.6859 Rate (MB/s)
Number of MPI processes 11 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 32369.1251 Rate (MB/s)
Number of MPI processes 12 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7677.4204 Rate (MB/s)
Number of MPI processes 13 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33298.0308 Rate (MB/s)
Number of MPI processes 14 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33220.6717 Rate (MB/s)
Number of MPI processes 15 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7334.5064 Rate (MB/s)
Number of MPI processes 16 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 7463.6337 Rate (MB/s)
Number of MPI processes 17 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 14108.2617 Rate (MB/s)
Number of MPI processes 18 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 29450.3077 Rate (MB/s)
Number of MPI processes 19 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 8997.3655 Rate (MB/s)
Number of MPI processes 20 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 9334.5314 Rate (MB/s)
Number of MPI processes 21 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30470.9315 Rate (MB/s)
Number of MPI processes 22 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 19822.1616 Rate (MB/s)
Number of MPI processes 23 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 32290.0731 Rate (MB/s)
Number of MPI processes 24 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 11822.5303 Rate (MB/s)
Number of MPI processes 25 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 11310.1869 Rate (MB/s)
Number of MPI processes 26 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 33472.8900 Rate (MB/s)
Number of MPI processes 27 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 30328.8841 Rate (MB/s)
Number of MPI processes 28 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 31779.9057 Rate (MB/s)
Number of MPI processes 29 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 28275.6109 Rate (MB/s)
Number of MPI processes 30 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 16085.2815 Rate (MB/s)
Number of MPI processes 31 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 13092.9246 Rate (MB/s)
Number of MPI processes 32 Processor names ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean ocean
Triad: 18967.6909 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 1.95
3 0.57
4 0.6
5 2.79
6 2.8
7 2.74
8 2.67
9 2.55
10 2.68
11 2.74
12 0.65
13 2.81
14 2.81
15 0.62
16 0.63
17 1.19
18 2.49
19 0.76
20 0.79
21 2.58
22 1.68
23 2.73
24 1.0
25 0.96
26 2.83
27 2.56
28 2.69
29 2.39
30 1.36
31 1.11
32 1.6
Estimation of possible speedup of MPI programs based on Streams benchmark.
It appears you have 1 node(s)
Unable to open matplotlib to plot speedup