[petsc-users] Poor weak scaling when solving successive linear systems
Michael Becker
Michael.Becker at physik.uni-giessen.de
Thu May 24 00:24:18 CDT 2018
Hello,
I added a PETSc solver class to our particle-in-cell simulation code, and
all calculations seem to be correct. However, some weak scaling tests I
did are rather disappointing: the solver's runtime keeps increasing with
system size even though the number of cores is scaled up accordingly. As
a result, the solver's share of the total runtime becomes more and more
dominant, and the system sizes we aim for are infeasible.
It's a simple 3D Poisson problem on a structured grid with Dirichlet
boundaries inside the domain, for which I found the cg/gamg combination
to work the fastest. Since KSPSolve() is called during every timestep of
the simulation to solve the same system with a new rhs vector, assembling
the matrix and the other PETSc objects should not be a determining factor
either.
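For context, the setup happens once before the time loop, roughly along
these lines (a condensed sketch with placeholder names, not the exact
class code; A is the assembled Poisson matrix and reltol the relative
tolerance):

KSP ksp;
PC  pc;
KSPCreate(PETSC_COMM_WORLD, &ksp);
KSPSetOperators(ksp, A, A);               /* matrix stays fixed; only the rhs changes per timestep */
KSPSetType(ksp, KSPCG);
KSPGetPC(ksp, &pc);
PCSetType(pc, PCGAMG);
KSPSetInitialGuessNonzero(ksp, PETSC_TRUE); /* so the previous solution is used as the initial guess */
KSPSetFromOptions(ksp);                   /* -ksp_type, -pc_gamg_type etc. picked up from the command line */
KSPSetUp(ksp);                            /* GAMG hierarchy built once, amortized over all timesteps */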
What puzzles me is that the convergence rate is actually good (the
residual decreases by an order of magnitude with every KSP iteration),
and the number of KSP iterations remains constant over the course of a
simulation and is the same for all tested system sizes.
I even increased the (fixed) system size per processor to 30^3 unknowns
(which is significantly more than the recommended 10,000), but runtime
is still not even close to being constant.
This leads me to the conclusion that either I configured PETSc wrong, I
don't call the correct PETSc-related functions, or something goes
terribly wrong with communication.
Could you have a look at the attached log_view files and tell me if
something is particularly odd? The system size per processor is 30^3 and
the simulation ran over 1000 timesteps, so KSPSolve() was called 1000
times. I introduced two new logging stages - one for the first solve plus
the final setup, and one for the remaining solves.
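The stages are registered with PetscLogStageRegister() and pushed/popped
around the solve calls, roughly like this (sketched; timestep is our step
counter and the stage handles live in the solver class):

PetscLogStage stage_first, stage_rest;
PetscLogStageRegister("First Solve", &stage_first);
PetscLogStageRegister("Remaining Solves", &stage_rest);
...
PetscLogStagePush(timestep == 0 ? stage_first : stage_rest);
/* the code segment below */
PetscLogStagePop();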
The repeatedly called code segment is:

/* fill the rhs vector from the PIC code */
PetscScalar *b_array;
VecGetArray(b, &b_array);
get_b(b_array);
VecRestoreArray(b, &b_array);

KSPSetTolerances(ksp, reltol, 1e-50, 1e5, 10000);

/* use the previous solution as the initial guess */
PetscScalar *x_array;
VecGetArray(x, &x_array);
for (int i = 0; i < N_local; i++)
    x_array[i] = x_array_prev[i];
VecRestoreArray(x, &x_array);

KSPSolve(ksp, b, x);

/* hand the solution back to the PIC code and keep a copy for the next step */
KSPGetSolution(ksp, &x);
VecGetArray(x, &x_array);
for (int i = 0; i < N_local; i++)
    x_array_prev[i] = x_array[i];
set_x(x_array);
VecRestoreArray(x, &x_array);
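(As an aside, the two copy loops could presumably be replaced by
VecCopy() with an extra Vec x_prev, created once via
VecDuplicate(x, &x_prev), e.g.:

VecCopy(x_prev, x);      /* initial guess = previous solution */
KSPSolve(ksp, b, x);
VecCopy(x, x_prev);      /* keep it for the next timestep */

but I don't expect that to change the scaling behaviour.)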
I noticed that for every individual KSP iteration, six vector objects are
created and destroyed (with CG; more with e.g. GMRES). This seems kind of
wasteful; is it supposed to be like that? Could this even be the reason
for my problems? Apart from that, everything looks quite normal to me
(but I'm not the expert here).
Thanks in advance.
Michael
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/ritsat/beckerm/ppp_test/plasmapic on a arch-linux-amd-opt named node2-007 with 125 processors, by beckerm Wed May 23 15:15:54 2018
Using Petsc Release Version 3.9.1, unknown
Max Max/Min Avg Total
Time (sec): 2.567e+02 1.00000 2.567e+02
Objects: 2.438e+04 1.00004 2.438e+04
Flop: 2.125e+10 1.27708 1.963e+10 2.454e+12
Flop/sec: 8.278e+07 1.27708 7.648e+07 9.560e+09
MPI Messages: 1.042e+06 3.36140 7.129e+05 8.911e+07
MPI Message Lengths: 1.344e+09 2.32209 1.439e+03 1.282e+11
MPI Reductions: 2.250e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.9829e+00 2.7% 0.0000e+00 0.0% 3.000e+03 0.0% 3.178e+03 0.0% 1.700e+01 0.1%
1: First Solve: 2.7562e+00 1.1% 3.6885e+09 0.2% 3.549e+05 0.4% 3.736e+03 1.0% 5.500e+02 2.4%
2: Remaining Solves: 2.4695e+02 96.2% 2.4504e+12 99.8% 8.875e+07 99.6% 1.430e+03 99.0% 2.192e+04 97.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 3 1.0 5.6386e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 1.7128e-02 2.1 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00 0 0 0 0 0 1 0 2 0 0 0
BuildTwoSidedF 30 1.0 3.3218e-01 3.8 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 0 0 0 0 0 7 0 2 5 0 0
KSPSetUp 9 1.0 3.9077e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 2.7586e+00 1.0 3.26e+07 1.4 3.5e+05 3.7e+03 5.5e+02 1 0 0 1 2 100100100100100 1337
VecTDot 8 1.0 1.9397e-02 3.6 4.32e+05 1.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 1 0 0 1 2784
VecNorm 6 1.0 6.3949e-03 1.6 3.24e+05 1.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 1 0 0 1 6333
VecScale 24 1.0 1.2732e-04 2.1 5.43e+04 2.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40434
VecCopy 1 1.0 1.5807e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 115 1.0 8.6141e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 8 1.0 1.4498e-03 2.4 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 37246
VecAYPX 28 1.0 1.3914e-03 2.2 3.58e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 31519
VecAssemblyBegin 2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 103 1.0 6.5608e-03 3.1 0.00e+00 0.0 8.9e+04 1.4e+03 0.0e+00 0 0 0 0 0 0 0 25 9 0 0
VecScatterEnd 103 1.0 6.5023e-02 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatMult 29 1.0 4.5694e-02 1.8 6.14e+06 1.2 3.0e+04 2.1e+03 0.0e+00 0 0 0 0 0 1 19 8 5 0 15687
MatMultAdd 24 1.0 2.1485e-02 3.6 1.37e+06 1.6 1.6e+04 6.5e+02 0.0e+00 0 0 0 0 0 1 4 5 1 0 7032
MatMultTranspose 24 1.0 1.6713e-02 2.6 1.37e+06 1.6 1.6e+04 6.5e+02 0.0e+00 0 0 0 0 0 0 4 5 1 0 9040
MatSolve 4 0.0 2.2173e-05 0.0 2.64e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12
MatSOR 48 1.0 8.0235e-02 1.8 1.09e+07 1.3 2.7e+04 1.5e+03 8.0e+00 0 0 0 0 0 3 34 8 3 1 15626
MatLUFactorSym 1 1.0 5.4121e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.4782e-05 5.2 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9
MatResidual 24 1.0 3.5083e-02 2.0 4.55e+06 1.3 2.7e+04 1.5e+03 0.0e+00 0 0 0 0 0 1 14 8 3 0 14794
MatAssemblyBegin 94 1.0 3.3449e-01 3.5 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 0 0 0 0 0 7 0 2 5 0 0
MatAssemblyEnd 94 1.0 1.4880e-01 1.1 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02 0 0 0 0 1 5 0 18 1 42 0
MatGetRow 3102093 1.3 4.6411e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 15 0 0 0 0 0
MatGetRowIJ 1 0.0 7.8678e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 4.4495e-01 2.1 0.00e+00 0.0 5.5e+04 1.7e+04 1.2e+01 0 0 0 1 0 12 0 15 71 2 0
MatCreateSubMat 4 1.0 3.7476e-02 1.0 0.00e+00 0.0 2.9e+03 2.7e+02 6.4e+01 0 0 0 0 0 1 0 1 0 12 0
MatGetOrdering 1 0.0 1.3685e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 6.0548e-02 1.2 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01 0 0 0 0 0 2 0 8 2 2 0
MatCoarsen 6 1.0 3.5690e-02 1.1 0.00e+00 0.0 5.3e+04 5.8e+02 3.3e+01 0 0 0 0 0 1 0 15 2 6 0
MatZeroEntries 6 1.0 3.4430e-03 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 2.8406e-01 1.0 1.13e+07 1.6 6.3e+04 2.6e+03 9.2e+01 0 0 0 0 0 10 33 18 13 17 4307
MatPtAPSymbolic 6 1.0 1.6401e-01 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01 0 0 0 0 0 6 0 10 7 8 0
MatPtAPNumeric 6 1.0 1.2070e-01 1.0 1.13e+07 1.6 2.9e+04 2.6e+03 4.8e+01 0 0 0 0 0 4 33 8 6 9 10136
MatGetLocalMat 6 1.0 4.4053e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 1.0330e-02 1.9 0.00e+00 0.0 2.0e+04 3.5e+03 0.0e+00 0 0 0 0 0 0 0 6 5 0 0
SFSetGraph 12 1.0 1.5497e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 2.5882e-02 1.5 0.00e+00 0.0 2.6e+04 6.2e+02 0.0e+00 0 0 0 0 0 1 0 7 1 0 0
SFBcastBegin 45 1.0 2.1088e-03 2.5 0.00e+00 0.0 5.4e+04 6.9e+02 0.0e+00 0 0 0 0 0 0 0 15 3 0 0
SFBcastEnd 45 1.0 2.0310e-02 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 2.2022e+00 1.0 0.00e+00 0.0 2.0e+05 5.2e+03 2.8e+02 1 0 0 1 1 80 0 56 78 52 0
GAMG: partLevel 6 1.0 3.2547e-01 1.0 1.13e+07 1.6 6.6e+04 2.5e+03 1.9e+02 0 0 0 0 1 12 33 19 13 35 3759
repartition 2 1.0 1.2660e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 9.5701e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 2.2409e-02 1.0 0.00e+00 0.0 1.4e+03 5.3e+02 3.4e+01 0 0 0 0 0 1 0 0 0 6 0
Move P 2 1.0 1.6271e-02 1.0 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01 0 0 0 0 0 1 0 0 0 6 0
PCSetUp 2 1.0 2.5381e+00 1.0 1.13e+07 1.6 2.7e+05 4.5e+03 5.1e+02 1 0 0 1 2 92 33 75 90 93 482
PCSetUpOnBlocks 4 1.0 3.4523e-04 1.9 1.29e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 4 1.0 1.2703e-01 1.1 1.82e+07 1.3 8.6e+04 1.2e+03 8.0e+00 0 0 0 0 0 4 56 24 8 1 16335
--- Event Stage 2: Remaining Solves
KSPSolve 999 1.0 1.2762e+02 1.0 2.12e+10 1.3 8.8e+07 1.4e+03 2.2e+04 48100 99 97 97 50100 99 98100 19200
VecTDot 7968 1.0 1.0869e+01 6.1 4.30e+08 1.0 0.0e+00 0.0e+00 8.0e+03 2 2 0 0 35 2 2 0 0 36 4948
VecNorm 5982 1.0 4.5561e+00 3.6 3.23e+08 1.0 0.0e+00 0.0e+00 6.0e+03 1 2 0 0 27 1 2 0 0 27 8863
VecScale 23904 1.0 1.1319e-01 2.2 5.40e+07 2.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 45298
VecCopy 999 1.0 1.6182e-01 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 83664 1.0 8.0856e-01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7968 1.0 1.3577e+00 2.3 4.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 39613
VecAYPX 27888 1.0 1.3048e+00 2.2 3.56e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 33468
VecScatterBegin 100599 1.0 6.5181e+00 3.4 0.00e+00 0.0 8.8e+07 1.4e+03 0.0e+00 2 0 99 97 0 2 0 99 98 0 0
VecScatterEnd 100599 1.0 5.5370e+01 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
MatMult 28887 1.0 4.2860e+01 1.8 6.12e+09 1.2 3.0e+07 2.1e+03 0.0e+00 11 29 33 49 0 12 29 33 49 0 16661
MatMultAdd 23904 1.0 1.4803e+01 2.6 1.37e+09 1.6 1.6e+07 6.5e+02 0.0e+00 4 6 18 8 0 4 6 18 8 0 10166
MatMultTranspose 23904 1.0 1.5364e+01 2.4 1.37e+09 1.6 1.6e+07 6.5e+02 0.0e+00 4 6 18 8 0 4 6 18 8 0 9795
MatSolve 3984 0.0 1.9884e-02 0.0 2.63e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 13
MatSOR 47808 1.0 6.8888e+01 1.7 1.08e+10 1.3 2.7e+07 1.5e+03 8.0e+03 25 51 30 32 35 26 51 30 32 36 18054
MatResidual 23904 1.0 3.1872e+01 1.9 4.54e+09 1.3 2.7e+07 1.5e+03 0.0e+00 8 21 30 32 0 8 21 30 32 0 16219
PCSetUpOnBlocks 3984 1.0 4.9551e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3984 1.0 1.0819e+02 1.1 1.81e+10 1.3 8.5e+07 1.2e+03 8.0e+03 42 84 96 80 35 43 84 96 81 36 19056
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 0 0 0.
Vector 5 52 2371496 0.
Matrix 0 72 14138216 0.
Distributed Mesh 1 0 0 0.
Index Set 2 12 133768 0.
IS L to G Mapping 1 0 0 0.
Star Forest Graph 2 0 0 0.
Discrete System 1 0 0 0.
Vec Scatter 1 13 16016 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 140 92 2204792 0.
Matrix 140 68 21738552 0.
Matrix Coarsen 6 6 3816 0.
Index Set 110 100 543240 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 31 18 22176 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 23904 23904 1295501184 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.26021e-05
Average time for zero size MPI_Send(): 1.52473e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_type cg
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-has-attribute-aligned=1 PETSC_ARCH=arch-linux-amd-opt --download-f2cblaslapack --with-mpi-dir=/cm/shared/apps/mvapich2/intel-17.0.1/2.0 --download-hypre --download-ml --with-fc=0 --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 --with-batch --with-x --known-mpi-shared-libraries=1 --known-64-bit-blas-indices=4
-----------------------------------------
Libraries compiled on 2018-05-03 16:11:18 on node52-021
Machine characteristics: Linux-2.6.32-696.18.7.el6.x86_64-x86_64-with-redhat-6.6-Carbon
Using PETSc directory: /home/ritsat/beckerm/petsc
Using PETSc arch: arch-linux-amd-opt
-----------------------------------------
Using C compiler: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc -fPIC -wd1572 -O3
-----------------------------------------
Using include paths: -I/home/ritsat/beckerm/petsc/include -I/home/ritsat/beckerm/petsc/arch-linux-amd-opt/include -I/cm/shared/apps/mvapich2/intel-17.0.1/2.0/include
-----------------------------------------
Using C linker: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc
Using libraries: -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lpetsc -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lHYPRE -lml -lf2clapack -lf2cblas -lX11 -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/ritsat/beckerm/ppp_test/plasmapic on a arch-linux-amd-opt named node1-017 with 1000 processors, by beckerm Wed May 23 23:30:46 2018
Using Petsc Release Version 3.9.1, unknown
Max Max/Min Avg Total
Time (sec): 2.915e+02 1.00000 2.915e+02
Objects: 2.127e+04 1.00005 2.127e+04
Flop: 1.922e+10 1.26227 1.851e+10 1.851e+13
Flop/sec: 6.595e+07 1.26227 6.349e+07 6.349e+10
MPI Messages: 1.075e+06 3.98874 7.375e+05 7.375e+08
MPI Message Lengths: 1.175e+09 2.32017 1.403e+03 1.034e+12
MPI Reductions: 1.199e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.5171e+01 8.6% 0.0000e+00 0.0% 2.700e+04 0.0% 3.178e+03 0.0% 1.700e+01 0.1%
1: First Solve: 3.3123e+00 1.1% 3.1911e+10 0.2% 3.675e+06 0.5% 3.508e+03 1.2% 6.090e+02 5.1%
2: Remaining Solves: 2.6301e+02 90.2% 1.8475e+13 99.8% 7.338e+08 99.5% 1.392e+03 98.7% 1.135e+04 94.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 3 1.0 4.4584e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 2.2965e-02 1.3 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00 0 0 0 0 0 1 0 2 0 0 0
BuildTwoSidedF 30 1.0 4.6278e-01 3.1 0.00e+00 0.0 6.5e+04 1.0e+04 0.0e+00 0 0 0 0 0 9 0 2 5 0 0
KSPSetUp 9 1.0 2.1761e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 1 0 0 0 2 0
KSPSolve 1 1.0 3.3169e+00 1.0 3.35e+07 1.4 3.7e+06 3.5e+03 6.1e+02 1 0 0 1 5 100100100100100 9621
VecDotNorm2 4 1.0 4.2942e-03 3.1 4.32e+05 1.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 1 0 0 1 100602
VecMDot 3 1.0 3.0557e-02 22.9 3.24e+05 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 1 1 0 0 0 10603
VecNorm 6 1.0 2.8319e-03 2.3 3.24e+05 1.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 1 0 0 1 114409
VecScale 32 1.0 6.2370e-04 1.4 2.70e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 422668
VecSet 124 1.0 5.1708e-03 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 8 1.0 1.8489e-03 2.6 4.32e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 233648
VecAYPX 25 1.0 3.2370e-03 6.2 1.96e+05 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 59376
VecMAXPY 6 1.0 1.8075e-03 1.9 6.48e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 358516
VecAssemblyBegin 3 1.0 4.0531e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 108 1.0 1.7932e-02 6.6 0.00e+00 0.0 8.4e+05 1.4e+03 0.0e+00 0 0 0 0 0 0 0 23 9 0 0
VecScatterEnd 108 1.0 8.1686e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatMult 29 1.0 6.4687e-02 2.6 6.14e+06 1.2 2.8e+05 2.0e+03 0.0e+00 0 0 0 0 0 1 19 8 4 0 91734
MatMultAdd 24 1.0 5.0761e-02 6.0 1.37e+06 1.6 1.5e+05 6.5e+02 0.0e+00 0 0 0 0 0 1 4 4 1 0 25391
MatMultTranspose 24 1.0 2.7226e-02 4.8 1.37e+06 1.6 1.5e+05 6.5e+02 0.0e+00 0 0 0 0 0 0 4 4 1 0 47340
MatSolve 4 0.0 4.7922e-05 0.0 1.10e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 229
MatSOR 48 1.0 9.3626e-02 1.9 1.09e+07 1.3 2.6e+05 1.5e+03 0.0e+00 0 0 0 0 0 2 33 7 3 0 111703
MatLUFactorSym 1 1.0 9.6083e-05 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 7.1049e-05 37.2 3.29e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 463
MatResidual 24 1.0 4.1369e-02 2.3 4.55e+06 1.3 2.6e+05 1.5e+03 0.0e+00 0 0 0 0 0 1 14 7 3 0 105139
MatAssemblyBegin 102 1.0 4.6537e-01 2.9 0.00e+00 0.0 6.5e+04 1.0e+04 0.0e+00 0 0 0 0 0 9 0 2 5 0 0
MatAssemblyEnd 102 1.0 1.4218e-01 1.1 0.00e+00 0.0 6.2e+05 2.0e+02 2.5e+02 0 0 0 0 2 4 0 17 1 41 0
MatGetRow 3102093 1.3 5.4764e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 13 0 0 0 0 0
MatGetRowIJ 1 0.0 1.5974e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 4.6659e-01 2.1 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01 0 0 0 1 0 10 0 15 72 2 0
MatCreateSubMat 6 1.0 2.8245e-02 1.0 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01 0 0 0 0 1 1 0 1 0 15 0
MatGetOrdering 1 0.0 1.4687e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 1.1661e-01 1.1 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01 0 0 0 0 0 3 0 7 2 2 0
MatCoarsen 6 1.0 5.6789e-02 1.0 0.00e+00 0.0 7.1e+05 4.4e+02 5.6e+01 0 0 0 0 0 2 0 19 2 9 0
MatZeroEntries 6 1.0 3.5298e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 3.7699e-01 1.0 1.11e+07 1.6 6.3e+05 2.5e+03 9.2e+01 0 0 0 0 1 11 33 17 12 15 27514
MatPtAPSymbolic 6 1.0 2.2081e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01 0 0 0 0 0 7 0 9 7 7 0
MatPtAPNumeric 6 1.0 1.5378e-01 1.0 1.11e+07 1.6 3.0e+05 2.3e+03 4.8e+01 0 0 0 0 0 5 33 8 6 8 67450
MatGetLocalMat 6 1.0 4.8461e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 1.6591e-02 2.4 0.00e+00 0.0 1.9e+05 3.4e+03 0.0e+00 0 0 0 0 0 0 0 5 5 0 0
SFSetGraph 12 1.0 4.1962e-05 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 3.2055e-02 1.2 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00 0 0 0 0 0 1 0 7 1 0 0
SFBcastBegin 68 1.0 2.7685e-03 2.8 0.00e+00 0.0 7.2e+05 5.1e+02 0.0e+00 0 0 0 0 0 0 0 20 3 0 0
SFBcastEnd 68 1.0 3.0165e-02 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 2.5855e+00 1.0 0.00e+00 0.0 2.2e+06 4.7e+03 3.1e+02 1 0 0 1 3 78 0 59 79 51 0
GAMG: partLevel 6 1.0 4.1722e-01 1.0 1.11e+07 1.6 6.5e+05 2.4e+03 2.4e+02 0 0 0 0 2 13 33 18 12 40 24861
repartition 3 1.0 3.8280e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 3.2971e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 1.6580e-02 1.1 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 1.4499e-02 1.1 0.00e+00 0.0 1.3e+04 1.3e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 3.0173e+00 1.0 1.11e+07 1.6 2.8e+06 4.2e+03 5.8e+02 1 0 0 1 5 91 33 77 91 96 3438
PCSetUpOnBlocks 4 1.0 4.0102e-04 2.7 3.29e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 82
PCApply 4 1.0 1.6476e-01 1.3 1.82e+07 1.3 8.2e+05 1.2e+03 0.0e+00 0 0 0 0 0 4 54 22 7 0 105522
--- Event Stage 2: Remaining Solves
KSPSolve 999 1.0 1.3831e+02 1.1 1.92e+10 1.3 7.3e+08 1.4e+03 1.1e+04 46100 99 97 95 51100 99 98100 133578
VecDotNorm2 3450 1.0 5.0804e+00 2.2 3.73e+08 1.0 0.0e+00 0.0e+00 3.4e+03 1 2 0 0 29 1 2 0 0 30 73340
VecMDot 2451 1.0 9.5447e+00 3.6 2.35e+08 1.0 0.0e+00 0.0e+00 2.5e+03 2 1 0 0 20 2 1 0 0 22 24644
VecNorm 5448 1.0 8.6350e+00 3.0 2.94e+08 1.0 0.0e+00 0.0e+00 5.4e+03 2 2 0 0 45 2 2 0 0 48 34070
VecScale 27600 1.0 5.1987e-01 1.4 2.33e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 437362
VecSet 72450 1.0 8.8635e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6900 1.0 8.0184e-01 1.4 3.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 464680
VecAYPX 21699 1.0 1.0895e+00 2.3 1.72e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 155541
VecMAXPY 4902 1.0 1.0245e+00 1.4 4.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 459209
VecScatterBegin 87249 1.0 6.4859e+00 2.9 0.00e+00 0.0 7.3e+08 1.4e+03 0.0e+00 2 0 99 97 0 2 0 99 98 0 0
VecScatterEnd 87249 1.0 5.9416e+01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 13 0 0 0 0 0
MatMult 25149 1.0 3.5269e+01 1.6 5.34e+09 1.2 2.5e+08 2.0e+03 0.0e+00 9 28 33 48 0 10 28 34 49 0 146467
MatMultAdd 20700 1.0 2.7336e+01 3.9 1.19e+09 1.6 1.3e+08 6.5e+02 0.0e+00 7 6 18 8 0 8 6 18 8 0 40666
MatMultTranspose 20700 1.0 1.6038e+01 3.0 1.19e+09 1.6 1.3e+08 6.5e+02 0.0e+00 3 6 18 8 0 3 6 18 8 0 69313
MatSolve 3450 0.0 3.8933e-02 0.0 9.47e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 243
MatSOR 41400 1.0 6.3412e+01 1.6 9.37e+09 1.3 2.2e+08 1.5e+03 0.0e+00 20 49 30 32 0 22 49 30 32 0 141687
MatResidual 20700 1.0 2.5046e+01 1.6 3.93e+09 1.3 2.2e+08 1.5e+03 0.0e+00 6 20 30 32 0 7 20 30 32 0 149782
PCSetUpOnBlocks 3450 1.0 5.4419e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 3450 1.0 1.1329e+02 1.1 1.57e+10 1.3 7.0e+08 1.2e+03 0.0e+00 38 81 96 80 0 42 81 96 81 0 132043
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11416 0.
DMKSP interface 1 0 0 0.
Vector 5 110 15006256 0.
Matrix 0 65 14780672 0.
Distributed Mesh 1 0 0 0.
Index Set 2 18 171852 0.
IS L to G Mapping 1 0 0 0.
Star Forest Graph 2 0 0 0.
Discrete System 1 0 0 0.
Vec Scatter 1 13 16016 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 210 104 2238504 0.
Matrix 148 83 22951356 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 590828 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 25872 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 20700 20700 1128260400 0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.46184e-05
Average time for zero size MPI_Send(): 1.66161e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-ksp_norm_type unpreconditioned
-ksp_type gcr
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1 --known-mpi-c-double-complex=1 --known-has-attribute-aligned=1 PETSC_ARCH=arch-linux-amd-opt --download-f2cblaslapack --with-mpi-dir=/cm/shared/apps/mvapich2/intel-17.0.1/2.0 --download-hypre --download-ml --with-fc=0 --with-debugging=0 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 --with-batch --with-x --known-mpi-shared-libraries=1 --known-64-bit-blas-indices=4
-----------------------------------------
Libraries compiled on 2018-05-03 16:11:18 on node52-021
Machine characteristics: Linux-2.6.32-696.18.7.el6.x86_64-x86_64-with-redhat-6.6-Carbon
Using PETSc directory: /home/ritsat/beckerm/petsc
Using PETSc arch: arch-linux-amd-opt
-----------------------------------------
Using C compiler: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc -fPIC -wd1572 -O3
-----------------------------------------
Using include paths: -I/home/ritsat/beckerm/petsc/include -I/home/ritsat/beckerm/petsc/arch-linux-amd-opt/include -I/cm/shared/apps/mvapich2/intel-17.0.1/2.0/include
-----------------------------------------
Using C linker: /cm/shared/apps/mvapich2/intel-17.0.1/2.0/bin/mpicc
Using libraries: -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lpetsc -Wl,-rpath,/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -L/home/ritsat/beckerm/petsc/arch-linux-amd-opt/lib -lHYPRE -lml -lf2clapack -lf2cblas -lX11 -ldl
-----------------------------------------