[petsc-users] strong-scaling vs weak-scaling
Justin Chang
jychang48 at gmail.com
Wed Aug 31 02:01:50 CDT 2016
Attached is the -log_view output (from Firedrake). Event Stage 1:
Linear_solver is where I assemble and solve the linear system of equations.
I am using the HYPRE BoomerAMG preconditioner, so log_view cannot "see into"
its internal steps, but based on what it can see, how do I distinguish
between the various setup and solve phases?
For example, when I look at these lines:
PCSetUp               1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0 11  0  0  0  0     0
PCApply              38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0 66  0  0  0  0     0
KSPSetUp              1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  0  0  0  0  0     0
KSPSolve              1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70  7  0  0  0 82  7  0  0  0   139
SNESSolve             1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84100  0  0  0 99100  0  0  0  1781
SNESFunctionEval      1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 29  0  0  0  5 29  0  0  0  9954
SNESJacobianEval      1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65  0  0  0 12 65  0  0  0  9937
So how do I break down the "mesh setup", "matrix setup", and "solve time"
phases? I am guessing "PCSetUp" belongs to one of the first two, but how
would I categorize the rest of the events? I see that HYPRE doesn't expose
as much information as other PCs like GAMG and ML, but can one still break
down the timing phases through log_view alone?
Thanks,
Justin
On Tue, Aug 30, 2016 at 11:14 PM, Jed Brown <jed at jedbrown.org> wrote:
> Mark Adams <mfadams at lbl.gov> writes:
>
> >> Anyway, what I really wanted to say is, it's good to know that these
> >> "dynamic range/performance spectrum/static scaling" plots are designed
> >> to go past the sweet spots. I also agree that it would be interesting
> >> to see a time vs dofs*iterations/time plot. Would it then also be
> >> useful to look at the step to setting up the preconditioner?
> >
> > Yes, I generally split up timing between "mesh setup" (symbolic
> > factorization of LU), "matrix setup" (eg, factorizations), and solve
> > time. The degree of amortization that you get for the two setup phases
> > depends on your problem and so it is useful to separate them.
>
> Right, there is nothing wrong with splitting up the phases, but if you
> never show a spectrum for the total, then I will be suspicious. And if
> you only show "per iteration" instead of for a complete solve, then I
> will assume that you're only doing that because convergence is unusably
> slow.
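As an aside, this is how I read the two quantities being contrasted above,
with numbers taken from the attached log (the dof count is a placeholder,
since -log_view does not print it):

    # Sketch of the two "spectrum" quantities from the discussion above;
    # dofs is a made-up placeholder, the other numbers are from the log below.
    dofs       = 1.0e6        # placeholder: total degrees of freedom of the run
    iterations = 37           # CG iterations (from the KSP monitor)
    ksp_time   = 1.7529e+01   # KSPSolve time in seconds
    stage_time = 2.1265e+01   # whole Linear_solver stage, incl. PCSetUp/PCApply

    rate_per_iteration = dofs * iterations / ksp_time  # dofs*iterations/time metric
    rate_complete      = dofs / stage_time             # rate for the complete solve
    print(rate_per_iteration, rate_complete)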
-------------- next part --------------
Residual norms for linear_ solve.
0 KSP Residual norm 5.261660052036e+02
1 KSP Residual norm 1.356995663739e+02
2 KSP Residual norm 4.098866223191e+01
3 KSP Residual norm 1.600475709119e+01
4 KSP Residual norm 6.956667251063e+00
5 KSP Residual norm 3.861942754258e+00
6 KSP Residual norm 2.331981130299e+00
7 KSP Residual norm 1.404876311943e+00
8 KSP Residual norm 8.215556397889e-01
9 KSP Residual norm 5.226439657305e-01
10 KSP Residual norm 3.421520551962e-01
11 KSP Residual norm 2.382992002722e-01
12 KSP Residual norm 1.743249670147e-01
13 KSP Residual norm 1.277911689618e-01
14 KSP Residual norm 9.453802371730e-02
15 KSP Residual norm 7.022732618304e-02
16 KSP Residual norm 5.276835142527e-02
17 KSP Residual norm 3.966717849679e-02
18 KSP Residual norm 2.987708356527e-02
19 KSP Residual norm 2.221046390150e-02
20 KSP Residual norm 1.631262945106e-02
21 KSP Residual norm 1.188030506469e-02
22 KSP Residual norm 8.655984108945e-03
23 KSP Residual norm 6.239072936196e-03
24 KSP Residual norm 4.455419528387e-03
25 KSP Residual norm 3.235023376588e-03
26 KSP Residual norm 2.345588803418e-03
27 KSP Residual norm 1.668600898579e-03
28 KSP Residual norm 1.180578845647e-03
29 KSP Residual norm 8.327223711005e-04
30 KSP Residual norm 5.853054571413e-04
31 KSP Residual norm 4.038722556707e-04
32 KSP Residual norm 2.731786184181e-04
33 KSP Residual norm 1.853188978548e-04
34 KSP Residual norm 1.277834040044e-04
35 KSP Residual norm 8.853670330190e-05
36 KSP Residual norm 6.151569062192e-05
37 KSP Residual norm 4.247283089736e-05
Linear linear_ solve converged due to CONVERGED_RTOL iterations 37
Wall-clock time: 2.126e+01 seconds
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
3D_ex1.py on a arch-python-linux-x86_64 named pacotaco-xps with 1 processor, by justin Tue Aug 30 23:34:47 2016
Using Petsc Development GIT revision: v3.4.2-13575-gc28f300 GIT Date: 2016-07-10 20:22:41 -0500
Max Max/Min Avg Total
Time (sec): 2.497e+01 1.00000 2.497e+01
Objects: 1.310e+02 1.00000 1.310e+02
Flops: 3.749e+10 1.00000 3.749e+10 3.749e+10
Flops/sec: 1.502e+09 1.00000 1.502e+09 1.502e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.7065e+00 14.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: Linear_solver: 2.1265e+01 85.2% 3.7494e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Nonlinear_solver: 9.5367e-07 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 4 1.0 3.1900e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 2.7848e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
DMPlexInterp 1 1.0 1.2410e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexStratify 2 1.0 4.7100e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 7 1.0 2.5320e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
CreateMesh 8 1.0 1.2974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
CreateExtMesh 1 1.0 4.5982e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 12 0 0 0 0 0
Mesh: reorder 1 1.0 1.7152e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
Mesh: numbering 1 1.0 7.5190e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
CreateFunctionSpace 5 1.0 4.4637e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 12 0 0 0 0 0
Trace: eval 4 1.0 1.3766e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 37 0 0 0 0 0
ParLoopExecute 2 1.0 1.3765e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 37 0 0 0 0 0
ParLoopCKernel 6 1.0 1.3747e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 37 0 0 0 0 0
ParLoopReductionBegin 2 1.0 9.0599e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopReductionEnd 2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
CreateSparsity 1 1.0 6.4163e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 17 0 0 0 0 0
MatZeroInitial 1 1.0 6.9048e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 19 0 0 0 0 0
--- Event Stage 1: Linear_solver
VecTDot 74 1.0 6.9256e-02 1.0 1.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2202
VecNorm 38 1.0 1.8549e-02 1.0 7.83e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4221
VecCopy 4 1.0 4.4966e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 45 1.0 1.4026e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 75 1.0 9.5319e-02 1.0 1.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1621
VecAYPX 36 1.0 4.9965e-02 1.0 7.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1485
MatMult 37 1.0 9.0438e-01 1.0 1.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 5 0 0 0 4 5 0 0 0 2189
MatConvert 1 1.0 1.0125e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.6134e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 8.9929e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 11 0 0 0 0 0
PCApply 38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56 0 0 0 0 66 0 0 0 0 0
KSPSetUp 1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70 7 0 0 0 82 7 0 0 0 139
SNESSolve 1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84100 0 0 0 99100 0 0 0 1781
SNESFunctionEval 1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00 4 29 0 0 0 5 29 0 0 0 9954
SNESJacobianEval 1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65 0 0 0 12 65 0 0 0 9937
Trace: eval 11 1.0 3.6623e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93 0 0 0 17 93 0 0 0 9572
ParLoopExecute 14 1.0 3.6407e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93 0 0 0 17 93 0 0 0 9629
ParLoopCKernel 31 1.0 3.6314e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93 0 0 0 17 93 0 0 0 9653
ParLoopReductionBegin 14 1.0 4.6015e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ParLoopReductionEnd 14 1.0 2.2411e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
ApplyBC 6 1.0 1.6722e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 2: Nonlinear_solver
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 10 8 4608 0.
Viewer 2 0 0 0.
Index Set 36 27 21384 0.
IS L to G Mapping 1 0 0 0.
Section 24 8 5568 0.
Vector 5 4 32975776 0.
Matrix 2 0 0 0.
Preconditioner 1 1 1400 0.
Krylov Solver 1 1 1248 0.
SNES 1 1 1344 0.
SNESLineSearch 1 1 992 0.
DMSNES 1 0 0 0.
Distributed Mesh 8 4 19008 0.
GraphPartitioner 2 1 612 0.
Star Forest Bipartite Graph 19 11 8888 0.
Discrete System 8 4 3520 0.
--- Event Stage 1: Linear_solver
Vector 8 2 16487888 0.
DMKSP interface 1 0 0 0.
--- Event Stage 2: Nonlinear_solver
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_rtol 1e-3
-linear_ksp_atol 1e-50
-linear_ksp_converged_reason
-linear_ksp_monitor
-linear_ksp_rtol 1e-7
-linear_ksp_type cg
-linear_pc_hypre_boomeramg_agg_nl 2
-linear_pc_hypre_boomeramg_strong_threshold 0.75
-linear_pc_hypre_type boomeramg
-linear_pc_type hypre
-linear_snes_atol 1e-8
-linear_snes_type ksponly
-log_view
-snes_converged_reason
-snes_max_it 1000
-tao_converged_reason
-tao_max_it 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc PETSC_ARCH=arch-python-linux-x86_64 --with-shared-libraries=1 --with-debugging=0 --with-c2html=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-fc=/usr/bin/mpif90 --download-ml --download-ctetgen --download-triangle --download-chaco --download-metis --download-parmetis --download-scalapack --download-hypre --download-mumps --download-netcdf --download-hdf5 --download-exodusii
-----------------------------------------
Libraries compiled on Fri Aug 5 02:51:37 2016 on pacotaco-xps
Machine characteristics: Linux-4.4.0-31-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /tmp/pip-kA7m2r-build
Using PETSc arch: arch-python-linux-x86_64
-----------------------------------------
Using C compiler: /usr/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: /usr/bin/mpicc
Using Fortran linker: /usr/bin/mpif90
Using libraries: -Wl,-rpath,/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -L/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -lpetsc -Wl,-rpath,/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -L/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lml -lmpi_cxx -lstdc++ -llapack -lblas -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -ltriangle -lX11 -lhwloc -lctetgen -lssl -lcrypto -lchaco -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
-----------------------------------------