[petsc-users] Interpreting -log_summary, amount of communication
Åsmund Ervik
asmund.ervik at ntnu.no
Fri Jun 13 06:57:35 CDT 2014
Dear PETSc,
First of all, a bug report for the manual (for petsc-current): in Figs. 20 and 21, something has gone wrong with \href and listings, so I can't read those figures properly.
I have read Section 12.1 of the manual, but it didn't answer my question: when I run a parallel code with -log_summary, how can I see details of the time spent on communication? To be specific, my code does some communication at the start of each time step, and I assume KSP also has to communicate while solving my Poisson equation. If I understand correctly, both of these are listed under "VecAssemblyEnd", but how do I tell how the time is split between the two? Do I have to register some stages, etc.?
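To make the question concrete, here is a minimal sketch (in C) of what I imagine stage registration would look like; the stage names and the placement are only my guesses, so please correct me if this is not the intended usage:

  #include <petscksp.h>

  /* Sketch only: separate the ghost-cell exchange from the Poisson solve
     in the log. "Communication" and "PoissonSolve" are made-up stage names. */
  PetscErrorCode TimeStep(KSP ksp, Vec rhs, Vec p)
  {
    static PetscLogStage commStage = -1, solveStage = -1;
    PetscErrorCode       ierr;

    if (commStage < 0) {
      ierr = PetscLogStageRegister("Communication", &commStage);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("PoissonSolve",  &solveStage);CHKERRQ(ierr);
    }

    ierr = PetscLogStagePush(commStage);CHKERRQ(ierr);
    /* ... my own ghost-cell exchange at the start of the time step ... */
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    ierr = PetscLogStagePush(solveStage);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, rhs, p);CHKERRQ(ierr);
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    return 0;
  }

If I understand the manual correctly, -log_summary would then report the events (message counts, reductions, times) separately per stage, so I could compare the two. Is that the recommended way, or is there something simpler?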
Below is the output from the performance summary. Is this bad in terms of the time spent on communication? This run used 4 nodes with 4 cores per node on a small cluster with Intel E5-2670 8-core CPUs; the STREAMS benchmark indicated that I can't really utilize more than 4 cores per node. The interconnect is 1 Gb/s Ethernet. The speedup vs. 1 core is 9x. I'm solving incompressible Navier-Stokes on a 128^3 grid with a pressure Poisson equation; in this case I used BiCGStab with SOR as the preconditioner.
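For completeness, the pressure solver is set up roughly like this (a sketch from memory, so the names and exact calls in my code may differ slightly):

  #include <petscksp.h>

  /* Sketch: pressure Poisson solver as BiCGStab + SOR.
     "SetupPressureSolver" and the matrix A are placeholders for my own code. */
  PetscErrorCode SetupPressureSolver(Mat A, KSP *ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(*ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetType(*ksp, KSPBCGS);CHKERRQ(ierr);   /* BiCGStab */
    ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCSOR);CHKERRQ(ierr);        /* SOR preconditioner */
    ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);     /* allow command-line overrides */
    return 0;
  }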
This cluster is where I'm learning the ropes, and I will be using more
tightly-coupled systems in the future (Infiniband). Should I expect an
increase in speedup when I use those?
Best regards,
Åsmund Ervik
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./run on a double-real named compute-3-11.local with 16 processors, by
asmunder Thu Jun 12 14:22:08 2014
Using Petsc Release Version 3.4.2, Jul, 02, 2013
                          Max        Max/Min     Avg        Total
Time (sec):            3.246e+03   1.00000    3.246e+03
Objects:               9.800e+02   1.00000    9.800e+02
Flops:                 1.667e+12   1.00447    1.663e+12   2.661e+13
Flops/sec:             5.134e+08   1.00447    5.122e+08   8.196e+09
MPI Messages:          6.163e+05   1.33327    5.393e+05   8.629e+06
MPI Message Lengths:   4.626e+10   1.49807    7.151e+04   6.171e+11
MPI Reductions:        4.576e+05   1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
                      Avg      %Total     Avg      %Total   counts    %Total     Avg          %Total   counts    %Total
 0:    Main Stage:  3.2462e+03 100.0%  2.6605e+13 100.0%  8.629e+06  100.0%   7.151e+04      100.0%  4.576e+05  100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase         %f - percent flops in this phase
%M - percent messages in this phase     %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                              --- Global ---  --- Stage ---   Total
                   Max Ratio   Max     Ratio    Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage
PetscBarrier              9 1.0 1.3815e+02 356592.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecDot               150040 1.0 5.0756e+01 2.1 3.93e+10 1.0 0.0e+00 0.0e+00 1.5e+05  1  2  0  0 33   1  2  0  0 33 12399
VecDotNorm2           75020 1.0 3.1857e+01 1.9 3.93e+10 1.0 0.0e+00 0.0e+00 7.5e+04  1  2  0  0 16   1  2  0  0 16 19754
VecNorm               76620 1.0 2.4263e+01 3.0 2.01e+10 1.0 0.0e+00 0.0e+00 7.7e+04  1  1  0  0 17   1  1  0  0 17 13245
VecCopy                1600 1.0 3.7060e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 1632 1.0 7.2079e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                 800 1.0 3.7573e-01 1.7 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  8931
VecAXPBYCZ           150040 1.0 3.9948e+01 1.1 7.87e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0 31507
VecWAXPY             150040 1.0 3.6130e+01 1.2 3.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 17418
VecAssemblyBegin        801 1.0 8.6365e+00 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+03  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd          801 1.0 1.2031e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      157298 1.0 2.1938e+01 1.8 0.00e+00 0.0 8.6e+06 7.1e+04 2.7e+01  1  0 100 100  0   1  0 100 100  0     0
VecScatterEnd        157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25  0  0  0  0  25  0  0  0  0     0
MatMult              150840 1.0 1.7150e+03 1.9 7.24e+11 1.0 8.4e+06 7.0e+04 0.0e+00 42 43 98 96  0  42 43 98 96  0  6721
MatSOR               151640 1.0 1.0433e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11126
MatAssemblyBegin          2 1.0 7.9169e-03 9.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd            2 1.0 2.1147e-02 1.2 0.00e+00 0.0 1.1e+02 1.8e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp                  1 1.0 3.4809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve                800 1.0 2.9176e+03 1.0 1.67e+12 1.0 8.4e+06 7.0e+04 4.6e+05 90 100 98 96 100  90 100 98 96 100  9119
PCSetUp                   1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply              151640 1.0 1.0436e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11123
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions      Memory   Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 859 812 840120768 0
Vector Scatter 39 27 17388 0
Matrix 3 0 0 0
Matrix Null Space 1 0 0 0
Distributed Mesh 4 0 0 0
Bipartite Graph 8 0 0 0
Index Set 55 55 2636500 0
IS L to G Mapping 7 0 0 0
Krylov Solver 1 0 0 0
DMKSP interface 1 0 0 0
Preconditioner 1 0 0 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000110197
Average time for zero size MPI_Send(): 2.65092e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Fri Sep 13 14:49:18 2013
Configure options: PETSC_ARCH=double-real --with-precision=double --with-scalar-type=real --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-mpiexec=mpiexec --with-debugging=0 --COPTFLAGS="-O2 -fp-model extended" --FOPTFLAGS="-O2 -fltconsistency" --with-blas-lapack-dir=/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 --with-64-bit-indices=0 --with-clanguage=c++ --with-shared-libraries=0 --download-ml --download-hypre
-----------------------------------------
Libraries compiled on Fri Sep 13 14:49:18 2013 on rocks.hpc.ntnu.no
Machine characteristics:
Linux-2.6.18-308.1.1.el5-x86_64-with-redhat-5.6-Tikanga
Using PETSc directory: /share/apps/modulessoftware/petsc/petsc-3.4.2
Using PETSc arch: double-real
-----------------------------------------
Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -O2 -fltconsistency ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries:
-Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib
-L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lpetsc
-Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib
-L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lml -lHYPRE
-Wl,-rpath,/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64
-L/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lpthread
-L/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/lib
-L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64
-L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/coi/host-linux-release/lib
-L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/myo/lib
-L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mpirt/lib/intel64
-L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/ipp/lib/intel64
-L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mkl/lib/intel64
-L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/tbb/lib/intel64
-L/gpfs/shareapps/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64
-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2
-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lifport -lifcore -lm -lm
-lmpi_cxx -ldl -lmpi -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts
-lstdc++ -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------