[petsc-users] Interpreting -log_summary, amount of communication
Barry Smith
bsmith at mcs.anl.gov
Fri Jun 13 15:18:43 CDT 2014
On Jun 13, 2014, at 6:57 AM, Åsmund Ervik <asmund.ervik at ntnu.no> wrote:
>
> Dear PETSc,
>
> First of all, a bug report for the manual (for petsc-current): in Figs.
> 20 and 21, something has not gone well with \href and listings, so I
> can't understand those figures properly.
>
> I read the chapter in the manual, 12.1, but it didn't answer my
> question. When I run a parallel code with -log_summary, how can I see
> details of the time spent on communication? To be more specific: I
> have some code that does communication at the start of each time step,
> and I guess KSP has to do some communications when it is solving my
> Poisson equation. If I understand correctly, both these communications
> are listed under "VecAssemblyEnd", but how can I tell how the time is
> divided between the two? Do I have to register some stages, etc.?
It is not really possible to say “oh, 80% of the time is in computation and 20% in communication”.
From your log file
VecScatterEnd 157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25 0 0 0 0 25 0 0 0 0 0
See the 25 near the end: that is the percent of the total time spent in VecScatterEnd(), which is the communication (except for reductions) inside the KSPSolve(). So a lot of time is going into communication, but that is expected with an ethernet network. When you switch to a better network this will drop a good amount.
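
If you do want the log to separate your own start-of-time-step communication from the KSPSolve(), you can register logging stages around those parts of the code. Below is a minimal sketch in C (the stage names and the placeholder comments are illustrative only, not taken from your code):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;
    PetscLogStage  stage_comm, stage_solve;
    PetscInt       step;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = PetscLogStageRegister("Halo exchange", &stage_comm);CHKERRQ(ierr);
    ierr = PetscLogStageRegister("Pressure solve", &stage_solve);CHKERRQ(ierr);

    for (step = 0; step < 10; step++) {
      ierr = PetscLogStagePush(stage_comm);CHKERRQ(ierr);
      /* ... your own communication at the start of the time step ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      ierr = PetscLogStagePush(stage_solve);CHKERRQ(ierr);
      /* ... KSPSolve() for the pressure Poisson equation ... */
      ierr = PetscLogStagePop();CHKERRQ(ierr);
    }
    ierr = PetscFinalize();
    return ierr;
  }

With stages registered, -log_summary prints a separate event table per stage, so the VecScatterEnd time inside the solve is reported apart from the time in your own exchanges.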
>
> Below is the output from the performance summary. Is this bad in terms
> of the time spent on communication? This is using 4 nodes, 4 cores per
> node on a small cluster with Intel E5-2670 8-core CPUs. The Streams
> benchmark indicated that I can't really utilize more than 4 cores per
> node. The interconnect is 1 Gb/s ethernet. The speedup vs. 1 core is
> 9x. I'm solving incompressible Navier-Stokes on a 128^3 grid, with a
> pressure Poisson equation. In this case I used SOR and BiCGStab.
Likely you will want to use -pc_type gamg for this solve
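
For reference, a minimal sketch (not from the thread) of switching the pressure solver to BiCGStab plus GAMG from the code; equivalently, just add -pc_type gamg on the command line if the code calls KSPSetFromOptions(). GAMG usually needs some problem-dependent tuning:

  #include <petscksp.h>

  /* Hypothetical helper: ksp is assumed to be the solver for the pressure
     Poisson equation, already created and given its operator. */
  PetscErrorCode UseGAMG(KSP ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPSetType(ksp, KSPBCGS);CHKERRQ(ierr);   /* BiCGStab, as before */
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);      /* algebraic multigrid instead of SOR */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* command-line options can still override */
    return 0;
  }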
>
> This cluster is where I'm learning the ropes, and I will be using more
> tightly-coupled systems in the future (Infiniband). Should I expect an
> increase in speedup when I use those?
Yes. Your current machine is terrible for parallelism.
> Average time for MPI_Barrier(): 0.000110197
> Average time for zero size MPI_Send(): 2.65092e-05
From my perspective your code is running reasonably on this machine.
Barry
>
> Best regards,
> Åsmund Ervik
>
>
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./run on a double-real named compute-3-11.local with 16 processors, by asmunder Thu Jun 12 14:22:08 2014
> Using Petsc Release Version 3.4.2, Jul, 02, 2013
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           3.246e+03      1.00000   3.246e+03
> Objects:              9.800e+02      1.00000   9.800e+02
> Flops:                1.667e+12      1.00447   1.663e+12  2.661e+13
> Flops/sec:            5.134e+08      1.00447   5.122e+08  8.196e+09
> MPI Messages:         6.163e+05      1.33327   5.393e+05  8.629e+06
> MPI Message Lengths:  4.626e+10      1.49807   7.151e+04  6.171e+11
> MPI Reductions:       4.576e+05      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 3.2462e+03 100.0%  2.6605e+13 100.0%  8.629e+06 100.0%  7.151e+04      100.0%  4.576e+05 100.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase         %f - percent flops in this phase
> %M - percent messages in this phase     %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier           9 1.0 1.3815e+02 356592.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0    4  0  0  0  0      0
> VecDot            150040 1.0 5.0756e+01 2.1 3.93e+10 1.0 0.0e+00 0.0e+00 1.5e+05  1  2  0  0 33    1  2  0  0 33  12399
> VecDotNorm2        75020 1.0 3.1857e+01 1.9 3.93e+10 1.0 0.0e+00 0.0e+00 7.5e+04  1  2  0  0 16    1  2  0  0 16  19754
> VecNorm            76620 1.0 2.4263e+01 3.0 2.01e+10 1.0 0.0e+00 0.0e+00 7.7e+04  1  1  0  0 17    1  1  0  0 17  13245
> VecCopy             1600 1.0 3.7060e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    0  0  0  0  0      0
> VecSet              1632 1.0 7.2079e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    0  0  0  0  0      0
> VecAXPY              800 1.0 3.7573e-01 1.7 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    0  0  0  0  0   8931
> VecAXPBYCZ        150040 1.0 3.9948e+01 1.1 7.87e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0    1  5  0  0  0  31507
> VecWAXPY          150040 1.0 3.6130e+01 1.2 3.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0    1  2  0  0  0  17418
> VecAssemblyBegin     801 1.0 8.6365e+00 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+03  0  0  0  0  1    0  0  0  0  1      0
> VecAssemblyEnd       801 1.0 1.2031e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    0  0  0  0  0      0
> VecScatterBegin   157298 1.0 2.1938e+01 1.8 0.00e+00 0.0 8.6e+06 7.1e+04 2.7e+01  1  0 100 100  0   1  0 100 100  0      0
> VecScatterEnd     157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25  0  0  0  0   25  0  0  0  0      0
> MatMult           150840 1.0 1.7150e+03 1.9 7.24e+11 1.0 8.4e+06 7.0e+04 0.0e+00 42 43 98 96  0   42 43 98 96  0   6721
> MatSOR            151640 1.0 1.0433e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0   26 44  0  0  0  11126
> MatAssemblyBegin       2 1.0 7.9169e-03 9.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0    0  0  0  0  0      0
> MatAssemblyEnd         2 1.0 2.1147e-02 1.2 0.00e+00 0.0 1.1e+02 1.8e+04 8.0e+00  0  0  0  0  0    0  0  0  0  0      0
> KSPSetUp               1 1.0 3.4809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0    0  0  0  0  0      0
> KSPSolve             800 1.0 2.9176e+03 1.0 1.67e+12 1.0 8.4e+06 7.0e+04 4.6e+05 90 100 98 96 100  90 100 98 96 100   9119
> PCSetUp                1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    0  0  0  0  0      0
> PCApply           151640 1.0 1.0436e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0   26 44  0  0  0  11123
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>              Vector   859            812    840120768        0
>      Vector Scatter    39             27        17388        0
>              Matrix     3              0            0        0
>   Matrix Null Space     1              0            0        0
>    Distributed Mesh     4              0            0        0
>     Bipartite Graph     8              0            0        0
>           Index Set    55             55      2636500        0
>   IS L to G Mapping     7              0            0        0
>       Krylov Solver     1              0            0        0
>     DMKSP interface     1              0            0        0
>      Preconditioner     1              0            0        0
>              Viewer     1              0            0        0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.000110197
> Average time for zero size MPI_Send(): 2.65092e-05
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Fri Sep 13 14:49:18 2013
> Configure options: PETSC_ARCH=double-real --with-precision=double
> --with-scalar-type=real --with-cc=mpicc --with-cxx=mpicxx
> --with-fc=mpif90 --with-mpiexec=mpiexec --with-debugging=0
> --COPTFLAGS="-O2 -fp-model extended" --FOPTFLAGS="-O2 -fltconsistency"
> --with-blas-lapack-dir=/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64
> --with-64-bit-indices=0 --with-clanguage=c++ --with-shared-libraries=0
> --download-ml --download-hypre
> -----------------------------------------
> Libraries compiled on Fri Sep 13 14:49:18 2013 on rocks.hpc.ntnu.no
> Machine characteristics:
> Linux-2.6.18-308.1.1.el5-x86_64-with-redhat-5.6-Tikanga
> Using PETSc directory: /share/apps/modulessoftware/petsc/petsc-3.4.2
> Using PETSc arch: double-real
> -----------------------------------------
>
> Using C compiler: mpicxx -wd1572 -O3 ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90 -O2 -fltconsistency ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
> Using include paths:
> -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include
> -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include
> -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include
> -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include
> -I/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/include
> -----------------------------------------
>
> Using C linker: mpicxx
> Using Fortran linker: mpif90
> Using libraries:
> -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib
> -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib
> -lpetsc
> -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib
> -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lml
> -lHYPRE
> -Wl,-rpath,/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64
> -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11
> -lpthread
> -L/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/lib
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64
> -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/coi/host-linux-release/lib
> -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/myo/lib
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mpirt/lib/intel64
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/ipp/lib/intel64
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mkl/lib/intel64
> -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/tbb/lib/intel64
> -L/gpfs/shareapps/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lmpi_usempif08
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lifport -lifcore -lm -lm
> -lmpi_cxx -ldl -lmpi -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts
> -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -ldl
> -----------------------------------------