[petsc-users] Interpreting -log_summary, amount of communication

Barry Smith bsmith at mcs.anl.gov
Fri Jun 13 15:18:43 CDT 2014


On Jun 13, 2014, at 6:57 AM, Åsmund Ervik <asmund.ervik at ntnu.no> wrote:

> 
> Dear PETSc,
> 
> First of all, bug report for the manual (for petsc-current): in Fig.
> 20 and 21, something has not gone well with \href and listings, so I
> can't understand those figures properly.
> 
> I read the chapter in the manual, 12.1, but it didn't answer my
> question. When I run a parallel code with -log_summary, how can I see
> details of the time spent on communication? To be more specific: I
> have some code that does communication at the start of each time step,
> and I guess KSP has to do some communications when it is solving my
> Poisson equation. If I understand correctly, both these communications
> are listed under "VecAssemblyEnd", but how do I tell the division of
> the time between those two? Do I have to register some stages, etc.?

   It is not really possible to say “oh, 80% of the time is in computation and 20% in communication”.

  From your log file:

VecScatterEnd     157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25  0  0  0  0  25  0  0  0  0     0

See the 25 near the end; this is the percent of total time spent in VecScatterEnd(), which is the communication (except for reductions) in the KSPSolve(). So a lot of time goes into communication, but this is expected with an ethernet network. When you switch to a better network this will drop by a good amount.
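
On the question about registering stages: one way to separate your own start-of-step exchange from the KSPSolve() communication in the log is to put a user-defined logging stage around your exchange, using PetscLogStageRegister()/PetscLogStagePush()/PetscLogStagePop() (the same calls mentioned in the legend further down); -log_summary then prints a separate event table per stage. A minimal C sketch, where ExchangeGhostValues() is a hypothetical stand-in for your own communication routine:

   #include <petscsys.h>

   /* hypothetical stand-in for your start-of-step communication */
   extern PetscErrorCode ExchangeGhostValues(void);

   PetscLogStage  commstage;
   PetscErrorCode ierr;

   /* once, during setup */
   ierr = PetscLogStageRegister("Timestep comm",&commstage);CHKERRQ(ierr);

   /* inside the time loop */
   ierr = PetscLogStagePush(commstage);CHKERRQ(ierr);  /* events until the Pop are logged in this stage */
   ierr = ExchangeGhostValues();CHKERRQ(ierr);         /* your own communication                        */
   ierr = PetscLogStagePop();CHKERRQ(ierr);            /* back to the default Main Stage                */

With this, the scatter/assembly time of your exchange shows up in the "Timestep comm" stage while KSPSolve() stays in the Main Stage, so the split you ask about is visible directly in -log_summary.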


> 
> Below is the output from the performance summary. Is this bad in terms
> of the time spent on communication? This is using 4 nodes, 4 cores per
> node on a small cluster with Intel E5-2670 8-core CPUs. The Streams
> benchmark indicated that I can't really utilize more than 4 cores per
> node. The interconnect is 1 Gb/s ethernet. The speedup vs. 1 core is
> 9x. I'm solving incompressible Navier-Stokes on a 128^3 grid, with a
> pressure Poisson equation. In this case I used SOR and BiCGStab.

   Likely you will want to use -pc_type gamg for this solve.
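
If your code calls KSPSetFromOptions(), adding -pc_type gamg on the command line is enough. For reference, a rough sketch of the same switch made in code (assumptions: a KSP created in the usual way, BiCGStab kept as the Krylov method; this is not taken from your solver):

   #include <petscksp.h>

   KSP            ksp;
   PC             pc;
   PetscErrorCode ierr;

   ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
   ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);    /* BiCGStab, as in the current runs           */
   ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
   ierr = PCSetType(pc,PCGAMG);CHKERRQ(ierr);       /* algebraic multigrid instead of SOR         */
   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* command-line -ksp_type/-pc_type still wins */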

> 
> This cluster is where I'm learning the ropes, and I will be using more
> tightly-coupled systems in the future (Infiniband). Should I expect an
> increase in speedup when I use those?

  Yes. Your current machine is terrible for parallelism.

> Average time for MPI_Barrier(): 0.000110197
> Average time for zero size MPI_Send(): 2.65092e-05

   From my perspective your code is running reasonably on this machine.


Barry



> 
> Best regards,
> Åsmund Ervik
> 
> 
> 
> 
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
> 
> ./run on a double-real named compute-3-11.local with 16 processors, by asmunder Thu Jun 12 14:22:08 2014
> Using Petsc Release Version 3.4.2, Jul, 02, 2013
> 
>                         Max       Max/Min        Avg      Total
> Time (sec):           3.246e+03      1.00000   3.246e+03
> Objects:              9.800e+02      1.00000   9.800e+02
> Flops:                1.667e+12      1.00447   1.663e+12  2.661e+13
> Flops/sec:            5.134e+08      1.00447   5.122e+08  8.196e+09
> MPI Messages:         6.163e+05      1.33327   5.393e+05  8.629e+06
> MPI Message Lengths:  4.626e+10      1.49807   7.151e+04  6.171e+11
> MPI Reductions:       4.576e+05      1.00000
> 
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N --> 2N flops
>                            and VecAXPY() for complex vectors of length N --> 8N flops
> 
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 3.2462e+03 100.0%  2.6605e+13 100.0%  8.629e+06 100.0%  7.151e+04      100.0%  4.576e+05 100.0%
> 
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length (bytes)
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>      %T - percent time in this phase         %f - percent flops in this phase
>      %M - percent messages in this phase     %L - percent message lengths in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> 
> --- Event Stage 0: Main Stage
> 
> PetscBarrier           9 1.0 1.3815e+02356592.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
> VecDot            150040 1.0 5.0756e+01 2.1 3.93e+10 1.0 0.0e+00 0.0e+00 1.5e+05  1  2  0  0 33   1  2  0  0 33 12399
> VecDotNorm2        75020 1.0 3.1857e+01 1.9 3.93e+10 1.0 0.0e+00 0.0e+00 7.5e+04  1  2  0  0 16   1  2  0  0 16 19754
> VecNorm            76620 1.0 2.4263e+01 3.0 2.01e+10 1.0 0.0e+00 0.0e+00 7.7e+04  1  1  0  0 17   1  1  0  0 17 13245
> VecCopy             1600 1.0 3.7060e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1632 1.0 7.2079e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              800 1.0 3.7573e-01 1.7 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  8931
> VecAXPBYCZ        150040 1.0 3.9948e+01 1.1 7.87e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0 31507
> VecWAXPY          150040 1.0 3.6130e+01 1.2 3.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 17418
> VecAssemblyBegin     801 1.0 8.6365e+00 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+03  0  0  0  0  1   0  0  0  0  1     0
> VecAssemblyEnd       801 1.0 1.2031e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin   157298 1.0 2.1938e+01 1.8 0.00e+00 0.0 8.6e+06 7.1e+04 2.7e+01  1  0100100  0   1  0100100  0     0
> VecScatterEnd     157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25  0  0  0  0  25  0  0  0  0     0
> MatMult           150840 1.0 1.7150e+03 1.9 7.24e+11 1.0 8.4e+06 7.0e+04 0.0e+00 42 43 98 96  0  42 43 98 96  0  6721
> MatSOR            151640 1.0 1.0433e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11126
> MatAssemblyBegin       2 1.0 7.9169e-03 9.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 2.1147e-02 1.2 0.00e+00 0.0 1.1e+02 1.8e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSetUp               1 1.0 3.4809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve             800 1.0 2.9176e+03 1.0 1.67e+12 1.0 8.4e+06 7.0e+04 4.6e+05 90100 98 96100  90100 98 96100  9119
> PCSetUp                1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply           151640 1.0 1.0436e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11123
> ------------------------------------------------------------------------------------------------------------------------
> 
> Memory usage is given in bytes:
> 
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> 
> --- Event Stage 0: Main Stage
> 
>              Vector   859            812    840120768     0
>      Vector Scatter    39             27        17388     0
>              Matrix     3              0            0     0
>   Matrix Null Space     1              0            0     0
>    Distributed Mesh     4              0            0     0
>     Bipartite Graph     8              0            0     0
>           Index Set    55             55      2636500     0
>   IS L to G Mapping     7              0            0     0
>       Krylov Solver     1              0            0     0
>     DMKSP interface     1              0            0     0
>      Preconditioner     1              0            0     0
>              Viewer     1              0            0     0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 0.000110197
> Average time for zero size MPI_Send(): 2.65092e-05
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Fri Sep 13 14:49:18 2013
> Configure options: PETSC_ARCH=double-real --with-precision=double --with-scalar-type=real --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-mpiexec=mpiexec --with-debugging=0 --COPTFLAGS="-O2 -fp-model extended" --FOPTFLAGS="-O2 -fltconsistency" --with-blas-lapack-dir=/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 --with-64-bit-indices=0 --with-clanguage=c++ --with-shared-libraries=0 --download-ml --download-hypre
> -----------------------------------------
> Libraries compiled on Fri Sep 13 14:49:18 2013 on rocks.hpc.ntnu.no
> Machine characteristics: Linux-2.6.18-308.1.1.el5-x86_64-with-redhat-5.6-Tikanga
> Using PETSc directory: /share/apps/modulessoftware/petsc/petsc-3.4.2
> Using PETSc arch: double-real
> -----------------------------------------
> 
> Using C compiler: mpicxx  -wd1572 -O3     ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: mpif90  -O2 -fltconsistency   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
> Using include paths: -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/include
> -----------------------------------------
> 
> Using C linker: mpicxx
> Using Fortran linker: mpif90
> Using libraries: -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lpetsc -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lml -lHYPRE -Wl,-rpath,/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lpthread -L/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/lib -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/coi/host-linux-release/lib -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/myo/lib -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mpirt/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/ipp/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mkl/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/tbb/lib/intel64 -L/gpfs/shareapps/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lifport -lifcore -lm -lm -lmpi_cxx -ldl -lmpi -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -ldl
> -----------------------------------------


