[petsc-users] Interpreting -log_summary, amount of communication

Åsmund Ervik asmund.ervik at ntnu.no
Fri Jun 13 06:57:35 CDT 2014



Dear PETSc,

First of all, a bug report for the manual (petsc-current): in Figs.
20 and 21 something has gone wrong with \href and the listings, so I
can't read those figures properly.

I read Section 12.1 of the manual, but it didn't answer my
question. When I run a parallel code with -log_summary, how can I see
details of the time spent on communication? To be more specific: my
code does some communication at the start of each time step, and I
assume KSP also has to communicate while solving my Poisson equation.
If I understand correctly, both of these communications are listed
under "VecAssemblyEnd", but how can I tell how the time is divided
between those two? Do I have to register some stages, etc.? (Something
like the sketch below is what I have in mind.)
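
In case it clarifies what I mean, here is a rough sketch (in C, not my
actual code) of how I imagine the stages would be set up with
PetscLogStageRegister()/PetscLogStagePush()/PetscLogStagePop(); the
stage names, the loop, and the commented-out calls are just placeholders:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogStage stage_comm, stage_solve;
  PetscInt      step;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Register one stage per part of the code to be timed separately;
     -log_summary then reports each stage in its own section. */
  PetscLogStageRegister("Ghost exchange", &stage_comm);
  PetscLogStageRegister("Pressure solve", &stage_solve);

  for (step = 0; step < 10; step++) {
    PetscLogStagePush(stage_comm);
    /* ... my communication at the start of the time step ... */
    PetscLogStagePop();

    PetscLogStagePush(stage_solve);
    /* ... KSPSolve() for the Poisson equation ... */
    PetscLogStagePop();
  }

  PetscFinalize();
  return 0;
}

If I have understood the manual correctly, -log_summary would then
print a separate event table per stage, so the scatter/assembly time
from my own communication could be told apart from the time spent
inside KSPSolve; please correct me if that's wrong.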

Below is the output from -log_summary. Is this bad in terms of the
time spent on communication? The run used 4 nodes with 4 cores per
node on a small cluster with Intel E5-2670 8-core CPUs; the Streams
benchmark indicated that I can't really utilize more than 4 cores per
node. The interconnect is 1 Gb/s Ethernet. The speedup vs. 1 core is
9x. I'm solving incompressible Navier-Stokes on a 128^3 grid, with a
pressure Poisson equation. In this case I used SOR and BiCGStab.

This cluster is where I'm learning the ropes; in the future I will be
using more tightly coupled systems (InfiniBand). Should I expect an
increase in speedup when I move to those?

Best regards,
Åsmund Ervik




---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./run on a double-real named compute-3-11.local with 16 processors, by asmunder Thu Jun 12 14:22:08 2014
Using Petsc Release Version 3.4.2, Jul, 02, 2013

                         Max       Max/Min        Avg      Total
Time (sec):           3.246e+03      1.00000   3.246e+03
Objects:              9.800e+02      1.00000   9.800e+02
Flops:                1.667e+12      1.00447   1.663e+12  2.661e+13
Flops/sec:            5.134e+08      1.00447   5.122e+08  8.196e+09
MPI Messages:         6.163e+05      1.33327   5.393e+05  8.629e+06
MPI Message Lengths:  4.626e+10      1.49807   7.151e+04  6.171e+11
MPI Reductions:       4.576e+05      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.2462e+03 100.0%  2.6605e+13 100.0%  8.629e+06 100.0%  7.151e+04      100.0%  4.576e+05 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           9 1.0 1.3815e+02356592.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecDot            150040 1.0 5.0756e+01 2.1 3.93e+10 1.0 0.0e+00 0.0e+00 1.5e+05  1  2  0  0 33   1  2  0  0 33 12399
VecDotNorm2        75020 1.0 3.1857e+01 1.9 3.93e+10 1.0 0.0e+00 0.0e+00 7.5e+04  1  2  0  0 16   1  2  0  0 16 19754
VecNorm            76620 1.0 2.4263e+01 3.0 2.01e+10 1.0 0.0e+00 0.0e+00 7.7e+04  1  1  0  0 17   1  1  0  0 17 13245
VecCopy             1600 1.0 3.7060e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1632 1.0 7.2079e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              800 1.0 3.7573e-01 1.7 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  8931
VecAXPBYCZ        150040 1.0 3.9948e+01 1.1 7.87e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0 31507
VecWAXPY          150040 1.0 3.6130e+01 1.2 3.93e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 17418
VecAssemblyBegin     801 1.0 8.6365e+00 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+03  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd       801 1.0 1.2031e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin   157298 1.0 2.1938e+01 1.8 0.00e+00 0.0 8.6e+06 7.1e+04 2.7e+01  1  0100100  0   1  0100100  0     0
VecScatterEnd     157271 1.0 1.1521e+03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25  0  0  0  0  25  0  0  0  0     0
MatMult           150840 1.0 1.7150e+03 1.9 7.24e+11 1.0 8.4e+06 7.0e+04 0.0e+00 42 43 98 96  0  42 43 98 96  0  6721
MatSOR            151640 1.0 1.0433e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11126
MatAssemblyBegin       2 1.0 7.9169e-03 9.9 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.1147e-02 1.2 0.00e+00 0.0 1.1e+02 1.8e+04 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 3.4809e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             800 1.0 2.9176e+03 1.0 1.67e+12 1.0 8.4e+06 7.0e+04 4.6e+05 90100 98 96100  90100 98 96100  9119
PCSetUp                1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply           151640 1.0 1.0436e+03 1.4 7.25e+11 1.0 0.0e+00 0.0e+00 0.0e+00 26 44  0  0  0  26 44  0  0  0 11123
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   859            812    840120768     0
      Vector Scatter    39             27        17388     0
              Matrix     3              0            0     0
   Matrix Null Space     1              0            0     0
    Distributed Mesh     4              0            0     0
     Bipartite Graph     8              0            0     0
           Index Set    55             55      2636500     0
   IS L to G Mapping     7              0            0     0
       Krylov Solver     1              0            0     0
     DMKSP interface     1              0            0     0
      Preconditioner     1              0            0     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000110197
Average time for zero size MPI_Send(): 2.65092e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Fri Sep 13 14:49:18 2013
Configure options: PETSC_ARCH=double-real --with-precision=double --with-scalar-type=real --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-mpiexec=mpiexec --with-debugging=0 --COPTFLAGS="-O2 -fp-model extended" --FOPTFLAGS="-O2 -fltconsistency" --with-blas-lapack-dir=/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 --with-64-bit-indices=0 --with-clanguage=c++ --with-shared-libraries=0 --download-ml --download-hypre
-----------------------------------------
Libraries compiled on Fri Sep 13 14:49:18 2013 on rocks.hpc.ntnu.no
Machine characteristics:
Linux-2.6.18-308.1.1.el5-x86_64-with-redhat-5.6-Tikanga
Using PETSc directory: /share/apps/modulessoftware/petsc/petsc-3.4.2
Using PETSc arch: double-real
-----------------------------------------

Using C compiler: mpicxx  -wd1572 -O3     ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -O2 -fltconsistency   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/include -I/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/include -I/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/include
-----------------------------------------

Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lpetsc -Wl,-rpath,/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -L/share/apps/modulessoftware/petsc/petsc-3.4.2/double-real/lib -lml -lHYPRE -Wl,-rpath,/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lX11 -lpthread -L/share/apps/modulessoftware/openmpi/openmpi-1.7.2-intel/lib -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/coi/host-linux-release/lib -L/share/apps/modulessoftware/intel/compilers/opt/intel/mic/myo/lib -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mpirt/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/ipp/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/mkl/lib/intel64 -L/share/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/tbb/lib/intel64 -L/gpfs/shareapps/apps/modulessoftware/intel/compilers/13.0.1/composer_xe_2013.1.117/compiler/lib/intel64 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lifport -lifcore -lm -lm -lmpi_cxx -ldl -lmpi -limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------

