[petsc-users] KSPsolve performance tuning

Edoardo alinovi edoardo.alinovi at gmail.com
Thu Sep 13 12:58:14 CDT 2018


Hello PETSc friends,

For a couple of weeks I have been trying to improve the performance of my
code. I am solving the Navier-Stokes equations for a 3D problem of 220k cells
with 4 processes on my laptop (i7-7800k @ 2.3 GHz with dynamic overclocking).
PETSc is installed under SUSE Linux 15 in a virtual machine (I do not know
whether this is relevant or not).

After some profiling, I can see that the bottleneck is inside KSPSolve while
solving the pressure equation (solved with CG + a hypre preconditioner). As a
result my code runs about twice as slow as OpenFOAM, and the gap is due
entirely to the solution of the pressure equation. Have you got some hints for
me? At this point I am sure I am doing something wrong! I have attached the
log of a test simulation.
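For context, the pressure solver is configured more or less as in the minimal
C sketch below (the helper name SolvePressure and the already-assembled A,
rhs, sol are just illustrative; the real code differs in the details). The
same configuration can also be selected at run time with
-ksp_type cg -pc_type hypre -pc_hypre_type boomeramg.

    #include <petscksp.h>

    /* Pressure solve: CG with a hypre (BoomerAMG) preconditioner.
       The matrix A and the vectors rhs, sol are assembled elsewhere. */
    PetscErrorCode SolvePressure(Mat A, Vec rhs, Vec sol)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
      ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);
      /* allow command-line overrides of the solver and preconditioner */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, rhs, sol);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }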

Thank you very much!

------

Edoardo Alinovi, Ph.D.

DICCA, Scuola Politecnica
Università di Genova
1, via Montallegro
16145 Genova, Italy

email: edoardo.alinovi at dicca.unige.it
Tel: +39 010 353 2540
-------------- next part --------------
Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.6276e+00 100.0%  1.4354e+08 100.0%  3.200e+02 100.0%  6.092e+03      100.0%  9.100e+01  91.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSidedF        23 1.0 5.5043e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMult               30 1.0 3.8755e-02 1.1 1.38e+07 1.0 3.0e+02 6.4e+03 0.0e+00  1 38 94 98  0   1 38 94 98  0  1415
MatSolve              12 1.0 1.8769e-02 1.5 5.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0 15  0  0  0   0 15  0  0  0  1158
MatLUFactorNum         3 1.0 2.0071e-02 1.4 2.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0   430
MatILUFactorSym        1 1.0 2.6951e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             2 1.0 7.3381e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  8   0  0  0  0  9     0
MatAssemblyBegin      10 1.0 5.3159e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyEnd        10 1.0 1.7582e-02 1.1 0.00e+00 0.0 2.0e+01 1.6e+03 1.6e+01  0  0  6  2 16   0  0  6  2 18     0
MatGetRowIJ            5 1.0 1.8690e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 3.3339e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         8 1.0 3.5194e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                 6 1.0 1.1794e-03 1.7 6.22e+05 1.0 0.0e+00 0.0e+00 6.0e+00  0  2  0  0  6   0  2  0  0  7  2105
VecDotNorm2            3 1.0 3.6940e-04 1.5 6.22e+05 1.0 0.0e+00 0.0e+00 3.0e+00  0  2  0  0  3   0  2  0  0  3  6720
VecTDot               32 1.0 2.6084e-03 1.1 3.32e+06 1.0 0.0e+00 0.0e+00 3.2e+01  0  9  0  0 32   0  9  0  0 35  5075
VecNorm               29 1.0 1.9163e-02 1.4 3.01e+06 1.0 0.0e+00 0.0e+00 2.9e+01  0  8  0  0 29   0  8  0  0 32   626
VecCopy                6 1.0 5.6139e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                48 1.0 9.9189e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               35 1.0 3.0097e-03 1.1 3.63e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0  4811
VecAYPX               16 1.0 1.2960e-03 1.1 1.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  4788
VecAXPBYCZ             6 1.0 7.7555e-04 1.0 1.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  6401
VecWAXPY               6 1.0 6.5693e-04 1.1 6.22e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  3778
VecAssemblyBegin      13 1.0 2.1899e-03 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        13 1.0 2.2537e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       30 1.0 7.9062e-04 1.3 0.00e+00 0.0 3.0e+02 6.4e+03 0.0e+00  0  0 94 98  0   0  0 94 98  0     0
VecScatterEnd         30 1.0 1.8332e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               8 1.0 4.3321e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               5 1.0 3.3876e+00 1.0 3.46e+07 1.0 2.7e+02 6.4e+03 7.8e+01 60 96 84 89 78  60 96 84 89 86    41
PCSetUp                8 1.0 1.9194e+00 1.0 2.17e+06 1.0 0.0e+00 0.0e+00 8.0e+00 34  6  0  0  8  34  6  0  0  9     5
PCSetUpOnBlocks        3 1.0 2.3208e-02 1.4 2.17e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0   372
PCApply               32 1.0 1.4126e+00 1.0 5.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00 25 15  0  0  0  25 15  0  0  0    15
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     6              6     14642324     0.
              Vector    31             23      6261400     0.
       Krylov Solver     5              3         3816     0.
      Preconditioner     5              3         3424     0.
           Index Set     5              5       632560     0.
         Vec Scatter     1              1         1272     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 2.87e-08
Average time for MPI_Barrier(): 9.5102e-06
Average time for zero size MPI_Send(): 1.31203e-05
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: FOPTFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 --prefix=/home/edo/software/petsc-3.9.3/ --with-debugging=no --with-mpi-dir=/home/edo/software/openMPI-3.1.1/ --download-fblaslapack=1 --download-superlu_dist --download-mumps --download-hypre --download-metis --download-parmetis --download-scalapack
-----------------------------------------
Libraries compiled on 2018-08-22 10:01:51 on linux-sypg 
Machine characteristics: Linux-4.12.14-lp150.11-default-x86_64-with-glibc2.2.5
Using PETSc directory: /home/edo/software/petsc-3.9.3
Using PETSc arch: 
-----------------------------------------

Using C compiler: /home/edo/software/openMPI-3.1.1/bin/mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3  
Using Fortran compiler: /home/edo/software/openMPI-3.1.1/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3    
-----------------------------------------

Using include paths: -I/home/edo/software/petsc-3.9.3/include -I/home/edo/software/petsc-3.9.3//include -I/home/edo/software/petsc-3.9.3-3.9.3/include -I/home/edo/software/openMPI-3.1.1/include
-----------------------------------------

Using C linker: /home/edo/software/openMPI-3.1.1/bin/mpicc
Using Fortran linker: /home/edo/software/openMPI-3.1.1/bin/mpif90
Using libraries: -Wl,-rpath,/home/edo/software/petsc-3.9.3/lib -L/home/edo/software/petsc-3.9.3/lib -lpetsc -Wl,-rpath,/home/edo/software/petsc-3.9.3/lib -L/home/edo/software/petsc-3.9.3/lib -Wl,-rpath,/home/edo/software/openMPI-3.1.1/lib -L/home/edo/software/openMPI-3.1.1/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/7 -L/usr/lib64/gcc/x86_64-suse-linux/7 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lHYPRE -lflapack -lfblas -lparmetis -lmetis -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------

