[petsc-users] Slower performance in multi-node system

Luciano Siqueira luciano.siqueira at usp.br
Wed Feb 3 14:40:04 CST 2021


Here are the (attached) outputs of -log_view for both cases. The
beginning of each file has some info from the libmesh app.

Running on 1 node, 32 cores: 01_node_log_view.txt

Running on 20 nodes, 32 cores each (640 cores in total): the second 
attached log_view file
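
In case it helps when reading the logs: everything in these runs falls under
PETSc's default "Main Stage". The log header notes that stages can be set with
PetscLogStagePush() and PetscLogStagePop(); below is a minimal standalone
sketch of using them with PETSc's C API. The stage name and the placement of
the assembly/solve comments are only illustrative; the libmesh example itself
does not register extra stages.

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogStage  solve_stage;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Register a named stage; it appears as its own section in -log_view */
  ierr = PetscLogStageRegister("Linear Solve", &solve_stage);CHKERRQ(ierr);

  /* ... mesh setup and matrix/vector assembly (stays in the Main Stage) ... */

  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  /* ... the solve, e.g. KSPSolve(), or the libmesh system's solve() call ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}

With the solve in its own stage, -log_view reports its time, flop, message,
and reduction counts separately from setup and assembly.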

Thanks!

Luciano.

On 03/02/2021 16:43, Matthew Knepley wrote:
> On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira 
> <luciano.siqueira at usp.br> wrote:
>
>     Hello,
>
>     I'm evaluating the performance of an application in a distributed
>     environment and I notice that it is much slower when running on many
>     nodes/cores compared to a single node with fewer cores.
>
>     When running the application on 20 nodes, the Main Stage time
>     reported in PETSc's log is up to 10 times longer than when running
>     the same application on only 1 node, even with fewer cores per node.
>
>     The application I'm running is an example code provided by libmesh:
>
>     http://libmesh.github.io/examples/introduction_ex4.html
>
>     The application runs inside a Singularity container, with
>     openmpi-4.0.3 and PETSc 3.14.3. The distributed processes are
>     managed by Slurm 17.02.11, and each node is equipped with two Intel
>     Xeon E5-2695v2 Ivy Bridge CPUs (12 cores @ 2.4 GHz) and 128 GB of
>     RAM, with all communication going through InfiniBand.
>
>     My questions are: Is the slowdown expected? Should the application be
>     specially tailored to work well in distributed environments?
>
>     Also, where (maybe in the PETSc documentation or source code) can I
>     find information on how PETSc handles MPI communication? Do the KSP
>     solvers favor point-to-point communication over broadcast messages,
>     or vice-versa? I suspect inter-process communication is the cause of
>     the poor performance when using many nodes, but I would not expect
>     it to account for a slowdown this large.
>
>     Thank you in advance!
>
>
> We can't say anything about the performance without some data. Please 
> send us the output
> of -log_view for both cases.
>
>   Thanks,
>
>      Matt
>
>     Luciano.
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=8955
  n_elem()=29791
    n_local_elem()=935
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=32
  n_processors()=32
  n_threads()=1
  processor_id()=0

*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
 EquationSystems
  n_systems()=1
   System #0, "Poisson"
    Type "LinearImplicit"
    Variables="u" 
    Finite Element Types="LAGRANGE" 
    Approximation Orders="SECOND" 
    n_dofs()=250047
    n_local_dofs()=8955
    n_constrained_dofs()=23066
    n_local_constrained_dofs()=636
    n_vectors()=1
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 56.5003
      Average Off-Processor Bandwidth <= 7.21882
      Maximum  On-Processor Bandwidth <= 136
      Maximum Off-Processor Bandwidth <= 140
    DofMap Constraints
      Number of DoF Constraints = 23066
      Number of Heterogenous Constraints= 22818
      Average DoF Constraint Length= 0

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=8955
  n_elem()=29791
    n_local_elem()=935
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=32
  n_processors()=32
  n_threads()=1
  processor_id()=0


 -----------------------------------------------------
| Processor id:   0                                   |
| Num Processors: 32                                  |
| Time:           Wed Feb  3 17:26:38 2021            |
| OS:             Linux                               |
| HostName:       sdumont6197                         |
| OS Release:     3.10.0-957.el7.x86_64               |
| OS Version:     #1 SMP Thu Oct 4 20:48:51 UTC 2018  |
| Machine:        x86_64                              |
| Username:       luciano.siqueira                    |
| Configuration:  ../configure  '--prefix=/usr/local' |
|  '--with-vtk-include=/usr/local/include/vtk-8.2'    |
|  '--with-vtk-lib=/usr/local/lib'                    |
|  '--enable-petsc=yes'                               |
|  '--enable-petsc-required'                          |
|  '--enable-slepc'                                   |
|  '--enable-slepc-required'                          |
|  'METHODS=opt'                                      |
|  'PETSC_DIR=/opt/petsc'                             |
|  'PETSC_ARCH=arch-linux2-c-opt'                     |
|  'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt'           |
 -----------------------------------------------------
 ------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.158664, Active time=0.068175                                     |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
| Fe                            935        0.0084      0.000009    0.0084      0.000009    12.35    12.35    |
| Ke                            935        0.0395      0.000042    0.0395      0.000042    57.88    57.88    |
| elem init                     935        0.0203      0.000022    0.0203      0.000022    29.76    29.76    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       2805       0.0682                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./experiment on a arch-linux2-c-opt named sdumont6197 with 32 processors, by luciano.siqueira Wed Feb  3 17:26:39 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd  GIT Date: 2021-01-11 15:13:43 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           2.792e+00     1.000   2.791e+00
Objects:              6.600e+01     1.000   6.600e+01
Flop:                 5.609e+08     1.478   4.731e+08  1.514e+10
Flop/sec:             2.009e+08     1.478   1.695e+08  5.424e+09
MPI Messages:         3.178e+03     3.446   1.835e+03  5.872e+04
MPI Message Lengths:  1.138e+07     1.910   4.579e+03  2.689e+08
MPI Reductions:       4.340e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.7915e+00 100.0%  1.5140e+10 100.0%  5.872e+04 100.0%  4.579e+03      100.0%  4.270e+02  98.4%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          7 1.0 9.5138e-02 6.8 0.00e+00 0.0 8.1e+02 5.3e+00 7.0e+00  1  0  1  0  2   1  0  1  0  2     0
BuildTwoSidedF         5 1.0 8.6365e-02162.3 0.00e+00 0.0 6.6e+02 3.6e+04 5.0e+00  1  0  1  9  1   1  0  1  9  1     0
MatMult              198 1.0 4.5126e-01 1.2 2.27e+08 1.5 5.4e+04 3.1e+03 1.0e+00 14 40 92 63  0  14 40 92 63  0 13438
MatSolve             199 1.0 3.5615e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36  0  0  0  11 36  0  0  0 15134
MatLUFactorNum         1 1.0 5.2785e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0 10833
MatILUFactorSym        1 1.0 8.4187e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 9.9590e-02 6.6 0.00e+00 0.0 2.7e+02 8.8e+04 2.0e+00  1  0  0  9  0   1  0  0  9  0     0
MatAssemblyEnd         2 1.0 2.3421e-02 1.0 3.94e+04 0.0 0.0e+00 0.0e+00 4.0e+00  1  0  0  0  1   1  0  0  0  1    21
MatGetRowIJ            1 1.0 1.8900e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2824e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         3 1.0 3.9527e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              191 1.0 1.5713e-01 6.0 5.11e+07 1.4 0.0e+00 0.0e+00 1.9e+02  4  9  0  0 44   4  9  0  0 45  9089
VecNorm              199 1.0 2.8923e-02 3.6 3.56e+06 1.4 0.0e+00 0.0e+00 2.0e+02  1  1  0  0 46   1  1  0  0 47  3441
VecScale             198 1.0 9.2115e-04 1.3 1.77e+06 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 53747
VecCopy                8 1.0 1.1197e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               211 1.0 2.1250e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 1.2501e-0312.0 2.51e+05 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5601
VecMAXPY             198 1.0 3.5915e-02 1.4 5.46e+07 1.4 0.0e+00 0.0e+00 0.0e+00  1 10  0  0  0   1 10  0  0  0 42427
VecAssemblyBegin       3 1.0 6.9581e-04 1.1 0.00e+00 0.0 4.0e+02 9.8e+02 3.0e+00  0  0  1  0  1   0  0  1  0  1     0
VecAssemblyEnd         3 1.0 9.0871e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      199 1.0 3.8270e-02 2.6 0.00e+00 0.0 5.5e+04 3.1e+03 2.0e+00  1  0 93 64  0   1  0 93 64  0     0
VecScatterEnd        199 1.0 1.0005e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecNormalize         198 1.0 2.8637e-02 3.4 5.32e+06 1.4 0.0e+00 0.0e+00 2.0e+02  1  1  0  0 46   1  1  0  0 46  5187
SFSetGraph             2 1.0 4.9453e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                2 1.0 2.3974e-02 5.2 0.00e+00 0.0 1.1e+03 9.4e+02 2.0e+00  1  0  2  0  0   1  0  2  0  0     0
SFPack               199 1.0 7.6794e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             199 1.0 1.1253e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 8.8754e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 8.9594e-01 1.0 5.38e+08 1.5 5.4e+04 3.1e+03 3.9e+02 32 96 92 63 90  32 96 92 63 92 16253
KSPGMRESOrthog       191 1.0 1.8209e-01 3.0 1.02e+08 1.4 0.0e+00 0.0e+00 1.9e+02  5 19  0  0 44   5 19  0  0 45 15687
PCSetUp                2 1.0 8.7059e-02 1.3 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  3  4  0  0  0   3  4  0  0  0  6568
PCSetUpOnBlocks        1 1.0 6.1107e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0  9358
PCApply              199 1.0 3.5986e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36  0  0  0  11 36  0  0  0 14978
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Distributed Mesh     1              1         5048     0.
              Matrix     4              4     14023836     0.
           Index Set     7              7       152940     0.
   IS L to G Mapping     1              1        53380     0.
              Vector    43             43      2874792     0.
   Star Forest Graph     4              4         4576     0.
       Krylov Solver     2              2        20184     0.
      Preconditioner     2              2         1944     0.
     Discrete System     1              1          960     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 1.507e-07
Average time for MPI_Barrier(): 9.6098e-06
Average time for zero size MPI_Send(): 4.25837e-06
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu 
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp   
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp    
-----------------------------------------

Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=569
  n_elem()=29791
    n_local_elem()=47
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=640
  n_processors()=640
  n_threads()=1
  processor_id()=0

*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
 EquationSystems
  n_systems()=1
   System #0, "Poisson"
    Type "LinearImplicit"
    Variables="u" 
    Finite Element Types="LAGRANGE" 
    Approximation Orders="SECOND" 
    n_dofs()=250047
    n_local_dofs()=569
    n_constrained_dofs()=23066
    n_local_constrained_dofs()=149
    n_vectors()=1
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 44.9841
      Average Off-Processor Bandwidth <= 23.7024
      Maximum  On-Processor Bandwidth <= 145
      Maximum Off-Processor Bandwidth <= 158
    DofMap Constraints
      Number of DoF Constraints = 23066
      Number of Heterogenous Constraints= 22818
      Average DoF Constraint Length= 0

 Mesh Information:
  elem_dimensions()={3}
  spatial_dimension()=3
  n_nodes()=250047
    n_local_nodes()=569
  n_elem()=29791
    n_local_elem()=47
    n_active_elem()=29791
  n_subdomains()=1
  n_partitions()=640
  n_processors()=640
  n_threads()=1
  processor_id()=0


 -----------------------------------------------------
| Processor id:   0                                   |
| Num Processors: 640                                 |
| Time:           Mon Feb  1 18:53:04 2021            |
| OS:             Linux                               |
| HostName:       sdumont6170                         |
| OS Release:     3.10.0-957.el7.x86_64               |
| OS Version:     #1 SMP Thu Oct 4 20:48:51 UTC 2018  |
| Machine:        x86_64                              |
| Username:       luciano.siqueira                    |
| Configuration:  ../configure  '--prefix=/usr/local' |
|  '--with-vtk-include=/usr/local/include/vtk-8.2'    |
|  '--with-vtk-lib=/usr/local/lib'                    |
|  '--enable-petsc=yes'                               |
|  '--enable-petsc-required'                          |
|  '--enable-slepc'                                   |
|  '--enable-slepc-required'                          |
|  'METHODS=opt'                                      |
|  'PETSC_DIR=/opt/petsc'                             |
|  'PETSC_ARCH=arch-linux2-c-opt'                     |
|  'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt'           |
 -----------------------------------------------------
 ------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.056831, Active time=0.006895                                     |
 ------------------------------------------------------------------------------------------------------------
| Event                         nCalls     Total Time  Avg Time    Total Time  Avg Time    % of Active Time  |
|                                          w/o Sub     w/o Sub     With Sub    With Sub    w/o S    With S   |
|------------------------------------------------------------------------------------------------------------|
|                                                                                                            |
| Fe                            47         0.0004      0.000009    0.0004      0.000009    6.42     6.42     |
| Ke                            47         0.0020      0.000042    0.0020      0.000042    28.83    28.83    |
| elem init                     47         0.0045      0.000095    0.0045      0.000095    64.74    64.74    |
 ------------------------------------------------------------------------------------------------------------
| Totals:                       141        0.0069                                          100.00            |
 ------------------------------------------------------------------------------------------------------------

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./experiment on a arch-linux2-c-opt named sdumont6170 with 640 processors, by luciano.siqueira Mon Feb  1 18:53:07 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd  GIT Date: 2021-01-11 15:13:43 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           1.968e+02     1.000   1.968e+02
Objects:              6.600e+01     1.000   6.600e+01
Flop:                 4.131e+07     4.553   2.385e+07  1.526e+10
Flop/sec:             2.099e+05     4.553   1.212e+05  7.756e+07
MPI Messages:         8.425e+03     2.949   5.414e+03  3.465e+06
MPI Message Lengths:  5.026e+06     1.669   7.080e+02  2.453e+09
MPI Reductions:       4.890e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.9678e+02 100.0%  1.5262e+10 100.0%  3.465e+06 100.0%  7.080e+02      100.0%  4.820e+02  98.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          7 1.0 2.8366e-01 5.9 0.00e+00 0.0 2.7e+04 5.2e+00 7.0e+00  0  0  1  0  1   0  0  1  0  1     0
BuildTwoSidedF         5 1.0 2.5666e-01 9.5 0.00e+00 0.0 2.0e+04 4.0e+03 5.0e+00  0  0  1  3  1   0  0  1  3  1     0
MatMult              226 1.0 6.4752e-01 3.1 1.87e+07 4.2 2.2e+06 3.9e+02 1.0e+00  0 45 63 35  0   0 45 63 35  0 10689
MatSolve             227 1.0 2.6350e-02 6.7 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00  0 29  0  0  0   0 29  0  0  0 168471
MatLUFactorNum         1 1.0 3.0115e-0310.8 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 106202
MatILUFactorSym        1 1.0 1.7141e-0219.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 2.3641e-0120.6 0.00e+00 0.0 8.1e+03 9.7e+03 2.0e+00  0  0  0  3  0   0  0  0  3  0     0
MatAssemblyEnd         2 1.0 4.7616e-01 1.8 8.30e+03 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  1   0  0  0  0  1     4
MatGetRowIJ            1 1.0 2.4430e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.3698e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         3 1.0 2.4557e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              218 1.0 1.2456e+00 1.7 3.98e+06 3.2 0.0e+00 0.0e+00 2.2e+02  0 11  0  0 45   0 11  0  0 45  1320
VecNorm              227 1.0 1.3911e+00 1.6 2.75e+05 3.2 0.0e+00 0.0e+00 2.3e+02  1  1  0  0 46   1  1  0  0 47    82
VecScale             226 1.0 7.1863e-04 2.2 1.37e+05 3.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 78637
VecCopy                9 1.0 2.7855e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               240 1.0 3.9133e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 6.4706e-03293.5 1.94e+04 3.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1237
VecMAXPY             226 1.0 3.9906e-03 2.6 4.25e+06 3.2 0.0e+00 0.0e+00 0.0e+00  0 11  0  0  0   0 11  0  0  0 439741
VecAssemblyBegin       3 1.0 2.2735e-02 1.4 0.00e+00 0.0 1.2e+04 1.2e+02 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
VecAssemblyEnd         3 1.0 2.9396e-03277.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      227 1.0 6.0738e-02 1.8 0.00e+00 0.0 2.2e+06 3.9e+02 2.0e+00  0  0 64 35  0   0  0 64 35  0     0
VecScatterEnd        227 1.0 5.8930e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         226 1.0 1.3851e+00 1.6 4.10e+05 3.2 0.0e+00 0.0e+00 2.3e+02  1  1  0  0 46   1  1  0  0 47   122
SFSetGraph             2 1.0 2.1940e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                2 1.0 4.2313e-02 1.7 0.00e+00 0.0 3.8e+04 1.1e+02 2.0e+00  0  0  1  0  0   0  0  1  0  0     0
SFPack               227 1.0 1.7886e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack             227 1.0 2.3074e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 6.7246e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.5118e+00 1.0 4.02e+07 4.5 2.2e+06 3.9e+02 4.5e+02  1 98 63 35 91   1 98 63 35 93  5947
KSPGMRESOrthog       218 1.0 1.2489e+00 1.7 7.96e+06 3.2 0.0e+00 0.0e+00 2.2e+02  0 22  0  0 45   0 22  0  0 45  2634
PCSetUp                2 1.0 4.3814e-02 1.6 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  7300
PCSetUpOnBlocks        1 1.0 1.8862e-02 8.3 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 16956
PCApply              227 1.0 2.9083e-02 5.1 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00  0 29  0  0  0   0 29  0  0  0 152639
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Distributed Mesh     1              1         5048     0.
              Matrix     4              4       842592     0.
           Index Set     7              7        18132     0.
   IS L to G Mapping     1              1         4972     0.
              Vector    43             43       257096     0.
   Star Forest Graph     4              4         4576     0.
       Krylov Solver     2              2        20184     0.
      Preconditioner     2              2         1944     0.
     Discrete System     1              1          960     0.
              Viewer     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu 
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp   
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp    
-----------------------------------------

Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------
