[petsc-users] Slower performance in multi-node system
Luciano Siqueira
luciano.siqueira at usp.br
Wed Feb 3 14:40:04 CST 2021
Here are the attached outputs of -log_view for both cases. The
beginning of each file has some info from the libmesh app.
Running on 1 node, 32 cores: 01_node_log_view.txt
Running on 20 nodes, 32 cores each (640 cores in total):
01_node_log_view.txt
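The attached runs only use PETSc's default Main Stage. If a finer
breakdown would help, I can re-run with the solve pushed into its own
log stage so that -log_view reports it separately from assembly. A
minimal sketch of what that would look like (the stage name and the
tiny stand-in system are illustrative, not the actual libmesh example):

/* Isolate the linear solve in a named PETSc log stage. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, b;
  KSP           ksp;
  PetscLogStage solve_stage;
  PetscInt      i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Small diagonal system standing in for the assembled operator */
  MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
               1, NULL, 0, NULL, &A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* Everything between Push and Pop is logged under "LinearSolve" */
  PetscLogStageRegister("LinearSolve", &solve_stage);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);   /* honors -ksp_type gmres -pc_type bjacobi */

  PetscLogStagePush(solve_stage);
  KSPSolve(ksp, b, x);
  PetscLogStagePop();

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

With that in place, -log_view prints a separate "LinearSolve" entry in
the Summary of Stages table instead of folding everything into the
Main Stage.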
Thanks!
Luciano.
On 03/02/2021 16:43, Matthew Knepley wrote:
> On Wed, Feb 3, 2021 at 2:42 PM Luciano Siqueira
> <luciano.siqueira at usp.br> wrote:
>
> Hello,
>
> I'm evaluating the performance of an application in a distributed
> environment and I notice that it's much slower when running on many
> nodes/cores than on a single node with fewer cores.
>
> When running the application on 20 nodes, the Main Stage time reported
> in PETSc's log is up to 10 times longer than when running the same
> application on only 1 node, even with fewer cores per node.
>
> The application I'm running is an example code provided by libmesh:
>
> http://libmesh.github.io/examples/introduction_ex4.html
>
> The application runs inside a Singularity container, with
> openmpi-4.0.3 and PETSc 3.14.3. The distributed processes are managed
> by Slurm 17.02.11, and each node is equipped with two Intel Xeon
> E5-2695v2 Ivy Bridge CPUs (12 cores @ 2.4 GHz) and 128 GB of RAM,
> with all communication going over InfiniBand.
>
> My questions are: Is the slowdown expected? Should the application be
> specially tailored to work well in distributed environments?
>
> Also, where (maybe in the PETSc documentation/source code) can I find
> information on how PETSc handles MPI communication? Do the KSP solvers
> favor point-to-point communication over broadcast messages, or vice
> versa? I suspect inter-process communication is the cause of the poor
> performance when using many nodes, but I wouldn't expect it to cost
> as much as I'm seeing.
>
> Thank you in advance!
>
>
> We can't say anything about the performance without some data. Please
> send us the output of -log_view for both cases.
>
> Thanks,
>
> Matt
>
> Luciano.
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
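Regarding the communication question above: the attached logs report an
average MPI_Barrier() time of about 9.6e-06 s on 32 ranks versus about
5.0e-03 s on 640 ranks, and the GMRES VecMDot/VecNorm events (which
perform global reductions every iteration) take most of the KSPSolve
time in the 640-core run. A minimal MPI micro-benchmark (illustrative,
independent of the libmesh code) to check raw reduction latency on the
same allocations:

/* Time a global reduction across all ranks, repeated to average out noise. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int    rank, i, nreps = 1000;
  double local = 1.0, global, t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);      /* synchronize before timing */
  t0 = MPI_Wtime();
  for (i = 0; i < nreps; i++)
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  t1 = MPI_Wtime();

  if (rank == 0)
    printf("average MPI_Allreduce time: %g s\n", (t1 - t0) / nreps);

  MPI_Finalize();
  return 0;
}

If the 640-rank average is orders of magnitude above the 32-rank one,
the latency of each GMRES iteration's reductions alone could account
for much of the slowdown on this small problem (only a few hundred
local dofs per core at 640 cores).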
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view
Mesh Information:
elem_dimensions()={3}
spatial_dimension()=3
n_nodes()=250047
n_local_nodes()=8955
n_elem()=29791
n_local_elem()=935
n_active_elem()=29791
n_subdomains()=1
n_partitions()=32
n_processors()=32
n_threads()=1
processor_id()=0
*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
EquationSystems
n_systems()=1
System #0, "Poisson"
Type "LinearImplicit"
Variables="u"
Finite Element Types="LAGRANGE"
Approximation Orders="SECOND"
n_dofs()=250047
n_local_dofs()=8955
n_constrained_dofs()=23066
n_local_constrained_dofs()=636
n_vectors()=1
n_matrices()=1
DofMap Sparsity
Average On-Processor Bandwidth <= 56.5003
Average Off-Processor Bandwidth <= 7.21882
Maximum On-Processor Bandwidth <= 136
Maximum Off-Processor Bandwidth <= 140
DofMap Constraints
Number of DoF Constraints = 23066
Number of Heterogenous Constraints= 22818
Average DoF Constraint Length= 0
Mesh Information:
elem_dimensions()={3}
spatial_dimension()=3
n_nodes()=250047
n_local_nodes()=8955
n_elem()=29791
n_local_elem()=935
n_active_elem()=29791
n_subdomains()=1
n_partitions()=32
n_processors()=32
n_threads()=1
processor_id()=0
-----------------------------------------------------
| Processor id: 0 |
| Num Processors: 32 |
| Time: Wed Feb 3 17:26:38 2021 |
| OS: Linux |
| HostName: sdumont6197 |
| OS Release: 3.10.0-957.el7.x86_64 |
| OS Version: #1 SMP Thu Oct 4 20:48:51 UTC 2018 |
| Machine: x86_64 |
| Username: luciano.siqueira |
| Configuration: ../configure '--prefix=/usr/local' |
| '--with-vtk-include=/usr/local/include/vtk-8.2' |
| '--with-vtk-lib=/usr/local/lib' |
| '--enable-petsc=yes' |
| '--enable-petsc-required' |
| '--enable-slepc' |
| '--enable-slepc-required' |
| 'METHODS=opt' |
| 'PETSC_DIR=/opt/petsc' |
| 'PETSC_ARCH=arch-linux2-c-opt' |
| 'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt' |
-----------------------------------------------------
------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.158664, Active time=0.068175 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| |
| Fe 935 0.0084 0.000009 0.0084 0.000009 12.35 12.35 |
| Ke 935 0.0395 0.000042 0.0395 0.000042 57.88 57.88 |
| elem init 935 0.0203 0.000022 0.0203 0.000022 29.76 29.76 |
------------------------------------------------------------------------------------------------------------
| Totals: 2805 0.0682 100.00 |
------------------------------------------------------------------------------------------------------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./experiment on a arch-linux2-c-opt named sdumont6197 with 32 processors, by luciano.siqueira Wed Feb 3 17:26:39 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd GIT Date: 2021-01-11 15:13:43 +0000
Max Max/Min Avg Total
Time (sec): 2.792e+00 1.000 2.791e+00
Objects: 6.600e+01 1.000 6.600e+01
Flop: 5.609e+08 1.478 4.731e+08 1.514e+10
Flop/sec: 2.009e+08 1.478 1.695e+08 5.424e+09
MPI Messages: 3.178e+03 3.446 1.835e+03 5.872e+04
MPI Message Lengths: 1.138e+07 1.910 4.579e+03 2.689e+08
MPI Reductions: 4.340e+02 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 2.7915e+00 100.0% 1.5140e+10 100.0% 5.872e+04 100.0% 4.579e+03 100.0% 4.270e+02 98.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 7 1.0 9.5138e-02 6.8 0.00e+00 0.0 8.1e+02 5.3e+00 7.0e+00 1 0 1 0 2 1 0 1 0 2 0
BuildTwoSidedF 5 1.0 8.6365e-02162.3 0.00e+00 0.0 6.6e+02 3.6e+04 5.0e+00 1 0 1 9 1 1 0 1 9 1 0
MatMult 198 1.0 4.5126e-01 1.2 2.27e+08 1.5 5.4e+04 3.1e+03 1.0e+00 14 40 92 63 0 14 40 92 63 0 13438
MatSolve 199 1.0 3.5615e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36 0 0 0 11 36 0 0 0 15134
MatLUFactorNum 1 1.0 5.2785e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 10833
MatILUFactorSym 1 1.0 8.4187e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 9.9590e-02 6.6 0.00e+00 0.0 2.7e+02 8.8e+04 2.0e+00 1 0 0 9 0 1 0 0 9 0 0
MatAssemblyEnd 2 1.0 2.3421e-02 1.0 3.94e+04 0.0 0.0e+00 0.0e+00 4.0e+00 1 0 0 0 1 1 0 0 0 1 21
MatGetRowIJ 1 1.0 1.8900e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2824e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 3 1.0 3.9527e-03 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 191 1.0 1.5713e-01 6.0 5.11e+07 1.4 0.0e+00 0.0e+00 1.9e+02 4 9 0 0 44 4 9 0 0 45 9089
VecNorm 199 1.0 2.8923e-02 3.6 3.56e+06 1.4 0.0e+00 0.0e+00 2.0e+02 1 1 0 0 46 1 1 0 0 47 3441
VecScale 198 1.0 9.2115e-04 1.3 1.77e+06 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53747
VecCopy 8 1.0 1.1197e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 211 1.0 2.1250e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 1.2501e-0312.0 2.51e+05 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5601
VecMAXPY 198 1.0 3.5915e-02 1.4 5.46e+07 1.4 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 10 0 0 0 42427
VecAssemblyBegin 3 1.0 6.9581e-04 1.1 0.00e+00 0.0 4.0e+02 9.8e+02 3.0e+00 0 0 1 0 1 0 0 1 0 1 0
VecAssemblyEnd 3 1.0 9.0871e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 199 1.0 3.8270e-02 2.6 0.00e+00 0.0 5.5e+04 3.1e+03 2.0e+00 1 0 93 64 0 1 0 93 64 0 0
VecScatterEnd 199 1.0 1.0005e-0112.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecNormalize 198 1.0 2.8637e-02 3.4 5.32e+06 1.4 0.0e+00 0.0e+00 2.0e+02 1 1 0 0 46 1 1 0 0 46 5187
SFSetGraph 2 1.0 4.9453e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 2 1.0 2.3974e-02 5.2 0.00e+00 0.0 1.1e+03 9.4e+02 2.0e+00 1 0 2 0 0 1 0 2 0 0 0
SFPack 199 1.0 7.6794e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 199 1.0 1.1253e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 2 1.0 8.8754e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 8.9594e-01 1.0 5.38e+08 1.5 5.4e+04 3.1e+03 3.9e+02 32 96 92 63 90 32 96 92 63 92 16253
KSPGMRESOrthog 191 1.0 1.8209e-01 3.0 1.02e+08 1.4 0.0e+00 0.0e+00 1.9e+02 5 19 0 0 44 5 19 0 0 45 15687
PCSetUp 2 1.0 8.7059e-02 1.3 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 3 4 0 0 0 3 4 0 0 0 6568
PCSetUpOnBlocks 1 1.0 6.1107e-02 1.5 2.24e+07 1.5 0.0e+00 0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 9358
PCApply 199 1.0 3.5986e-01 1.5 2.00e+08 1.6 0.0e+00 0.0e+00 0.0e+00 11 36 0 0 0 11 36 0 0 0 14978
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Distributed Mesh 1 1 5048 0.
Matrix 4 4 14023836 0.
Index Set 7 7 152940 0.
IS L to G Mapping 1 1 53380 0.
Vector 43 43 2874792 0.
Star Forest Graph 4 4 4576 0.
Krylov Solver 2 2 20184 0.
Preconditioner 2 2 1944 0.
Discrete System 1 1 960 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 1.507e-07
Average time for MPI_Barrier(): 9.6098e-06
Average time for zero size MPI_Send(): 4.25837e-06
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp
-----------------------------------------
Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
Running ./experiment -d 3 -n 31 -mat_type aij -ksp_type gmres -pc_type bjacobi -log_view
Mesh Information:
elem_dimensions()={3}
spatial_dimension()=3
n_nodes()=250047
n_local_nodes()=569
n_elem()=29791
n_local_elem()=47
n_active_elem()=29791
n_subdomains()=1
n_partitions()=640
n_processors()=640
n_threads()=1
processor_id()=0
*** Warning, This code is untested, experimental, or likely to see future API changes: ./include/libmesh/mesh_base.h, line 1667, compiled Jan 12 2021 at 12:34:39 ***
EquationSystems
n_systems()=1
System #0, "Poisson"
Type "LinearImplicit"
Variables="u"
Finite Element Types="LAGRANGE"
Approximation Orders="SECOND"
n_dofs()=250047
n_local_dofs()=569
n_constrained_dofs()=23066
n_local_constrained_dofs()=149
n_vectors()=1
n_matrices()=1
DofMap Sparsity
Average On-Processor Bandwidth <= 44.9841
Average Off-Processor Bandwidth <= 23.7024
Maximum On-Processor Bandwidth <= 145
Maximum Off-Processor Bandwidth <= 158
DofMap Constraints
Number of DoF Constraints = 23066
Number of Heterogenous Constraints= 22818
Average DoF Constraint Length= 0
Mesh Information:
elem_dimensions()={3}
spatial_dimension()=3
n_nodes()=250047
n_local_nodes()=569
n_elem()=29791
n_local_elem()=47
n_active_elem()=29791
n_subdomains()=1
n_partitions()=640
n_processors()=640
n_threads()=1
processor_id()=0
-----------------------------------------------------
| Processor id: 0 |
| Num Processors: 640 |
| Time: Mon Feb 1 18:53:04 2021 |
| OS: Linux |
| HostName: sdumont6170 |
| OS Release: 3.10.0-957.el7.x86_64 |
| OS Version: #1 SMP Thu Oct 4 20:48:51 UTC 2018 |
| Machine: x86_64 |
| Username: luciano.siqueira |
| Configuration: ../configure '--prefix=/usr/local' |
| '--with-vtk-include=/usr/local/include/vtk-8.2' |
| '--with-vtk-lib=/usr/local/lib' |
| '--enable-petsc=yes' |
| '--enable-petsc-required' |
| '--enable-slepc' |
| '--enable-slepc-required' |
| 'METHODS=opt' |
| 'PETSC_DIR=/opt/petsc' |
| 'PETSC_ARCH=arch-linux2-c-opt' |
| 'SLEPC_DIR=/opt/petsc/arch-linux2-c-opt' |
-----------------------------------------------------
------------------------------------------------------------------------------------------------------------
| Matrix Assembly Performance: Alive time=0.056831, Active time=0.006895 |
------------------------------------------------------------------------------------------------------------
| Event nCalls Total Time Avg Time Total Time Avg Time % of Active Time |
| w/o Sub w/o Sub With Sub With Sub w/o S With S |
|------------------------------------------------------------------------------------------------------------|
| |
| Fe 47 0.0004 0.000009 0.0004 0.000009 6.42 6.42 |
| Ke 47 0.0020 0.000042 0.0020 0.000042 28.83 28.83 |
| elem init 47 0.0045 0.000095 0.0045 0.000095 64.74 64.74 |
------------------------------------------------------------------------------------------------------------
| Totals: 141 0.0069 100.00 |
------------------------------------------------------------------------------------------------------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./experiment on a arch-linux2-c-opt named sdumont6170 with 640 processors, by luciano.siqueira Mon Feb 1 18:53:07 2021
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.14.3-435-gd1574ab4cd GIT Date: 2021-01-11 15:13:43 +0000
Max Max/Min Avg Total
Time (sec): 1.968e+02 1.000 1.968e+02
Objects: 6.600e+01 1.000 6.600e+01
Flop: 4.131e+07 4.553 2.385e+07 1.526e+10
Flop/sec: 2.099e+05 4.553 1.212e+05 7.756e+07
MPI Messages: 8.425e+03 2.949 5.414e+03 3.465e+06
MPI Message Lengths: 5.026e+06 1.669 7.080e+02 2.453e+09
MPI Reductions: 4.890e+02 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 1.9678e+02 100.0% 1.5262e+10 100.0% 3.465e+06 100.0% 7.080e+02 100.0% 4.820e+02 98.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 7 1.0 2.8366e-01 5.9 0.00e+00 0.0 2.7e+04 5.2e+00 7.0e+00 0 0 1 0 1 0 0 1 0 1 0
BuildTwoSidedF 5 1.0 2.5666e-01 9.5 0.00e+00 0.0 2.0e+04 4.0e+03 5.0e+00 0 0 1 3 1 0 0 1 3 1 0
MatMult 226 1.0 6.4752e-01 3.1 1.87e+07 4.2 2.2e+06 3.9e+02 1.0e+00 0 45 63 35 0 0 45 63 35 0 10689
MatSolve 227 1.0 2.6350e-02 6.7 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00 0 29 0 0 0 0 29 0 0 0 168471
MatLUFactorNum 1 1.0 3.0115e-0310.8 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 106202
MatILUFactorSym 1 1.0 1.7141e-0219.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.3641e-0120.6 0.00e+00 0.0 8.1e+03 9.7e+03 2.0e+00 0 0 0 3 0 0 0 0 3 0 0
MatAssemblyEnd 2 1.0 4.7616e-01 1.8 8.30e+03 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 1 0 0 0 0 1 4
MatGetRowIJ 1 1.0 2.4430e-06 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.3698e-05 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 3 1.0 2.4557e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 218 1.0 1.2456e+00 1.7 3.98e+06 3.2 0.0e+00 0.0e+00 2.2e+02 0 11 0 0 45 0 11 0 0 45 1320
VecNorm 227 1.0 1.3911e+00 1.6 2.75e+05 3.2 0.0e+00 0.0e+00 2.3e+02 1 1 0 0 46 1 1 0 0 47 82
VecScale 226 1.0 7.1863e-04 2.2 1.37e+05 3.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 78637
VecCopy 9 1.0 2.7855e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 240 1.0 3.9133e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 6.4706e-03293.5 1.94e+04 3.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1237
VecMAXPY 226 1.0 3.9906e-03 2.6 4.25e+06 3.2 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 439741
VecAssemblyBegin 3 1.0 2.2735e-02 1.4 0.00e+00 0.0 1.2e+04 1.2e+02 3.0e+00 0 0 0 0 1 0 0 0 0 1 0
VecAssemblyEnd 3 1.0 2.9396e-03277.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 227 1.0 6.0738e-02 1.8 0.00e+00 0.0 2.2e+06 3.9e+02 2.0e+00 0 0 64 35 0 0 0 64 35 0 0
VecScatterEnd 227 1.0 5.8930e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 226 1.0 1.3851e+00 1.6 4.10e+05 3.2 0.0e+00 0.0e+00 2.3e+02 1 1 0 0 46 1 1 0 0 47 122
SFSetGraph 2 1.0 2.1940e-05 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 2 1.0 4.2313e-02 1.7 0.00e+00 0.0 3.8e+04 1.1e+02 2.0e+00 0 0 1 0 0 0 0 1 0 0 0
SFPack 227 1.0 1.7886e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 227 1.0 2.3074e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 2 1.0 6.7246e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.5118e+00 1.0 4.02e+07 4.5 2.2e+06 3.9e+02 4.5e+02 1 98 63 35 91 1 98 63 35 93 5947
KSPGMRESOrthog 218 1.0 1.2489e+00 1.7 7.96e+06 3.2 0.0e+00 0.0e+00 2.2e+02 0 22 0 0 45 0 22 0 0 45 2634
PCSetUp 2 1.0 4.3814e-02 1.6 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 7300
PCSetUpOnBlocks 1 1.0 1.8862e-02 8.3 1.18e+0613.9 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 16956
PCApply 227 1.0 2.9083e-02 5.1 1.28e+07 7.0 0.0e+00 0.0e+00 0.0e+00 0 29 0 0 0 0 29 0 0 0 152639
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Distributed Mesh 1 1 5048 0.
Matrix 4 4 842592 0.
Index Set 7 7 18132 0.
IS L to G Mapping 1 1 4972 0.
Vector 43 43 257096 0.
Star Forest Graph 4 4 4576 0.
Krylov Solver 2 2 20184 0.
Preconditioner 2 2 1944 0.
Discrete System 1 1 960 0.
Viewer 1 0 0 0.
========================================================================================================================
Average time to get PetscTime(): 3.933e-07
Average time for MPI_Barrier(): 0.00498015
Average time for zero size MPI_Send(): 0.000194207
#PETSc Option Table entries:
-d 3
-ksp_type gmres
-log_view
-mat_type aij
-n 31
-pc_type bjacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --with-openmp=1 --download-superlu_dist --download-mumps --download-hypre --download-scalapack --download-spai --download-parms --download-slepc --download-openmpi=yes COPTFLAGS= CXXOPTFLAGS= FOPTFLAGS=
-----------------------------------------
Libraries compiled on 2021-01-12 11:28:56 on libmesh-cpu
Machine characteristics: Linux-5.4.0-60-generic-x86_64-with-debian-10.7
Using PETSc directory: /opt/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /opt/petsc/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -fopenmp
Using Fortran compiler: /opt/petsc/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -fopenmp
-----------------------------------------
Using include paths: -I/opt/petsc/include -I/opt/petsc/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /opt/petsc/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/arch-linux2-c-opt/lib -L/opt/petsc/arch-linux2-c-opt/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/8 -L/usr/lib/gcc/x86_64-linux-gnu/8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lparms -lspai -llapack -lblas -lX11 -lm -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lpthread -lstdc++ -ldl
-----------------------------------------