[petsc-dev] Kokkos/Crusher performance
Mark Adams
mfadams at lbl.gov
Fri Jan 21 20:46:49 CST 2022
>
>
> But in particular, look at the VecTDot and VecNorm CPU flop
> rates compared to the GPU; they are much lower, which tells me the
> MPI_Allreduce is likely also hurting performance there a great deal. It
> would be good to see a single-MPI-rank job, to compare against
> performance without the MPI overhead.
>
Here are two single-processor runs, each with a whole GPU. It's not clear
whether --ntasks-per-gpu=1 refers to the GPU sockets (4 of them) or the GPUs (8).
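
For reference, a sketch of the single-rank launch line these runs imply (the
partition/account/time flags are copied from the --with-mpiexec setting in the
configure options below, the solver options from the option table; whether
--ntasks-per-gpu counts sockets or GPUs is exactly the open question above;
the second run used -dm_refine 4, and the remaining options are as in the
table below):

  srun -p batch -N 1 -A csc314_crusher -t 00:10:00 -n 1 --ntasks-per-gpu=1 \
    ./ex13 -dm_plex_box_faces 4,4,4 -dm_refine 3 -dm_mat_type aijkokkos \
    -dm_vec_type kokkos -pc_type jacobi -ksp_type cg -log_view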
-------------- next part --------------
DM Object: box 1 MPI processes
type: plex
box in 3 dimensions:
Number of 0-cells per rank: 35937
Number of 1-cells per rank: 104544
Number of 2-cells per rank: 101376
Number of 3-cells per rank: 32768
Labels:
celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
marker: 1 strata with value/size (1 (24480))
Face Sets: 6 strata with value/size (6 (3600), 5 (3600), 3 (3600), 4 (3600), 1 (3600), 2 (3600))
Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=250047, cols=250047
total: nonzeros=15069223, allocated nonzeros=15069223
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=250047, cols=250047
total: nonzeros=15069223, allocated nonzeros=15069223
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=250047, cols=250047
total: nonzeros=15069223, allocated nonzeros=15069223
total number of mallocs used during MatSetValues calls=0
not using I-node routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher003 with 1 processor, by adams Fri Jan 21 21:30:02 2022
Using Petsc Development GIT revision: v3.16.3-665-g1012189b9a GIT Date: 2022-01-21 16:28:20 +0000
Max Max/Min Avg Total
Time (sec): 5.916e+01 1.000 5.916e+01
Objects: 1.637e+03 1.000 1.637e+03
Flop: 1.454e+10 1.000 1.454e+10 1.454e+10
Flop/sec: 2.459e+08 1.000 2.459e+08 2.459e+08
MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Message Lengths: 1.800e+01 1.000 0.000e+00 1.800e+01
MPI Reductions: 9.000e+00 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 5.8503e+01 98.9% 6.3978e+09 44.0% 0.000e+00 0.0% 0.000e+00 100.0% 9.000e+00 100.0%
1: PCSetUp: 2.0318e-02 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: KSP Solve only: 6.3347e-01 1.1% 8.1469e+09 56.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 3 1.0 2.1114e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 3 1.0 2.3745e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 1 1.0 1.8245e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 23195 1.0 5.5017e-02 1.0 3.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 25 0 0 0 0 57 0 0 0 66844 0 0 0.00e+00 0 0.00e+00 100
MatAssemblyBegin 43 1.0 4.3796e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 43 1.0 2.8367e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatZeroEntries 3 1.0 3.5872e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatView 1 1.0 4.7812e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 1 1.0 9.1753e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 3.5479e-01 1.0 4.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 28 0 0 0 1 64 0 0 0 11481 121319 0 0.00e+00 0 0.00e+00 100
SNESSolve 1 1.0 2.5371e+01 1.0 5.26e+09 1.0 0.0e+00 0.0e+00 0.0e+00 43 36 0 0 0 43 82 0 0 0 207 117727 1 2.00e+00 2 4.00e+00 77
SNESSetUp 1 1.0 8.4125e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SNESFunctionEval 2 1.0 3.5801e+00 1.0 8.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00 6 6 0 0 0 6 13 0 0 0 225 468 2 4.00e+00 2 4.00e+00 0
SNESJacobianEval 2 1.0 4.5842e+01 1.0 1.52e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77 10 0 0 0 78 24 0 0 0 33 0 0 0.00e+00 2 4.00e+00 0
DMCreateInterp 1 1.0 1.2704e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 7 0 0 0.00e+00 0 0.00e+00 0
DMCreateMat 1 1.0 8.4118e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 19 1.0 6.4033e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 30 1.0 6.8263e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 30 1.0 1.5020e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPrealloc 1 1.0 8.4045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexResidualFE 2 1.0 3.0371e+00 1.0 7.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 5 0 0 0 5 12 0 0 0 259 0 0 0.00e+00 0 0.00e+00 0
DMPlexJacobianFE 2 1.0 4.5560e+01 1.0 1.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77 10 0 0 0 78 23 0 0 0 33 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterpFE 1 1.0 1.2681e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 7 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 3 1.0 4.9756e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 2 1.0 1.0113e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 5 1.0 2.6223e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 8.00e+00 0
SFBcastEnd 5 1.0 8.5570e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 2 1.0 2.3398e-01 1.0 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 0 2 4.00e+00 0 0.00e+00 100
SFReduceEnd 2 1.0 4.4490e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 13 1.0 3.8856e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 13 1.0 2.3328e-01 1.0 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 0 0 0.00e+00 0 0.00e+00 100
VecTDot 244 1.0 9.2238e-02 1.0 1.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 1323 12541 0 0.00e+00 0 0.00e+00 100
VecNorm 123 1.0 4.4787e-02 1.0 6.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 1373 9907 0 0.00e+00 0 0.00e+00 100
VecCopy 2 1.0 1.8176e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 58 1.0 8.4117e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 245 1.0 6.9616e-02 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 1760 15949 0 0.00e+00 0 0.00e+00 100
VecAYPX 121 1.0 2.8920e-02 1.0 6.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 2092 21415 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 122 1.0 5.1669e-02 1.0 3.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 590 7979 0 0.00e+00 0 0.00e+00 100
DualSpaceSetUp 2 1.0 2.6410e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0
FESetUp 2 1.0 8.6078e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCSetUp 1 1.0 4.2690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 122 1.0 7.4229e-02 1.0 3.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 411 5106 0 0.00e+00 0 0.00e+00 100
--- Event Stage 1: PCSetUp
PCSetUp 1 1.0 2.0310e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 100 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: KSP Solve only
MatMult 244 1.0 7.4032e-02 1.0 7.35e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 51 0 0 0 12 90 0 0 0 99332 0 0 0.00e+00 0 0.00e+00 100
MatView 2 1.0 3.0267e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 2 1.0 6.3264e-01 1.0 8.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 56 0 0 0 100100 0 0 0 12878 156671 0 0.00e+00 0 0.00e+00 100
VecTDot 488 1.0 1.9612e-01 1.0 2.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 31 3 0 0 0 1244 13429 0 0.00e+00 0 0.00e+00 100
VecNorm 246 1.0 8.8144e-02 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 14 2 0 0 0 1396 13078 0 0.00e+00 0 0.00e+00 100
VecCopy 4 1.0 1.4585e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 4 1.0 1.4736e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 488 1.0 1.3101e-01 1.0 2.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 21 3 0 0 0 1863 20731 0 0.00e+00 0 0.00e+00 100
VecAYPX 242 1.0 7.5924e-02 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 12 1 0 0 0 1594 21455 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 244 1.0 6.4735e-02 1.0 6.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 1 0 0 0 942 10206 0 0.00e+00 0 0.00e+00 100
PCApply 244 1.0 6.4788e-02 1.0 6.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 10 1 0 0 0 942 10206 0 0.00e+00 0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 29 29 16704 0.
SNES 1 1 1540 0.
DMSNES 1 1 688 0.
Krylov Solver 1 1 1664 0.
DMKSP interface 1 1 656 0.
Matrix 68 68 186295972 0.
Distributed Mesh 64 64 7790176 0.
DM Label 143 143 90376 0.
Quadrature 148 148 87616 0.
Mesh Transform 3 3 2268 0.
Index Set 522 522 1870912 0.
IS L to G Mapping 1 1 1099172 0.
Section 208 208 148096 0.
Star Forest Graph 130 130 137712 0.
Discrete System 101 101 96964 0.
Weak Form 102 102 62832 0.
GraphPartitioner 30 30 20640 0.
Vector 47 47 19486664 0.
Linear Space 5 5 3416 0.
Dual Space 26 26 24336 0.
FE Space 2 2 1576 0.
Viewer 2 1 840 0.
Preconditioner 1 1 872 0.
Field over DM 1 1 704 0.
--- Event Stage 1: PCSetUp
--- Event Stage 2: KSP Solve only
========================================================================================================================
Average time to get PetscTime(): 3.61e-08
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 3
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-21 19:20:56 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3
Using Fortran compiler: ftn -fPIC
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 3
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
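
The "KSP Solve only" stage above isolates the two -benchmark_it re-solves from
setup. As the phase summary notes, stages like this come from
PetscLogStagePush()/PetscLogStagePop(); a minimal sketch of the pattern (the
placement is illustrative, not ex13's actual code, and assumes ksp, b, x are
already set up):

  #include <petscksp.h>
  PetscLogStage  stage;
  PetscErrorCode ierr;
  ierr = PetscLogStageRegister("KSP Solve only", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr); /* events below land in this stage */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);      /* the timed solve */
  ierr = PetscLogStagePop();CHKERRQ(ierr);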
-------------- next part --------------
DM Object: box 1 MPI processes
type: plex
box in 3 dimensions:
Number of 0-cells per rank: 274625
Number of 1-cells per rank: 811200
Number of 2-cells per rank: 798720
Number of 3-cells per rank: 262144
Labels:
celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
marker: 1 strata with value/size (1 (98208))
Face Sets: 6 strata with value/size (6 (15376), 5 (15376), 3 (15376), 4 (15376), 1 (15376), 2 (15376))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=2048383, cols=2048383
total: nonzeros=127263527, allocated nonzeros=127263527
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=2048383, cols=2048383
total: nonzeros=127263527, allocated nonzeros=127263527
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaijkokkos
rows=2048383, cols=2048383
total: nonzeros=127263527, allocated nonzeros=127263527
total number of mallocs used during MatSetValues calls=0
not using I-node routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher003 with 1 processor, by adams Fri Jan 21 21:38:49 2022
Using Petsc Development GIT revision: v3.16.3-665-g1012189b9a GIT Date: 2022-01-21 16:28:20 +0000
Max Max/Min Avg Total
Time (sec): 4.693e+02 1.000 4.693e+02
Objects: 1.709e+03 1.000 1.709e+03
Flop: 1.872e+11 1.000 1.872e+11 1.872e+11
Flop/sec: 3.988e+08 1.000 3.988e+08 3.988e+08
MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00
MPI Message Lengths: 1.800e+01 1.000 0.000e+00 1.800e+01
MPI Reductions: 9.000e+00 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 4.6678e+02 99.5% 7.4706e+10 39.9% 0.000e+00 0.0% 0.000e+00 100.0% 9.000e+00 100.0%
1: PCSetUp: 1.6252e-01 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: KSP Solve only: 2.4002e+00 0.5% 1.1247e+11 60.1% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 4 1.0 1.6321e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 3 1.0 3.4567e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 1 1.0 1.6763e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 95465 1.0 5.8604e-01 1.0 5.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 68 0 0 0 86868 0 0 0.00e+00 0 0.00e+00 100
MatAssemblyBegin 43 1.0 4.3304e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 43 1.0 1.7389e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatZeroEntries 3 1.0 2.1981e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatView 1 1.0 2.4758e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 1 1.0 4.9765e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 1.4162e+00 1.0 5.62e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 30 0 0 0 0 75 0 0 0 39711 211016 0 0.00e+00 0 0.00e+00 100
SNESSolve 1 1.0 1.9888e+02 1.0 6.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 35 0 0 0 43 88 0 0 0 330 210176 1 1.64e+01 2 3.28e+01 86
SNESSetUp 1 1.0 7.1536e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SNESFunctionEval 2 1.0 2.5677e+01 1.0 6.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 3 0 0 0 6 9 0 0 0 248 3529 2 3.28e+01 2 3.28e+01 0
SNESJacobianEval 2 1.0 3.6608e+02 1.0 1.21e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78 6 0 0 0 78 16 0 0 0 33 0 0 0.00e+00 2 3.28e+01 0
DMCreateInterp 1 1.0 1.4078e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 0 0 0.00e+00 0 0.00e+00 0
DMCreateMat 1 1.0 7.1534e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 19 1.0 6.7295e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 31 1.0 5.0421e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 31 1.0 1.1390e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPrealloc 1 1.0 7.1481e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexResidualFE 2 1.0 2.4087e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 3 0 0 0 5 8 0 0 0 261 0 0 0.00e+00 0 0.00e+00 0
DMPlexJacobianFE 2 1.0 3.6491e+02 1.0 1.20e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78 6 0 0 0 78 16 0 0 0 33 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterpFE 1 1.0 1.4052e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 3 1.0 4.2462e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 2 1.0 9.2410e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 5 1.0 1.3387e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 6.55e+01 0
SFBcastEnd 5 1.0 9.5290e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 2 1.0 2.1809e-01 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 19 0 2 3.28e+01 0 0.00e+00 100
SFReduceEnd 2 1.0 5.1390e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 13 1.0 3.0536e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 13 1.0 2.1868e-01 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 19 0 0 0.00e+00 0 0.00e+00 100
VecTDot 401 1.0 2.3875e-01 1.0 1.64e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 6881 15374 0 0.00e+00 0 0.00e+00 100
VecNorm 201 1.0 5.8171e-02 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 14156 54184 0 0.00e+00 0 0.00e+00 100
VecCopy 2 1.0 2.2576e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 59 1.0 1.8379e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 400 1.0 2.8259e-01 1.0 1.64e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 5799 14271 0 0.00e+00 0 0.00e+00 100
VecAYPX 199 1.0 5.3235e-02 1.0 8.15e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 15314 71257 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 201 1.0 5.6185e-02 1.0 4.12e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 7328 30559 0 0.00e+00 0 0.00e+00 100
DualSpaceSetUp 2 1.0 2.6483e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0
FESetUp 2 1.0 8.7991e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCSetUp 1 1.0 4.6590e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 201 1.0 2.2254e-01 1.0 4.12e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 1850 25150 0 0.00e+00 0 0.00e+00 100
--- Event Stage 1: PCSetUp
PCSetUp 1 1.0 1.6251e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 100 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: KSP Solve only
MatMult 400 1.0 1.0288e+00 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 0 54 0 0 0 43 91 0 0 0 98964 0 0 0.00e+00 0 0.00e+00 100
MatView 2 1.0 3.3745e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 2 1.0 2.3989e+00 1.0 1.12e+11 1.0 0.0e+00 0.0e+00 0.0e+00 1 60 0 0 0 100100 0 0 0 46887 220001 0 0.00e+00 0 0.00e+00 100
VecTDot 802 1.0 4.7745e-01 1.0 3.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 20 3 0 0 0 6882 15426 0 0.00e+00 0 0.00e+00 100
VecNorm 402 1.0 1.1532e-01 1.0 1.65e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 0 14281 62757 0 0.00e+00 0 0.00e+00 100
VecCopy 4 1.0 2.1859e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 4 1.0 2.1910e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 800 1.0 5.5739e-01 1.0 3.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 23 3 0 0 0 5880 14666 0 0.00e+00 0 0.00e+00 100
VecAYPX 398 1.0 1.0668e-01 1.0 1.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 4 1 0 0 0 15284 71218 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 402 1.0 1.0930e-01 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 7534 33579 0 0.00e+00 0 0.00e+00 100
PCApply 402 1.0 1.0940e-01 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 7527 33579 0 0.00e+00 0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 30 30 17280 0.
SNES 1 1 1540 0.
DMSNES 1 1 688 0.
Krylov Solver 1 1 1664 0.
DMKSP interface 1 1 656 0.
Matrix 69 69 1568597276 0.
Distributed Mesh 66 66 58921832 0.
DM Label 151 151 95432 0.
Quadrature 148 148 87616 0.
Mesh Transform 4 4 3024 0.
Index Set 559 559 6192364 0.
IS L to G Mapping 1 1 8587428 0.
Section 214 214 152368 0.
Star Forest Graph 134 134 141936 0.
Discrete System 106 106 101764 0.
Weak Form 107 107 65912 0.
GraphPartitioner 31 31 21328 0.
Vector 48 48 156739160 0.
Linear Space 5 5 3416 0.
Dual Space 26 26 24336 0.
FE Space 2 2 1576 0.
Viewer 2 1 840 0.
Preconditioner 1 1 872 0.
Field over DM 1 1 704 0.
--- Event Stage 1: PCSetUp
--- Event Stage 2: KSP Solve only
========================================================================================================================
Average time to get PetscTime(): 3.4e-08
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 4
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-21 19:20:56 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3
Using Fortran compiler: ftn -fPIC
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 4
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
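
A quick consistency check on the reduction-heavy events, using the log's own
2N-flop convention for dot products (numbers from the dm_refine 4 "KSP Solve
only" stage; the overhead inference is mine):

  VecTDot flop: 802 calls * 2 * 2,048,383 rows ~= 3.29e9, matching the table.
  Total rate:   3.29e9 flop / 4.77e-1 s ~= 6.9 GF/s
  GPU rate:     15.4 GF/s (logged), i.e. ~0.21 s inside the kernel

So even on one rank, with no MPI_Allreduce at all, roughly half of the VecTDot
time is spent outside the GPU kernel, presumably in launch and synchronization
overhead.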