[petsc-dev] Kokkos/Crusher performance

Mark Adams mfadams at lbl.gov
Fri Jan 21 20:46:49 CST 2022


>
>
>            But in particular, look at the VecTDot and VecNorm CPU flop
> rates compared to the GPU rates: they are much lower, which tells me the
> MPI_Allreduce is likely also hurting performance there a great deal. It
> would be good to see a single-MPI-rank job for comparison, to see
> performance without the MPI overhead.
>

Here are two single-processor runs, each with a whole GPU. It's not clear
whether --ntasks-per-gpu=1 refers to the GPU sockets (4 of them) or the GPUs (8).
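
(For scale, in the attached logs the Stage 2 "KSP Solve only" MatMult runs at
roughly 99 GF/s on the GPU in both cases, e.g. 1.02e11 flop / 1.03 s for the
-dm_refine 4 run, so the local kernels themselves look healthy.)

To make the quoted point concrete, here is a minimal sketch (not PETSc's
actual implementation) of what a parallel dot product such as VecTDot amounts
to: a fast local sum followed by a blocking MPI_Allreduce. The collective's
latency is what drags down the VecTDot/VecNorm rates as ranks are added, and
it is exactly what a single-rank run avoids.

    #include <mpi.h>

    /* Conceptual parallel dot product: local work plus a global reduction. */
    double parallel_dot(const double *x, const double *y, int nlocal,
                        MPI_Comm comm)
    {
      double local = 0.0, global = 0.0;
      for (int i = 0; i < nlocal; i++) local += x[i] * y[i];
      /* Every rank blocks here; per the logs, CG hits this twice per
         iteration (VecTDot) plus once for the convergence norm (VecNorm). */
      MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
      return global;
    }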
-------------- next part --------------
DM Object: box 1 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937
  Number of 1-cells per rank: 104544
  Number of 2-cells per rank: 101376
  Number of 3-cells per rank: 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (24480))
  Face Sets: 6 strata with value/size (6 (3600), 5 (3600), 3 (3600), 4 (3600), 1 (3600), 2 (3600))
  Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=250047, cols=250047
    total: nonzeros=15069223, allocated nonzeros=15069223
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
  Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=250047, cols=250047
    total: nonzeros=15069223, allocated nonzeros=15069223
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
  Linear solve converged due to CONVERGED_RTOL iterations 122
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=250047, cols=250047
    total: nonzeros=15069223, allocated nonzeros=15069223
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher003 with 1 processor, by adams Fri Jan 21 21:30:02 2022
Using Petsc Development GIT revision: v3.16.3-665-g1012189b9a  GIT Date: 2022-01-21 16:28:20 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           5.916e+01     1.000   5.916e+01
Objects:              1.637e+03     1.000   1.637e+03
Flop:                 1.454e+10     1.000   1.454e+10  1.454e+10
Flop/sec:             2.459e+08     1.000   2.459e+08  2.459e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  1.800e+01     1.000   0.000e+00  1.800e+01
MPI Reductions:       9.000e+00     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 5.8503e+01  98.9%  6.3978e+09  44.0%  0.000e+00   0.0%  0.000e+00      100.0%  9.000e+00 100.0%
 1:         PCSetUp: 2.0318e-02   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 6.3347e-01   1.1%  8.1469e+09  56.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           3 1.0 2.1114e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided          3 1.0 2.3745e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         1 1.0 1.8245e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            23195 1.0 5.5017e-02 1.0 3.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 25  0  0  0   0 57  0  0  0 66844       0      0 0.00e+00    0 0.00e+00 100
MatAssemblyBegin      43 1.0 4.3796e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 2.8367e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 3.5872e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 4.7812e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 9.1753e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 3.5479e-01 1.0 4.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 28  0  0  0   1 64  0  0  0 11481   121319      0 0.00e+00    0 0.00e+00 100
SNESSolve              1 1.0 2.5371e+01 1.0 5.26e+09 1.0 0.0e+00 0.0e+00 0.0e+00 43 36  0  0  0  43 82  0  0  0   207   117727      1 2.00e+00    2 4.00e+00 77
SNESSetUp              1 1.0 8.4125e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 3.5801e+00 1.0 8.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6  6  0  0  0   6 13  0  0  0   225     468      2 4.00e+00    2 4.00e+00  0
SNESJacobianEval       2 1.0 4.5842e+01 1.0 1.52e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77 10  0  0  0  78 24  0  0  0    33       0      0 0.00e+00    2 4.00e+00  0
DMCreateInterp         1 1.0 1.2704e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     7       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 8.4118e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 6.4033e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        30 1.0 6.8263e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      30 1.0 1.5020e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 8.4045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 3.0371e+00 1.0 7.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   5 12  0  0  0   259       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 4.5560e+01 1.0 1.50e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77 10  0  0  0  78 23  0  0  0    33       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 1.2681e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     7       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             3 1.0 4.9756e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp                2 1.0 1.0113e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin           5 1.0 2.6223e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    4 8.00e+00  0
SFBcastEnd             5 1.0 8.5570e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin          2 1.0 2.3398e-01 1.0 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2       0      2 4.00e+00    0 0.00e+00 100
SFReduceEnd            2 1.0 4.4490e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFPack                13 1.0 3.8856e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack              13 1.0 2.3328e-01 1.0 5.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2       0      0 0.00e+00    0 0.00e+00 100
VecTDot              244 1.0 9.2238e-02 1.0 1.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0  1323   12541      0 0.00e+00    0 0.00e+00 100
VecNorm              123 1.0 4.4787e-02 1.0 6.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0  1373    9907      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 1.8176e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                58 1.0 8.4117e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              245 1.0 6.9616e-02 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0  1760   15949      0 0.00e+00    0 0.00e+00 100
VecAYPX              121 1.0 2.8920e-02 1.0 6.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0  2092   21415      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     122 1.0 5.1669e-02 1.0 3.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   590    7979      0 0.00e+00    0 0.00e+00 100
DualSpaceSetUp         2 1.0 2.6410e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 8.6078e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 4.2690e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              122 1.0 7.4229e-02 1.0 3.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   411    5106      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 2.0310e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              244 1.0 7.4032e-02 1.0 7.35e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 51  0  0  0  12 90  0  0  0 99332       0      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 3.0267e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 6.3264e-01 1.0 8.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 56  0  0  0 100100  0  0  0 12878   156671      0 0.00e+00    0 0.00e+00 100
VecTDot              488 1.0 1.9612e-01 1.0 2.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0  31  3  0  0  0  1244   13429      0 0.00e+00    0 0.00e+00 100
VecNorm              246 1.0 8.8144e-02 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0  14  2  0  0  0  1396   13078      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 1.4585e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 1.4736e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              488 1.0 1.3101e-01 1.0 2.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0  21  3  0  0  0  1863   20731      0 0.00e+00    0 0.00e+00 100
VecAYPX              242 1.0 7.5924e-02 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0  12  1  0  0  0  1594   21455      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     244 1.0 6.4735e-02 1.0 6.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  1  0  0  0   942   10206      0 0.00e+00    0 0.00e+00 100
PCApply              244 1.0 6.4788e-02 1.0 6.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  1  0  0  0   942   10206      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    29             29        16704     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    68             68    186295972     0.
    Distributed Mesh    64             64      7790176     0.
            DM Label   143            143        90376     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     3              3         2268     0.
           Index Set   522            522      1870912     0.
   IS L to G Mapping     1              1      1099172     0.
             Section   208            208       148096     0.
   Star Forest Graph   130            130       137712     0.
     Discrete System   101            101        96964     0.
           Weak Form   102            102        62832     0.
    GraphPartitioner    30             30        20640     0.
              Vector    47             47     19486664     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 3.61e-08
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 3
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-21 19:20:56 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   
Using Fortran compiler: ftn  -fPIC     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 3
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
-------------- next part --------------
DM Object: box 1 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 274625
  Number of 1-cells per rank: 811200
  Number of 2-cells per rank: 798720
  Number of 3-cells per rank: 262144
Labels:
  celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
  depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
  marker: 1 strata with value/size (1 (98208))
  Face Sets: 6 strata with value/size (6 (15376), 5 (15376), 3 (15376), 4 (15376), 1 (15376), 2 (15376))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher003 with 1 processor, by adams Fri Jan 21 21:38:49 2022
Using Petsc Development GIT revision: v3.16.3-665-g1012189b9a  GIT Date: 2022-01-21 16:28:20 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           4.693e+02     1.000   4.693e+02
Objects:              1.709e+03     1.000   1.709e+03
Flop:                 1.872e+11     1.000   1.872e+11  1.872e+11
Flop/sec:             3.988e+08     1.000   3.988e+08  3.988e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  1.800e+01     1.000   0.000e+00  1.800e+01
MPI Reductions:       9.000e+00     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 4.6678e+02  99.5%  7.4706e+10  39.9%  0.000e+00   0.0%  0.000e+00      100.0%  9.000e+00 100.0%
 1:         PCSetUp: 1.6252e-01   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 2.4002e+00   0.5%  1.1247e+11  60.1%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           4 1.0 1.6321e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided          3 1.0 3.4567e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         1 1.0 1.6763e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            95465 1.0 5.8604e-01 1.0 5.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 68  0  0  0 86868       0      0 0.00e+00    0 0.00e+00 100
MatAssemblyBegin      43 1.0 4.3304e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 1.7389e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 2.1981e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 2.4758e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 4.9765e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 1.4162e+00 1.0 5.62e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 30  0  0  0   0 75  0  0  0 39711   211016      0 0.00e+00    0 0.00e+00 100
SNESSolve              1 1.0 1.9888e+02 1.0 6.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 35  0  0  0  43 88  0  0  0   330   210176      1 1.64e+01    2 3.28e+01 86
SNESSetUp              1 1.0 7.1536e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15  0  0  0  0  15  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 2.5677e+01 1.0 6.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   6  9  0  0  0   248    3529      2 3.28e+01    2 3.28e+01  0
SNESJacobianEval       2 1.0 3.6608e+02 1.0 1.21e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78  6  0  0  0  78 16  0  0  0    33       0      0 0.00e+00    2 3.28e+01  0
DMCreateInterp         1 1.0 1.4078e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     6       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 7.1534e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15  0  0  0  0  15  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 6.7295e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        31 1.0 5.0421e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      31 1.0 1.1390e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 7.1481e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15  0  0  0  0  15  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 2.4087e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   5  8  0  0  0   261       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 3.6491e+02 1.0 1.20e+10 1.0 0.0e+00 0.0e+00 0.0e+00 78  6  0  0  0  78 16  0  0  0    33       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 1.4052e-02 1.0 8.29e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     6       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             3 1.0 4.2462e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp                2 1.0 9.2410e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin           5 1.0 1.3387e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    4 6.55e+01  0
SFBcastEnd             5 1.0 9.5290e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin          2 1.0 2.1809e-01 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    19       0      2 3.28e+01    0 0.00e+00 100
SFReduceEnd            2 1.0 5.1390e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFPack                13 1.0 3.0536e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack              13 1.0 2.1868e-01 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    19       0      0 0.00e+00    0 0.00e+00 100
VecTDot              401 1.0 2.3875e-01 1.0 1.64e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0  6881   15374      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 5.8171e-02 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 14156   54184      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 2.2576e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                59 1.0 1.8379e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 2.8259e-01 1.0 1.64e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0  5799   14271      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 5.3235e-02 1.0 8.15e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 15314   71257      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 5.6185e-02 1.0 4.12e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0  7328   30559      0 0.00e+00    0 0.00e+00 100
DualSpaceSetUp         2 1.0 2.6483e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 8.7991e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 4.6590e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 2.2254e-01 1.0 4.12e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0  1850   25150      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 1.6251e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 1.0288e+00 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00  0 54  0  0  0  43 91  0  0  0 98964       0      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 3.3745e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 2.3989e+00 1.0 1.12e+11 1.0 0.0e+00 0.0e+00 0.0e+00  1 60  0  0  0 100100  0  0  0 46887   220001      0 0.00e+00    0 0.00e+00 100
VecTDot              802 1.0 4.7745e-01 1.0 3.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0  20  3  0  0  0  6882   15426      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 1.1532e-01 1.0 1.65e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 14281   62757      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 2.1859e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 2.1910e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 5.5739e-01 1.0 3.28e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0  23  3  0  0  0  5880   14666      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 1.0668e-01 1.0 1.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   4  1  0  0  0 15284   71218      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 1.0930e-01 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  1  0  0  0  7534   33579      0 0.00e+00    0 0.00e+00 100
PCApply              402 1.0 1.0940e-01 1.0 8.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  1  0  0  0  7527   33579      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    30             30        17280     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    69             69   1568597276     0.
    Distributed Mesh    66             66     58921832     0.
            DM Label   151            151        95432     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     4              4         3024     0.
           Index Set   559            559      6192364     0.
   IS L to G Mapping     1              1      8587428     0.
             Section   214            214       152368     0.
   Star Forest Graph   134            134       141936     0.
     Discrete System   106            106       101764     0.
           Weak Form   107            107        65912     0.
    GraphPartitioner    31             31        21328     0.
              Vector    48             48    156739160     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 3.4e-08
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 4
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-21 19:20:56 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   
Using Fortran compiler: ftn  -fPIC     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 4
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01

