[petsc-dev] Kokkos/Crusher perforance

Mark Adams mfadams at lbl.gov
Sat Jan 22 10:44:24 CST 2022


I am getting some funny timings and I'm trying to figure them out.
I figure the GPU flop rates are a bit higher because the GPU timers are inside of
the CPU timers, but *some are a lot bigger or even inverted*.
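
(For reference, a minimal sketch of the logging pattern I am assuming, with a made-up event name and flop count, using the standard calls as I understand them. Flops logged with PetscLogGpuFlops() are credited to whatever event is active; the GPU Mflop/s column divides them by only the time inside PetscLogGpuTimeBegin()/End(), while the Total Mflop/s column divides by the whole event time, so MPI/launch/sync time outside the GPU window inflates the GPU rate relative to the total rate.)

  #include <petscsys.h>

  /* Hypothetical user op, for illustration only: "MyGpuOp" and the 2*n flop count are made up. */
  PetscErrorCode MyGpuOp(PetscInt n)
  {
    static PetscLogEvent event   = 0;
    static PetscClassId  classid = 0;
    PetscErrorCode       ierr;

    PetscFunctionBegin;
    if (!event) {
      ierr = PetscClassIdRegister("MyGpuClass", &classid);CHKERRQ(ierr);
      ierr = PetscLogEventRegister("MyGpuOp", classid, &event);CHKERRQ(ierr);
    }
    ierr = PetscLogEventBegin(event, 0, 0, 0, 0);CHKERRQ(ierr);  /* wall-clock timer for the whole event starts */

    ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);                 /* only this window counts as GPU time */
    /* ... launch the device kernel that does ~2*n flops ... */
    ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);

    ierr = PetscLogGpuFlops(2.0*n);CHKERRQ(ierr);                /* flops credited to the active event */
    ierr = PetscLogEventEnd(event, 0, 0, 0, 0);CHKERRQ(ierr);    /* MPI/launch/sync time outside the GPU window lands here */
    PetscFunctionReturn(0);
  }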

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 1.0094e+01 1.2 1.07e+11 1.0 3.7e+05 6.1e+04 0.0e+00  2 55 62 54  0  68 91100100  0 671849   857147      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 4.5257e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 1.4591e+01 1.1 1.18e+11 1.0 3.7e+05 6.1e+04 1.2e+03  2 60 62 54 60 100100100100100 512399   804048      0 0.00e+00    0 0.00e+00 100
SFPack               400 1.0 2.4545e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 9.4637e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 3.0577e+00 2.1 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  13  3  0  0 67 *69996   488328*      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 1.9597e+00 3.4 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 54744   571507      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 1.7143e-0228.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 3.8051e-0316.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 8.6160e-0113.6 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 *247787   448304*      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 1.6831e+0031.1 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 63107   77030      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.8729e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  1  0  0  0 138502   262413      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.1947e+0035.1 0.00e+00 0.0 3.7e+05 6.1e+04 0.0e+00  0  0 62 54  0   5  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 6.2969e+00 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              402 1.0 3.8758e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  1  0  0  0 138396   262413      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
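
A rough back-of-the-envelope reading of the inverted rates, using the Mflop/s definitions from the Phase summary in the attached log (Total Mflop/s = summed flops / max event time, GPU Mflop/s = summed GPU flops / max GPU time), with VecTDot as the example:

  summed flops ~= 64 ranks * 3.36e9 ~= 2.15e11
  total rate    = 2.15e11 / 3.06 s  ~= 7.0e10 flop/s   (the 69996 Mflop/s column)
  GPU rate      = 4.88e11 flop/s, so max GPU time ~= 2.15e11 / 4.88e11 ~= 0.44 s

So the kernels themselves account for well under half a second, and the other ~2.6 s of the event (presumably the allreduce plus launch/sync overhead) only appears in the total-rate denominator, which is how the GPU column ends up ~7x the total column.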


On Sat, Jan 22, 2022 at 11:10 AM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

>
>
>
> On Sat, Jan 22, 2022 at 10:04 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Logging GPU flops should be inside of PetscLogGpuTimeBegin()/End(), right?
>>
> No, PetscLogGpuTime() does not know the flops of the caller.
>
>
>>
>> On Fri, Jan 21, 2022 at 9:47 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>   Mark,
>>>
>>>   Fix the logging before you run more. It will help with seeing the true
>>> disparity between the MatMult and the vector ops.
>>>
>>>
>>> On Jan 21, 2022, at 9:37 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> Here is one with 2M / GPU. Getting better.
>>>
>>> On Fri, Jan 21, 2022 at 9:17 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>    Matt is correct, vectors are way too small.
>>>>
>>>>    BTW: Now would be a good time to run some of the Report I benchmarks
>>>> on Crusher to get a feel for the kernel launch times and performance on
>>>> VecOps.
>>>>
>>>>    Also Report 2.
>>>>
>>>>   Barry
>>>>
>>>>
>>>> On Jan 21, 2022, at 7:58 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>> On Fri, Jan 21, 2022 at 6:41 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>>> I am looking at the performance of a CG/Jacobi solve on a 3D Q2 Laplacian
>>>>> (ex13) on one Crusher node (8 GPUs on 4 GPU sockets; MI250X, or is it
>>>>> MI200?).
>>>>> This is with a 16M equation problem. GPU-aware MPI and non-GPU-aware
>>>>> MPI are similar (mat-vec is a little faster without it; the total is about the
>>>>> same, call it noise).
>>>>>
>>>>> I found that MatMult was about 3x faster using 8 cores/GPU, that is,
>>>>> all 64 cores on the node, than when using 1 core/GPU, with the same size
>>>>> problem of course.
>>>>> I was thinking MatMult should be faster with just one MPI process. Oh
>>>>> well, worry about that later.
>>>>>
>>>>> The bigger problem, and I have observed this to some extent with the
>>>>> Landau TS/SNES/GPU-solver on the V/A100s, is that the vector operations are
>>>>> expensive or crazy expensive.
>>>>> You can see in the attached output, and in the times here, that the solve is
>>>>> dominated by the non-mat-vec operations:
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  *Total   GPU *   - CpuToGpu -   - GpuToCpu - GPU
>>>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R *Mflop/s Mflop/s* Count   Size   Count   Size  %F
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> 17:15 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ grep "MatMult              400" jac_out_00*5_8_gpuawaremp*
>>>>> MatMult              400 1.0 *1.2507e+00* 1.3 1.34e+10 1.1 3.7e+05 1.6e+04 0.0e+00  1 55 62 54  0  27 91100100  0 *668874       0*      0 0.00e+00    0 0.00e+00 100
>>>>> 17:15 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ grep "KSPSolve               2" jac_out_001*_5_8_gpuawaremp*
>>>>> KSPSolve               2 1.0 *4.4173e+00* 1.0 1.48e+10 1.1 3.7e+05 1.6e+04 1.2e+03  4 60 62 54 61 100100100100100 *208923   1094405*      0 0.00e+00    0 0.00e+00 100
>>>>>
>>>>> Notes about the flop counters here:
>>>>> * MatMult flops are not logged as GPU flops, but something is logged
>>>>> nonetheless.
>>>>> * The GPU flop rate is 5x the total flop rate in KSPSolve :\
>>>>> * I think these nodes have an FP64 peak flop rate of 200 Tflops, so we
>>>>> are at < 1%.
>>>>>
>>>>
>>>> This looks complicated, so just a single remark:
>>>>
>>>> My understanding of the benchmarking of vector ops led by Hannah was
>>>> that you needed to be much
>>>> bigger than 16M to hit peak. I need to get the tech report, but on 8
>>>> GPUs I would think you would be
>>>> at 10% of peak or something right off the bat at these sizes. Barry, is
>>>> that right?
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>> Anyway, not sure how to proceed, but I thought I would share.
>>>>> Maybe ask the Kokkos guys if they have looked at Crusher.
>>>>>
>>>>> Mark
>>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>>
>>>> <jac_out_001_kokkos_Crusher_6_8_gpuawarempi.txt>
>>>
>>>
>>>
-------------- next part --------------
DM Object: box 64 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher001 with 64 processors, by adams Sat Jan 22 10:03:28 2022
Using Petsc Development GIT revision: v3.16.3-682-g5f40ebe68c  GIT Date: 2022-01-22 09:12:56 -0500

                         Max       Max/Min     Avg       Total
Time (sec):           8.322e+01     1.000   8.322e+01
Objects:              2.088e+03     1.164   1.852e+03
Flop:                 2.448e+10     1.074   2.393e+10  1.532e+12
Flop/sec:             2.941e+08     1.074   2.876e+08  1.841e+10
MPI Messages:         1.651e+04     3.673   9.388e+03  6.009e+05
MPI Message Lengths:  2.278e+08     2.093   1.788e+04  1.074e+10
MPI Reductions:       1.988e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 7.7539e+01  93.2%  6.0889e+11  39.8%  2.265e+05  37.7%  2.175e+04       45.8%  7.630e+02  38.4%
 1:         PCSetUp: 3.0132e-02   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 5.6444e+00   6.8%  9.2287e+11  60.2%  3.744e+05  62.3%  1.554e+04       54.2%  1.206e+03  60.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           5 1.0 2.0206e-01 1.0 0.00e+00 0.0 1.1e+04 8.0e+02 1.8e+01  0  0  2  0  1   0  0  5  0  2     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         40 1.0 4.7150e+00 2.1 0.00e+00 0.0 9.9e+03 4.0e+00 4.0e+01  5  0  2  0  2   6  0  4  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 4.6524e+00 2.2 0.00e+00 0.0 2.2e+03 4.0e+05 6.0e+00  5  0  0  8  0   6  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            1210960.2 1.5708e+00 2.0 6.71e+09 1.1 1.9e+05 1.5e+04 2.0e+00  1 27 32 27  0   1 69 85 59  0 266283   865349      1 1.14e-01    0 0.00e+00 100
MatAssemblyBegin      43 1.0 4.7266e+00 2.1 0.00e+00 0.0 2.2e+03 4.0e+05 6.0e+00  5  0  0  8  0   6  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 4.6561e-01 2.5 1.18e+06 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  1   119       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 6.5399e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 3.0994e-0235.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 1.7072e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 3.8576e+00 1.0 7.39e+09 1.1 1.9e+05 1.5e+04 6.0e+02  5 30 31 27 30   5 76 83 59 79 119616   505130      1 1.14e-01    0 0.00e+00 100
SNESSolve              1 1.0 3.0039e+01 1.0 8.56e+09 1.1 1.9e+05 1.8e+04 6.1e+02 36 35 32 31 31  39 88 84 69 80 17861   505077      3 2.36e+00    2 3.62e+00 86
SNESSetUp              1 1.0 7.8404e+00 1.0 0.00e+00 0.0 5.3e+03 1.9e+05 1.8e+01  9  0  1  9  1  10  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 6.0919e+00 1.0 7.96e+08 1.0 1.7e+03 1.3e+04 3.0e+00  7  3  0  0  0   8  8  1  0  0  8322   21227      3 4.32e+00    2 3.62e+00  0
SNESJacobianEval       2 1.0 5.9638e+01 1.0 1.52e+09 1.0 1.7e+03 5.4e+05 2.0e+00 71  6  0  8  0  77 16  1 18  0  1621       0      0 0.00e+00    2 3.62e+00  0
DMCreateInterp         1 1.0 1.0059e-02 1.0 8.29e+04 1.0 1.1e+03 8.0e+02 1.6e+01  0  0  0  0  1   0  0  0  0  2   528       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 7.8388e+00 1.0 0.00e+00 0.0 5.3e+03 1.9e+05 1.8e+01  9  0  1  9  1  10  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Partition         1 1.0 2.4261e-02 1.0 0.00e+00 0.0 3.2e+02 1.1e+02 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Migration         1 1.0 9.6900e-03 1.0 0.00e+00 0.0 1.8e+03 8.3e+01 2.9e+01  0  0  0  0  1   0  0  1  0  4     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartSelf         1 1.0 8.1943e-0483.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblInv       1 1.0 1.0519e-03 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblSF        1 1.0 4.5399e-03 1.7 0.00e+00 0.0 1.3e+02 5.6e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartStrtSF       1 1.0 1.7443e-02 1.7 0.00e+00 0.0 6.3e+01 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPointSF          1 1.0 1.5520e-03 1.2 0.00e+00 0.0 1.3e+02 2.7e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 8.8896e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistribute       1 1.0 3.5443e-02 1.0 0.00e+00 0.0 2.2e+03 9.7e+01 3.7e+01  0  0  0  0  2   0  0  1  0  5     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistCones        1 1.0 1.0297e-03 1.2 0.00e+00 0.0 3.8e+02 1.4e+02 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistLabels       1 1.0 1.6130e-03 1.1 0.00e+00 0.0 9.0e+02 6.6e+01 2.4e+01  0  0  0  0  1   0  0  0  0  3     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistField        1 1.0 6.6520e-03 1.0 0.00e+00 0.0 4.4e+02 5.9e+01 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        33 1.0 1.4024e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      33 1.0 1.9723e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 7.8280e+00 1.0 0.00e+00 0.0 5.3e+03 1.9e+05 1.6e+01  9  0  1  9  1  10  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 3.7000e+00 1.1 7.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  3  0  0  0   5  8  0  0  0 13611       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 5.9402e+01 1.0 1.51e+09 1.0 1.1e+03 8.0e+05 2.0e+00 71  6  0  8  0  76 16  0 18  0  1623       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 9.9446e-03 1.0 8.29e+04 1.0 1.1e+03 8.0e+02 1.6e+01  0  0  0  0  1   0  0  0  0  2   534       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph            43 1.0 7.8240e-04 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp               34 1.0 1.5160e-01 1.4 0.00e+00 0.0 1.8e+04 2.1e+04 3.4e+01  0  0  3  3  2   0  0  8  7  4     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin          65 1.0 2.2658e+00143.5 0.00e+00 0.0 1.3e+04 1.3e+04 0.0e+00  2  0  2  2  0   2  0  6  3  0     0       0      1 1.68e-01    4 7.24e+00  0
SFBcastEnd            65 1.0 2.1147e+0021.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 6.24e-03    0 0.00e+00  0
SFReduceBegin         16 1.0 1.9178e-0191.4 5.24e+05 1.0 4.2e+03 8.5e+04 0.0e+00  0  0  1  3  0   0  0  2  7  0   173       0      2 4.15e+00    0 0.00e+00 100
SFReduceEnd           16 1.0 1.2777e+0057.6 2.50e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1       0      0 0.00e+00    0 0.00e+00 100
SFFetchOpBegin         2 1.0 3.7671e-03241.6 0.00e+00 0.0 5.6e+02 2.0e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           2 1.0 2.8282e-02 4.2 0.00e+00 0.0 5.6e+02 2.0e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFCreateEmbed          8 1.0 1.1169e-0156.3 0.00e+00 0.0 2.0e+03 7.0e+02 0.0e+00  0  0  0  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFDistSection          9 1.0 1.0939e-02 2.3 0.00e+00 0.0 4.1e+03 5.9e+03 1.1e+01  0  0  1  0  1   0  0  2  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFSectionSF           16 1.0 5.2940e-02 2.4 0.00e+00 0.0 5.8e+03 2.0e+04 1.6e+01  0  0  1  1  1   0  0  3  2  2     0       0      0 0.00e+00    0 0.00e+00  0
SFRemoteOff            7 1.0 1.1493e-0122.6 0.00e+00 0.0 6.1e+03 1.3e+03 4.0e+00  0  0  1  0  0   0  0  3  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               290 1.0 7.5745e-01105.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      2 1.51e-01    0 0.00e+00  0
SFUnpack             292 1.0 1.9496e-0145.4 5.49e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   176       0      0 6.24e-03    0 0.00e+00 100
VecTDot              401 1.0 1.8537e+00 1.7 2.10e+08 1.0 0.0e+00 0.0e+00 4.0e+02  2  1  0  0 20   2  2  0  0 53  7174   116292      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 8.0687e-01 1.8 1.05e+08 1.0 0.0e+00 0.0e+00 2.0e+02  1  0  0  0 10   1  1  0  0 26  8261   114870      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 1.1444e-0269.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                54 1.0 8.2474e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 3.4551e-01 9.3 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0 38393   86428      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 2.0194e-0111.3 1.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 32679   81175      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 1.4782e-01 5.8 5.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 22547   60563      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      201 1.0 6.5913e-0113.0 0.00e+00 0.0 1.9e+05 1.5e+04 2.0e+00  0  0 32 27  0   0  0 85 59  0     0       0      1 1.14e-01    0 0.00e+00  0
VecScatterEnd        201 1.0 9.8631e-0110.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DualSpaceSetUp         2 1.0 4.4867e-03 1.1 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    26       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 3.3971e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 1.2845e-05 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 2.1640e-01 2.3 5.27e+07 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  1  0  0  0 15402   40054      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 3.4705e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 1.8850e+00 1.3 1.34e+10 1.1 3.7e+05 1.6e+04 0.0e+00  2 55 62 54  0  29 91100100  0 443783   968552      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 4.4964e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 5.7739e+00 1.0 1.48e+10 1.1 3.7e+05 1.6e+04 1.2e+03  7 60 62 54 61 100100100100100 159833   549009      0 0.00e+00    0 0.00e+00 100
SFPack               400 1.0 2.5136e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 1.4536e-04 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 2.6914e+00 1.5 4.20e+08 1.0 0.0e+00 0.0e+00 8.0e+02  2  2  0  0 40  35  3  0  0 67  9882   121145      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 1.4937e+00 1.9 2.11e+08 1.0 0.0e+00 0.0e+00 4.0e+02  1  1  0  0 20  15  1  0  0 33  8925   114567      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 1.8357e-0256.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 4.4133e-0316.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 6.5359e-0116.2 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0  11  3  0  0  0 40592   89361      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 3.3752e-0112.3 2.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 39106   85354      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 2.7869e-01 7.7 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   4  1  0  0  0 23918   62862      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.8345e-01 8.1 0.00e+00 0.0 3.7e+05 1.6e+04 0.0e+00  0  0 62 54  0   2  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 1.1227e+00 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              402 1.0 2.7901e-01 7.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   4  1  0  0  0 23891   62862      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    32             32        18432     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    75             75    195551600     0.
    Distributed Mesh    70             70      7840024     0.
            DM Label   172            172       108704     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     5              5         3780     0.
           Index Set   801            801      1598436     0.
   IS L to G Mapping     2              2      1102568     0.
             Section   249            249       177288     0.
   Star Forest Graph   173            173       188592     0.
     Discrete System   116            116       111364     0.
           Weak Form   117            117        72072     0.
    GraphPartitioner    33             33        22704     0.
              Vector    54             54     19591688     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 5.11e-08
Average time for MPI_Barrier(): 4.3404e-06
Average time for zero size MPI_Send(): 1.0304e-05
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-22 14:37:56 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   
Using Fortran compiler: ftn  -fPIC     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
-------------- next part --------------
DM Object: box 64 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625 274625
  Number of 1-cells per rank: 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200 811200
  Number of 2-cells per rank: 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720 798720
  Number of 3-cells per rank: 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144 262144
Labels:
  celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
  depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
  marker: 1 strata with value/size (1 (49530))
  Face Sets: 3 strata with value/size (1 (16129), 3 (16129), 6 (16129))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=133432831, cols=133432831
    total: nonzeros=8477185319, allocated nonzeros=8477185319
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=133432831, cols=133432831
    total: nonzeros=8477185319, allocated nonzeros=8477185319
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 64 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 64 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 64 MPI processes
    type: mpiaijkokkos
    rows=133432831, cols=133432831
    total: nonzeros=8477185319, allocated nonzeros=8477185319
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher001 with 64 processors, by adams Sat Jan 22 10:12:55 2022
Using Petsc Development GIT revision: v3.16.3-682-g5f40ebe68c  GIT Date: 2022-01-22 09:12:56 -0500

                         Max       Max/Min     Avg       Total
Time (sec):           5.637e+02     1.000   5.637e+02
Objects:              2.158e+03     1.163   1.919e+03
Flop:                 1.958e+11     1.036   1.936e+11  1.239e+13
Flop/sec:             3.473e+08     1.036   3.434e+08  2.198e+10
MPI Messages:         1.656e+04     3.672   9.423e+03  6.031e+05
MPI Message Lengths:  8.942e+08     2.047   7.055e+04  4.255e+10
MPI Reductions:       1.991e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 5.4950e+02  97.5%  4.9150e+12  39.7%  2.287e+05  37.9%  8.566e+04       46.0%  7.660e+02  38.5%
 1:         PCSetUp: 2.2723e-01   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 1.4013e+01   2.5%  7.4764e+12  60.3%  3.744e+05  62.1%  6.133e+04       54.0%  1.206e+03  60.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           6 1.0 1.5874e+00 1.0 0.00e+00 0.0 1.4e+04 2.6e+03 2.1e+01  0  0  2  0  1   0  0  6  0  3     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         42 1.0 5.4477e+00 3.5 0.00e+00 0.0 1.0e+04 4.0e+00 4.2e+01  1  0  2  0  2   1  0  5  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 5.3493e+00 3.7 0.00e+00 0.0 2.2e+03 1.6e+06 6.0e+00  1  0  0  8  0   1  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            48589241.7 5.3613e+00 1.1 5.37e+10 1.0 1.9e+05 6.0e+04 2.0e+00  1 27 32 27  0   1 69 84 59  0 632483   837518      1 4.48e-01    0 0.00e+00 100
MatAssemblyBegin      43 1.0 6.0115e+00 3.2 0.00e+00 0.0 2.2e+03 1.6e+06 6.0e+00  1  0  0  8  0   1  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 2.0919e+00 2.3 4.71e+06 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  1   107       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 4.5709e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 2.7663e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 2.1533e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 8.1507e+00 1.1 5.91e+10 1.0 1.9e+05 6.1e+04 6.0e+02  1 30 31 27 30   1 76 83 59 79 458636   779127      1 4.48e-01    0 0.00e+00 100
SNESSolve              1 1.0 1.8375e+02 1.0 6.85e+10 1.0 1.9e+05 7.0e+04 6.1e+02 33 35 32 31 31  33 88 84 68 80 23606   779111      3 1.83e+01    2 2.92e+01 86
SNESSetUp              1 1.0 6.4284e+01 1.0 0.00e+00 0.0 5.3e+03 7.7e+05 1.8e+01 11  0  1 10  1  12  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 3.2234e+01 1.0 6.33e+09 1.0 1.7e+03 5.1e+04 3.0e+00  6  3  0  0  0   6  8  1  0  0 12536   161648      3 3.46e+01    2 2.92e+01  0
SNESJacobianEval       2 1.0 4.4342e+02 1.0 1.21e+10 1.0 1.7e+03 2.2e+06 2.0e+00 79  6  0  9  0  81 16  1 19  0  1742       0      0 0.00e+00    2 2.92e+01  0
DMCreateInterp         1 1.0 2.7933e-03 1.1 8.29e+04 1.0 1.1e+03 8.0e+02 1.6e+01  0  0  0  0  1   0  0  0  0  2  1900       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 6.4266e+01 1.0 0.00e+00 0.0 5.3e+03 7.7e+05 1.8e+01 11  0  1 10  1  12  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Partition         1 1.0 3.0373e-03 1.2 0.00e+00 0.0 3.2e+02 1.1e+02 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Migration         1 1.0 8.2140e-03 1.0 0.00e+00 0.0 1.8e+03 8.3e+01 2.9e+01  0  0  0  0  1   0  0  1  0  4     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartSelf         1 1.0 7.7963e-04117.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblInv       1 1.0 1.0695e-03 4.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblSF        1 1.0 4.3163e-04 3.1 0.00e+00 0.0 1.3e+02 5.6e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartStrtSF       1 1.0 3.2071e-04 1.8 0.00e+00 0.0 6.3e+01 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPointSF          1 1.0 1.6480e-03 1.1 0.00e+00 0.0 1.3e+02 2.7e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 1.0223e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistribute       1 1.0 1.2842e-02 1.1 0.00e+00 0.0 2.2e+03 9.7e+01 3.7e+01  0  0  0  0  2   0  0  1  0  5     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistCones        1 1.0 6.0093e-04 1.4 0.00e+00 0.0 3.8e+02 1.4e+02 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistLabels       1 1.0 1.4810e-03 1.1 0.00e+00 0.0 9.0e+02 6.6e+01 2.4e+01  0  0  0  0  1   0  0  0  0  3     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistField        1 1.0 5.8186e-03 1.0 0.00e+00 0.0 4.4e+02 5.9e+01 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        34 1.0 1.2975e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      34 1.0 1.7352e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 6.4177e+01 1.0 0.00e+00 0.0 5.3e+03 7.7e+05 1.6e+01 11  0  1 10  1  12  0  2 21  2     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 2.9558e+01 1.1 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   5  8  0  0  0 13625       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 4.4278e+02 1.0 1.21e+10 1.0 1.1e+03 3.2e+06 2.0e+00 78  6  0  8  0  80 16  0 18  0  1742       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 2.7124e-03 1.1 8.29e+04 1.0 1.1e+03 8.0e+02 1.6e+01  0  0  0  0  1   0  0  0  0  2  1957       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph            46 1.0 1.0373e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp               36 1.0 7.7355e-01 1.3 0.00e+00 0.0 1.9e+04 7.8e+04 3.6e+01  0  0  3  3  2   0  0  8  7  5     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin          68 1.0 1.4809e+0067.7 0.00e+00 0.0 1.4e+04 4.8e+04 0.0e+00  0  0  2  2  0   0  0  6  3  0     0       0      1 1.20e+00    4 5.83e+01  0
SFBcastEnd            68 1.0 2.0285e+00 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 2.48e-02    0 0.00e+00  0
SFReduceBegin         17 1.0 2.1129e-0123.5 4.19e+06 1.0 4.5e+03 3.2e+05 0.0e+00  0  0  1  3  0   0  0  2  7  0  1263       0      2 3.34e+01    0 0.00e+00 100
SFReduceEnd           17 1.0 2.9884e+0028.2 9.91e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2       0      0 0.00e+00    0 0.00e+00 100
SFFetchOpBegin         2 1.0 1.1893e-02187.8 0.00e+00 0.0 5.6e+02 8.2e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           2 1.0 1.5195e-01 4.8 0.00e+00 0.0 5.6e+02 8.2e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFCreateEmbed          9 1.0 4.7116e-0135.7 0.00e+00 0.0 2.3e+03 2.4e+03 0.0e+00  0  0  0  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFDistSection          9 1.0 7.6270e-02 3.2 0.00e+00 0.0 4.1e+03 2.3e+04 1.1e+01  0  0  1  0  1   0  0  2  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFSectionSF           17 1.0 2.8215e-01 2.8 0.00e+00 0.0 6.4e+03 7.4e+04 1.7e+01  0  0  1  1  1   0  0  3  2  2     0       0      0 0.00e+00    0 0.00e+00  0
SFRemoteOff            8 1.0 5.0376e-0113.6 0.00e+00 0.0 7.3e+03 4.3e+03 5.0e+00  0  0  1  0  0   0  0  3  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               294 1.0 7.1478e-0126.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      2 5.96e-01    0 0.00e+00  0
SFUnpack             296 1.0 2.1227e-0110.0 4.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1279       0      0 2.48e-02    0 0.00e+00 100
VecTDot              401 1.0 1.7037e+00 1.9 1.68e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   0  2  0  0 52 62812   462246      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 1.0286e+00 3.5 8.43e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0 10   0  1  0  0 26 52149   557209      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 1.8254e-03 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                55 1.0 3.5030e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 4.4089e-0113.2 1.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0 242113   443819      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 8.2717e-0131.6 8.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 64203   79040      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 2.0538e-01 8.8 4.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 130584   266964      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      201 1.0 9.4593e-01 5.2 0.00e+00 0.0 1.9e+05 6.0e+04 2.0e+00  0  0 32 27  0   0  0 84 59  0     0       0      1 4.48e-01    0 0.00e+00  0
VecScatterEnd        201 1.0 3.2437e+00 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DualSpaceSetUp         2 1.0 3.9278e-03 1.3 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    29       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 4.9567e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 1.8055e-05 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 5.3673e-01 1.5 4.22e+08 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  1  0  0  0 49969   216991      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 2.6500e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 1.0094e+01 1.2 1.07e+11 1.0 3.7e+05 6.1e+04 0.0e+00  2 55 62 54  0  68 91100100  0 671849   857147      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 4.5257e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 1.4591e+01 1.1 1.18e+11 1.0 3.7e+05 6.1e+04 1.2e+03  2 60 62 54 60 100100100100100 512399   804048      0 0.00e+00    0 0.00e+00 100
SFPack               400 1.0 2.4545e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 9.4637e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 3.0577e+00 2.1 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  13  3  0  0 67 69996   488328      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 1.9597e+00 3.4 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 54744   571507      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 1.7143e-02 28.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 3.8051e-03 16.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 8.6160e-01 13.6 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 247787   448304      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 1.6831e+00 31.1 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 63107   77030      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.8729e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  1  0  0  0 138502   262413      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.1947e+00 35.1 0.00e+00 0.0 3.7e+05 6.1e+04 0.0e+00  0  0 62 54  0   5  0 100 100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 6.2969e+00 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              402 1.0 3.8758e-01 9.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  1  0  0  0 138396   262413      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
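
A rough sanity check on the rate columns above (assuming the usual -log_view semantics, where the first Mflop/s column is the sum of flops over all ranks divided by the max time): with the 64 ranks implied by -petscpartitioner_simple_process_grid 4,4,4, the MatMult line works out to roughly

  64 * 1.07e+11 flop / 1.0094e+01 s ~= 6.8e+11 flop/s ~= 678,000 Mflop/s

which is consistent with the logged 671849.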

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    33             33        19008     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    76             76   1627827176     0.
    Distributed Mesh    72             72     58971680     0.
            DM Label   180            180       113760     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     6              6         4536     0.
           Index Set   833            833      4238868     0.
   IS L to G Mapping     2              2      8590824     0.
             Section   256            256       182272     0.
   Star Forest Graph   179            179       195360     0.
     Discrete System   121            121       116164     0.
           Weak Form   122            122        75152     0.
    GraphPartitioner    34             34        23392     0.
              Vector    55             55    157137560     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 5.71e-08
Average time for MPI_Barrier(): 3.9216e-06
Average time for zero size MPI_Send(): 1.05958e-05
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-22 14:37:56 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   
Using Fortran compiler: ftn  -fPIC     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 4,4,4
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 4,4,4
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
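
For reference, a minimal sketch of how a run with this option set could be launched (assuming the DMPlex/SNES benchmark, e.g. snes/tests/ex13, and the 64 MPI ranks implied by -petscpartitioner_simple_process_grid 4,4,4; the executable name and the omission of GPU-binding flags are assumptions, not taken from this log):

  srun -n 64 ./ex13 -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 4,4,4 \
    -dm_refine 6 -dm_distribute -petscpartitioner_type simple \
    -petscpartitioner_simple_process_grid 4,4,4 -potential_petscspace_degree 2 \
    -dm_mat_type aijkokkos -dm_vec_type kokkos \
    -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1.e-12 -ksp_max_it 200 \
    -pc_type jacobi -benchmark_it 2 -log_view

plus the remaining options from the table above.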

