[petsc-dev] Kokkos/Crusher performance

Mark Adams mfadams at lbl.gov
Tue Jan 25 12:40:19 CST 2022


Here are two runs, without and with -log_view, respectively.
My new timer is the "Solve time = ...." line.
The two runs differ by about 10%.
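
For reference, here is a minimal sketch of what such a wall-clock timer around the
solve could look like (this is not the literal ex13 code; ksp, b, and x are assumed
to be set up elsewhere, and for GPU runs one would also want to synchronize the
device and the ranks before reading the second timestamp):

  PetscErrorCode ierr;
  PetscLogDouble t0, t1;

  ierr = PetscTime(&t0);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* the timed solve */
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve time: %g\n", (double)(t1 - t0));CHKERRQ(ierr);

Unlike -log_view, this only reads two timestamps, so it adds essentially no overhead
to the solve itself.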

On Tue, Jan 25, 2022 at 12:53 PM Mark Adams <mfadams at lbl.gov> wrote:

> BTW, a -device_view would be great.
>
> On Tue, Jan 25, 2022 at 12:30 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Tue, Jan 25, 2022 at 11:56 AM Jed Brown <jed at jedbrown.org> wrote:
>>
>>> Barry Smith <bsmith at petsc.dev> writes:
>>>
>>> >   Thanks Mark, this is far more interesting. I've improved the formatting
>>> to make it easier to read (and used a fixed-width font for email reading).
>>> >
>>> >   * Can you do same run with say 10 iterations of Jacobi PC?
>>> >
>>> >   * PCApply performance (looks like GAMG) is terrible! Problems too
>>> small?
>>>
>>> This is -pc_type jacobi.
>>>
>>> >   * VecScatter time is completely dominated by SFPack! Junchao, what's
>>> up with that? Lots of little kernels in the PCApply? A PCJACOBI run will
>>> help clarify where that is coming from.
>>>
>>> It's all in MatMult.
>>>
>>> I'd like to see a run that doesn't wait for the GPU.
>>>
>>>
>> Not sure what you mean. Can I do that?
>>
>>
>
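
For readers of the attached log: the "KSP Solve only" stage in the -log_view summary
isolates the repeated benchmark solves from setup. A hedged sketch of how such a
stage is typically registered and used (the loop count mirrors -benchmark_it 2; this
is not the actual ex13 source, and ksp, b, and x are assumed to exist):

  PetscErrorCode ierr;
  PetscLogStage  stage;
  PetscInt       i;

  ierr = PetscLogStageRegister("KSP Solve only", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  for (i = 0; i < 2; i++) {                 /* -benchmark_it 2 in the runs below */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);

Events inside the push/pop window are then attributed to that stage in the tables
below, separately from the "Main Stage".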
-------------- next part --------------
Script started on 2022-01-25 13:33:45-05:00 [TERM="xterm-256color" TTY="/dev/pts/0" COLUMNS="296" LINES="100"]
13:33 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ bash -x run_crusher_jac.sbatch
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ NG=8
+ NC=1
+ date
Tue 25 Jan 2022 01:33:53 PM EST
+ EXTRA='-dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_coarsen_type PMIS -pc_hypre_boomeramg_no_CF'
+ HYPRE_EXTRA='-pc_hypre_boomeramg_no_CF true -pc_hypre_boomeramg_strong_threshold 0.75 -pc_hypre_boomeramg_agg_nl 1 -pc_hypre_boomeramg_coarsen_type HMIS -pc_hypre_boomeramg_interp_type ext+i '
+ for REFINE in 5
+ for NPIDX in 1
+ let 'N1 = 1 * 1'
++ bc -l
+ PG=2.00000000000000000000
++ printf %.0f 2.00000000000000000000
+ PG=2
+ let 'NCC = 8 / 1'
+ let 'N4 = 2 * 1'
+ let 'NODES = 1 * 1 * 1'
+ let 'N = 1 * 1 * 8'
+ echo n= 8 ' NODES=' 1 ' NC=' 1 ' PG=' 2
n= 8  NODES= 1  NC= 1  PG= 2
++ printf %03d 1
+ foo=001
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1_noview.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.341614
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options. They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ srun -n8 -N1 --ntasks-per-gpu=1 --gpu-bind=closest -c 8 ../ex13 -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 2,2,2 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 5 -dm_view -log_viewx -ksp_view -use_gpu_aware_mpi true -log_view -dm_mat_type aijkokkos -dm_vec_type kokkos -pc_type jacobi
+ tee jac_out_001_kokkos_Crusher_5_1.txt
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 35937 35937 35937 35937 35937 35937 35937 35937
  Number of 1-cells per rank: 104544 104544 104544 104544 104544 104544 104544 104544
  Number of 2-cells per rank: 101376 101376 101376 101376 101376 101376 101376 101376
  Number of 3-cells per rank: 32768 32768 32768 32768 32768 32768 32768 32768
Labels:
  celltype: 4 strata with value/size (0 (35937), 1 (104544), 4 (101376), 7 (32768))
  depth: 4 strata with value/size (0 (35937), 1 (104544), 2 (101376), 3 (32768))
  marker: 1 strata with value/size (1 (12474))
  Face Sets: 3 strata with value/size (1 (3969), 3 (3969), 6 (3969))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=2048383, cols=2048383
    total: nonzeros=127263527, allocated nonzeros=127263527
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Solve time: 0.373754
****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher002 with 8 processors, by adams Tue Jan 25 13:36:10 2022
Using Petsc Development GIT revision: v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500

                         Max       Max/Min     Avg       Total
Time (sec):           6.792e+01     1.000   6.792e+01
Objects:              1.920e+03     1.028   1.877e+03
Flop:                 2.402e+10     1.054   2.340e+10  1.872e+11
Flop/sec:             3.537e+08     1.054   3.445e+08  2.756e+09
MPI Messages:         4.778e+03     1.063   4.552e+03  3.642e+04
MPI Message Lengths:  1.120e+08     1.030   2.416e+04  8.799e+08
MPI Reductions:       1.988e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.7566e+01  99.5%  7.4725e+10  39.9%  1.402e+04  38.5%  2.884e+04       45.9%  7.630e+02  38.4%
 1:         PCSetUp: 1.5145e-02   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 3.4260e-01   0.5%  1.1247e+11  60.1%  2.240e+04  61.5%  2.123e+04       54.1%  1.206e+03  60.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           5 1.0 1.5201e-01 1.0 0.00e+00 0.0 7.8e+02 9.9e+02 1.8e+01  0  0  2  0  1   0  0  6  0  2     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         40 1.0 3.2703e-0110.9 0.00e+00 0.0 7.1e+02 4.0e+00 4.0e+01  0  0  2  0  2   0  0  5  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 3.0504e-0113.9 0.00e+00 0.0 1.5e+02 4.8e+05 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            12109 1.0 1.2460e-01 1.1 6.56e+09 1.1 1.1e+04 2.1e+04 2.0e+00  0 27 32 27  0   0 68 82 59  0 408579   664816      1 7.43e-02    0 0.00e+00 100
MatAssemblyBegin      43 1.0 3.1584e-01 3.8 0.00e+00 0.0 1.5e+02 4.8e+05 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 1.7832e-01 3.9 1.16e+06 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  1    25       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 4.2839e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 9.1165e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 1.2154e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 2.7111e-01 1.3 7.24e+09 1.1 1.1e+04 2.1e+04 6.0e+02  0 30 31 27 30   0 75 81 59 79 207437   462502      1 7.43e-02    0 0.00e+00 100
SNESSolve              1 1.0 2.9523e+01 1.0 8.41e+09 1.1 1.1e+04 2.4e+04 6.1e+02 43 35 31 31 31  44 88 82 68 80  2224   462047      3 2.15e+00    2 4.10e+00 86
SNESSetUp              1 1.0 5.9730e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.8e+01  9  0  1 10  1   9  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 2.0096e+00 1.1 7.96e+08 1.0 1.1e+02 1.5e+04 3.0e+00  3  3  0  0  0   3  9  1  0  0  3170   20058      3 4.12e+00    2 4.10e+00  0
SNESJacobianEval       2 1.0 5.9313e+01 1.0 1.52e+09 1.0 1.1e+02 6.5e+05 2.0e+00 87  6  0  8  0  88 16  1 18  0   204       0      0 0.00e+00    2 4.10e+00  0
DMCreateInterp         1 1.0 8.9532e-04 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   741       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 5.9724e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.8e+01  9  0  1 10  1   9  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Partition         1 1.0 7.0985e-04 1.1 0.00e+00 0.0 3.5e+01 1.1e+02 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Migration         1 1.0 3.3181e-03 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 2.9e+01  0  0  1  0  1   0  0  1  0  4     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartSelf         1 1.0 1.0845e-0414.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblInv       1 1.0 2.1960e-04 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblSF        1 1.0 1.1283e-04 1.5 0.00e+00 0.0 1.4e+01 5.6e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartStrtSF       1 1.0 1.3113e-04 1.1 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPointSF          1 1.0 2.1087e-04 1.1 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 5.6904e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistribute       1 1.0 4.2444e-03 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 3.7e+01  0  0  1  0  2   0  0  2  0  5     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistCones        1 1.0 1.2005e-04 1.0 0.00e+00 0.0 4.2e+01 1.4e+02 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistLabels       1 1.0 3.1844e-04 1.0 0.00e+00 0.0 1.0e+02 6.6e+01 2.4e+01  0  0  0  0  1   0  0  1  0  3     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistField        1 1.0 2.7282e-03 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        33 1.0 5.5198e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      33 1.0 1.2717e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 5.9666e+00 1.0 0.00e+00 0.0 3.6e+02 2.3e+05 1.6e+01  9  0  1 10  1   9  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 1.5728e+00 1.0 7.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  8  0  0  0  4003       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 5.9201e+01 1.0 1.51e+09 1.0 7.6e+01 9.7e+05 2.0e+00 87  6  0  8  0  87 16  1 18  0   203       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 8.6844e-04 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   764       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph            43 1.0 7.5083e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp               34 1.0 4.7128e-02 1.2 0.00e+00 0.0 1.3e+03 2.4e+04 3.4e+01  0  0  3  3  2   0  0  9  7  4     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin          65 1.0 1.7854e-0146.9 0.00e+00 0.0 9.8e+02 1.4e+04 0.0e+00  0  0  3  2  0   0  0  7  3  0     0       0      1 2.44e-02    4 8.19e+00  0
SFBcastEnd            65 1.0 2.7145e-0137.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin         16 1.0 1.3564e-0187.4 5.24e+05 1.0 2.9e+02 1.0e+05 0.0e+00  0  0  1  3  0   0  0  2  7  0    30       0      2 4.10e+00    0 0.00e+00 100
SFReduceEnd           16 1.0 1.9102e-0125.0 2.50e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1       0      0 0.00e+00    0 0.00e+00 100
SFFetchOpBegin         2 1.0 5.4600e-04124.1 0.00e+00 0.0 3.8e+01 2.5e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           2 1.0 2.6090e-03 1.6 0.00e+00 0.0 3.8e+01 2.5e+05 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFCreateEmbed          8 1.0 9.0613e-02142.5 0.00e+00 0.0 1.4e+02 8.5e+02 0.0e+00  0  0  0  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFDistSection          9 1.0 3.4540e-03 2.0 0.00e+00 0.0 3.1e+02 6.5e+03 1.1e+01  0  0  1  0  1   0  0  2  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFSectionSF           16 1.0 1.8106e-02 1.7 0.00e+00 0.0 4.8e+02 2.0e+04 1.6e+01  0  0  1  1  1   0  0  3  2  2     0       0      0 0.00e+00    0 0.00e+00  0
SFRemoteOff            7 1.0 9.2144e-0240.8 0.00e+00 0.0 4.2e+02 1.6e+03 4.0e+00  0  0  1  0  0   0  0  3  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               290 1.0 1.5857e-0163.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      2 9.87e-02    0 0.00e+00  0
SFUnpack             292 1.0 1.3520e-0165.8 5.49e+05 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    31       0      0 0.00e+00    0 0.00e+00 100
VecTDot              401 1.0 3.9918e-02 1.6 2.10e+08 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   0  2  0  0 53 41154   98745      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 7.9221e-02 5.3 1.05e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0 10   0  1  0  0 26 10394   78440      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 1.0405e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                54 1.0 1.3159e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 1.1370e-02 1.1 2.10e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0 144122   202169      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 5.3881e-03 1.1 1.04e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 151307   226976      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 5.8045e-03 1.1 5.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 70933   102845      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      201 1.0 2.7933e-02 4.7 0.00e+00 0.0 1.1e+04 2.1e+04 2.0e+00  0  0 32 27  0   0  0 82 59  0     0       0      1 7.43e-02    0 0.00e+00  0
VecScatterEnd        201 1.0 1.8493e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DualSpaceSetUp         2 1.0 2.4687e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     6       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 1.0635e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 4.3190e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 2.7189e-02 1.0 5.27e+07 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  1  0  0  0 15143   43140      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 1.6281e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 1.9253e-01 1.1 1.31e+10 1.1 2.2e+04 2.1e+04 0.0e+00  0 54 62 54  0  54 91100100  0 528807   717110      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 8.5814e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 3.7359e-01 1.2 1.45e+10 1.1 2.2e+04 2.1e+04 1.2e+03  1 60 62 54 61 100100100100100 301067   520834      0 0.00e+00    0 0.00e+00 100
SFPack               400 1.0 1.3133e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 3.9090e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 6.5250e-02 1.5 4.20e+08 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  15  3  0  0 67 50354   108871      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.4344e-02 3.4 2.11e+08 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20  19  1  0  0 33 17456   82582      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 1.7995e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 1.7595e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 2.0554e-02 1.1 4.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 159451   231664      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 1.0453e-02 1.1 2.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   3  1  0  0  0 155981   224425      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 1.1216e-02 1.1 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 73420   107169      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.7302e-02 1.6 0.00e+00 0.0 2.2e+04 2.1e+04 0.0e+00  0  0 62 54  0   4  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 1.3178e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              402 1.0 1.1307e-02 1.1 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 72825   107169      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    32             32        18432     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    75             75    195551600     0.
    Distributed Mesh    70             70      7826872     0.
            DM Label   172            172       108704     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     5              5         3780     0.
           Index Set   633            633      1440932     0.
   IS L to G Mapping     2              2      1100416     0.
             Section   249            249       177288     0.
   Star Forest Graph   173            173       188592     0.
     Discrete System   116            116       111364     0.
           Weak Form   117            117        72072     0.
    GraphPartitioner    33             33        22704     0.
              Vector    54             54     19589336     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 3.5e-08
Average time for MPI_Barrier(): 2.679e-06
Average time for zero size MPI_Send(): 1.07156e-05
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-25 14:29:13 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O   
Using Fortran compiler: ftn  -fPIC -g     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -L/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lz -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 5
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-log_viewx
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 15 unused database options. They are:
Option left: name:-log_viewx (no value)
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
+ date
Tue 25 Jan 2022 01:36:10 PM EST
13:36 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data$ exit
exit

Script done on 2022-01-25 13:36:17-05:00 [COMMAND_EXIT_CODE="0"]

