[petsc-dev] Kokkos/Crusher performance

Mark Adams mfadams at lbl.gov
Sun Jan 23 22:24:33 CST 2022


Ugh. Trying again: still a big difference, but less. Mat-vec does not change
much.

On Sun, Jan 23, 2022 at 7:12 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>  You have debugging turned on on Crusher but not on Perlmutter
>
> On Jan 23, 2022, at 6:37 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> * Perlmutter is roughly 5x faster than Crusher on the one-node 2M eq (small)
> test. This is with 8 processes.
>
> * The next largest version of this test, 16M eq total on 8 processes,
> fails with a memory allocation error in the mat-mult setup in the Kokkos Mat.
>
> * If I try to run with 64 processes on Perlmutter I get the error below
> during initialization. These nodes have 160 GB of memory.
> (I assume this is related to the large memory requirements from loading
> packages, etc.)
>
> Thanks,
> Mark
>
> + srun -n64 -N1 --cpu-bind=cores --ntasks-per-core=1 ../ex13 -dm_plex_box_faces 4,4,4 -petscpartitioner_simple_process_grid 4,4,4 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 -dm_refine 6 -dm_view -pc_type jacobi -log_view -ksp_view -use_gpu_aware_mpi false -dm_mat_type aijkokkos -dm_vec_type kokkos -log_trace
> + tee jac_out_001_kokkos_Perlmutter_6_8.txt
> [48]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [48]PETSC ERROR: GPU error
> [48]PETSC ERROR: cuda error 2 (cudaErrorMemoryAllocation) : out of memory
> [48]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [48]PETSC ERROR: Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8  GIT Date: 2022-01-22 12:18:02 -0600
> [48]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/data/../ex13 on a arch-perlmutter-opt-gcc-kokkos-cuda named nid001424 by madams Sun Jan 23 15:19:56 2022
> [48]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3" --with-debugging=0 --download-metis --download-parmetis --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> [48]PETSC ERROR: #1 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:72
> [48]PETSC ERROR: #2 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:343
> [48]PETSC ERROR: #3 PetscDeviceInitializeTypeFromOptions_Private() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:319
> [48]PETSC ERROR: #4 PetscDeviceInitializeFromOptions_Internal() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:449
> [48]PETSC ERROR: #5 PetscInitialize_Common() at /global/u2/m/madams/petsc/src/sys/objects/pinit.c:963
> [48]PETSC ERROR: #6 PetscInitialize() at /global/u2/m/madams/petsc/src/sys/objects/pinit.c:1238
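Since the failure is a device-side allocation during initialization, one quick sanity check (a minimal sketch, not part of ex13 or this thread) is to print how much device memory each rank actually sees once 64 ranks share a node's GPUs; each rank's CUDA context consumes device memory on its own before PETSc or Kokkos allocate anything. The round-robin rank-to-GPU binding below is an assumption for illustration.

#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Print per-rank free/total device memory right after device selection,
// before any solver allocations happen.
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, ndev = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  cudaGetDeviceCount(&ndev);
  cudaSetDevice(rank % ndev);            // assumed round-robin rank->GPU binding
  size_t free_b = 0, total_b = 0;
  cudaMemGetInfo(&free_b, &total_b);     // bytes free/total on this rank's GPU
  printf("rank %d -> device %d: %.2f GB free of %.2f GB\n",
         rank, rank % ndev, free_b / 1e9, total_b / 1e9);
  MPI_Finalize();
  return 0;
}

Run with the same srun line, this would show how much per-rank headroom is left before PetscInitialize() starts allocating when many ranks land on the same GPU.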
>
>
> On Sun, Jan 23, 2022 at 8:58 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>>
>>
>> On Sat, Jan 22, 2022 at 6:22 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>    I cleaned up Mark's last run and put it in a fixed-width font. I
>>> realize this may be too difficult, but it would be great to have identical
>>> runs to compare with on Summit.
>>>
>>
>> I was planning on running this on Perlmutter today, as well as doing some
>> sanity checks, like verifying that all GPUs are being used. I'll try PetscDeviceView.
>>
>> Junchao modified the timers and all GPU > CPU now, but he seemed to move
>> the timers more outside, and Barry wants them tight on the "kernel".
>> I think Junchao is going to work on that, so I will hold off.
>> (I removed the Kokkos wait stuff and it seemed to run a little faster,
>> but I am not sure how deterministic the timers are, and I did a test with
>> GAMG and it was fine.)
>>
>>
>>>
>>>    As Jed noted, Scatter takes a long time but the pack and unpack take
>>> no time? Are these not timed when using Kokkos?
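One plausible explanation (an illustrative sketch, not PETSc's actual timer code): if the pack/unpack kernels are launched asynchronously and the event timer is read without a device fence, the event records essentially zero time even though the work is real and shows up later in the scatter wait. Roughly, assuming a Kokkos gather-style pack:

#include <Kokkos_Core.hpp>
#include <Kokkos_Timer.hpp>

// Time an asynchronous pack kernel. Without the fence, timer.seconds()
// measures only the kernel launch, not its execution, so the event looks free.
double time_pack(Kokkos::View<double*> buf, Kokkos::View<const double*> src,
                 Kokkos::View<const int*> idx) {
  Kokkos::Timer timer;
  Kokkos::parallel_for("SFPack", idx.extent(0), KOKKOS_LAMBDA(const int i) {
    buf(i) = src(idx(i));
  });
  Kokkos::fence();   // wait for the kernel to finish before reading the clock
  return timer.seconds();
}

If the timer is stopped before such a fence, the pack cost would be attributed to the scatter wait (VecScatterEnd) instead, which would be consistent with the numbers below.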
>>>
>>>
>>> --- Event Stage 2: KSP Solve only
>>>
>>> MatMult              400 1.0 8.8003e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  70 91100100   95,058   132,242      0 0.00e+00    0 0.00e+00 100
>>> VecScatterBegin      400 1.0 1.3391e+00 2.6 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   7  0100100        0         0      0 0.00e+00    0 0.00e+00  0
>>> VecScatterEnd        400 1.0 1.3240e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   9  0  0  0        0         0      0 0.00e+00    0 0.00e+00  0
>>> SFPack               400 1.0 1.8276e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0        0         0      0 0.00e+00    0 0.00e+00  0
>>> SFUnpack             400 1.0 6.2653e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0        0         0      0 0.00e+00    0 0.00e+00  0
>>>
>>> KSPSolve               2 1.0 1.2540e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  3 60 61 54 60 100100100      73,592   116,796      0 0.00e+00    0 0.00e+00 100
>>> VecTDot              802 1.0 1.3551e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  10  3  0      19,627    52,599      0 0.00e+00    0 0.00e+00 100
>>> VecNorm              402 1.0 9.0151e-01 2.2 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   5  1  0  0   14,788   125,477      0 0.00e+00    0 0.00e+00 100
>>> VecAXPY              800 1.0 8.2617e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   7  3  0  0   32,112    61,644      0 0.00e+00    0 0.00e+00 100
>>> VecAYPX              398 1.0 8.1525e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0   16,190    20,689      0 0.00e+00    0 0.00e+00 100
>>> VecPointwiseMult     402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0   18,675    38,633      0 0.00e+00    0 0.00e+00 100
>>>
>>>
>>>
>>> On Jan 22, 2022, at 12:40 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> And I have a new MR if you want to see what I've done so far.
>>>
>>>
>>> <jac_out_001_kokkos_Crusher_6_1_notpl.txt>
> <jac_out_001_kokkos_Perlmutter_6_1.txt>
>
>
>
-------------- next part --------------
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625
  Number of 1-cells per rank: 811200 811200 811200 811200 811200 811200 811200 811200
  Number of 2-cells per rank: 798720 798720 798720 798720 798720 798720 798720 798720
  Number of 3-cells per rank: 262144 262144 262144 262144 262144 262144 262144 262144
Labels:
  celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
  depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
  marker: 1 strata with value/size (1 (49530))
  Face Sets: 3 strata with value/size (1 (16129), 3 (16129), 6 (16129))
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
  Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
**************************************** ***********************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      # This code was compiled with GPU support and you've     #
      # created PETSc/GPU objects, but you intentionally used  #
      # -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      # from GPU to CPU for communication. To get meaningfull  #
      # timing results, please use GPU-aware MPI instead.      #
      ##########################################################


/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher017 with 8 processors, by adams Sun Jan 23 23:11:16 2022
Using Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8  GIT Date: 2022-01-22 12:18:02 -0600

                         Max       Max/Min     Avg       Total
Time (sec):           5.056e+02     1.000   5.056e+02
Objects:              1.990e+03     1.027   1.947e+03
Flop:                 1.940e+11     1.027   1.915e+11  1.532e+12
Flop/sec:             3.837e+08     1.027   3.787e+08  3.029e+09
MPI Messages:         4.806e+03     1.066   4.571e+03  3.657e+04
MPI Message Lengths:  4.434e+08     1.015   9.611e+04  3.515e+09
MPI Reductions:       1.991e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 4.9303e+02  97.5%  6.0875e+11  39.7%  1.417e+04  38.7%  1.143e+05       46.1%  7.660e+02  38.5%
 1:         PCSetUp: 1.8248e-01   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 1.2366e+01   2.4%  9.2287e+11  60.3%  2.240e+04  61.3%  8.459e+04       53.9%  1.206e+03  60.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
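(As a worked check of the "Total Mflop/s" column: for the MatMult line in the "KSP Solve only" stage below, the max time is 8.1754 s and each of the 8 ranks does about 1.06e+11 flop with a max/min ratio of 1.0, so the summed flop is roughly 8 * 1.06e+11 ~ 8.5e+11. Then

    8.5e+11 flop / 8.1754 s * 1e-6 ~ 104,000 Mflop/s,

close to the 102324 reported for that line; the small difference comes from the per-rank counts being slightly below the shown max.)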
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           6 1.0 1.7748e+00 1.0 0.00e+00 0.0 9.3e+02 3.2e+03 2.1e+01  0  0  3  0  1   0  0  7  0  3     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         42 1.0 1.1711e+00 8.7 0.00e+00 0.0 7.5e+02 4.0e+00 4.2e+01  0  0  2  0  2   0  0  5  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 1.0859e+0012.4 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            48589 1.0 4.4117e+00 1.0 5.31e+10 1.0 1.1e+04 8.3e+04 2.0e+00  1 27 31 27  0   1 69 81 59  0 94812   120792    401 2.37e+02  400 2.37e+02 100
MatAssemblyBegin      43 1.0 1.3974e+00 2.9 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 1.5000e+00 2.6 4.67e+06 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  1    12       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 8.8829e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 5.5705e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 6.7974e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 6.9627e+00 1.1 5.85e+10 1.0 1.1e+04 8.4e+04 6.0e+02  1 30 31 27 30   1 76 80 59 79 66272   104557    401 2.37e+02  400 2.37e+02 100
SNESSolve              1 1.0 2.1160e+02 1.0 6.79e+10 1.0 1.1e+04 9.6e+04 6.1e+02 42 35 31 31 31  43 88 81 68 80  2535   104517    405 2.54e+02  406 2.71e+02 86
SNESSetUp              1 1.0 7.8519e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 16  0  1 10  1  16  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 2.7544e+01 1.0 6.33e+09 1.0 1.1e+02 6.2e+04 3.0e+00  5  3  0  0  0   6  8  1  0  0  1839   12604      6 3.40e+01    6 3.39e+01  0
SNESJacobianEval       2 1.0 3.7777e+02 1.0 1.21e+10 1.0 1.1e+02 2.6e+06 2.0e+00 75  6  0  9  0  77 16  1 19  0   256       0      0 0.00e+00    6 3.39e+01  0
DMCreateInterp         1 1.0 2.3422e-03 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   283       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 7.8511e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 16  0  1 10  1  16  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Partition         1 1.0 2.3704e-03 1.0 0.00e+00 0.0 3.5e+01 1.1e+02 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Migration         1 1.0 5.6916e-01 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 2.9e+01  0  0  1  0  1   0  0  1  0  4     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartSelf         1 1.0 1.0013e-04 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblInv       1 1.0 3.2641e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblSF        1 1.0 5.7217e-04 2.6 0.00e+00 0.0 1.4e+01 5.6e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartStrtSF       1 1.0 1.0368e-03 1.0 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPointSF          1 1.0 1.8058e-04 1.0 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 8.2986e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistribute       1 1.0 5.7178e-01 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 3.7e+01  0  0  1  0  2   0  0  2  0  5     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistCones        1 1.0 1.5625e-04 1.1 0.00e+00 0.0 4.2e+01 1.4e+02 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistLabels       1 1.0 2.3368e-04 1.0 0.00e+00 0.0 1.0e+02 6.6e+01 2.4e+01  0  0  0  0  1   0  0  1  0  3     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistField        1 1.0 5.6853e-01 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        34 1.0 6.9691e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      34 1.0 1.2511e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 7.8443e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.6e+01 16  0  1 10  1  16  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 2.6080e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   5  8  0  0  0  1930       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 3.7704e+02 1.0 1.21e+10 1.0 7.6e+01 3.9e+06 2.0e+00 74  6  0  8  0  76 16  1 18  0   256       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 2.2542e-03 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   294       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph            46 1.0 3.3391e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp               36 1.0 4.2510e-01 1.2 0.00e+00 0.0 1.3e+03 9.1e+04 3.6e+01  0  0  4  3  2   0  0  9  7  5     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin          68 1.0 4.1653e-0120.3 0.00e+00 0.0 1.0e+03 5.4e+04 0.0e+00  0  0  3  2  0   0  0  7  3  0     0       0      1 9.79e-02   11 6.79e+01  0
SFBcastEnd            68 1.0 1.1588e+0019.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin         17 1.0 2.4627e-0125.6 4.19e+06 1.0 3.1e+02 3.9e+05 0.0e+00  0  0  1  3  0   0  0  2  7  0   135       0      2 3.32e+01    0 0.00e+00 100
SFReduceEnd           17 1.0 1.3734e+0099.3 9.91e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      4 7.83e-01    0 0.00e+00 100
SFFetchOpBegin         2 1.0 5.1786e-0390.0 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           2 1.0 3.4001e-02 2.8 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFCreateEmbed          9 1.0 6.6854e-0180.1 0.00e+00 0.0 1.6e+02 2.9e+03 0.0e+00  0  0  0  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFDistSection          9 1.0 3.3366e-02 3.0 0.00e+00 0.0 3.1e+02 2.6e+04 1.1e+01  0  0  1  0  1   0  0  2  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFSectionSF           17 1.0 1.3039e-01 2.5 0.00e+00 0.0 5.2e+02 7.6e+04 1.7e+01  0  0  1  1  1   0  0  4  2  2     0       0      0 0.00e+00    0 0.00e+00  0
SFRemoteOff            8 1.0 6.8439e-0129.4 0.00e+00 0.0 4.9e+02 5.3e+03 5.0e+00  0  0  1  0  0   0  0  3  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               294 1.0 3.7839e-0123.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      2 3.94e-01    0 0.00e+00  0
SFUnpack             296 1.0 2.6401e-01 6.5 4.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   127       0      0 0.00e+00    0 0.00e+00 100
VecTDot              401 1.0 9.3199e-01 1.1 1.68e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   0  2  0  0 52 14269   30835      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 7.0918e-01 3.1 8.43e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0 10   0  1  0  0 26  9399   98437      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 2.6561e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                55 1.0 2.4164e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 4.0752e-01 1.1 1.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0 32551   64687      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 3.4332e-01 1.7 8.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 19222   26376      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 1.9094e-01 1.1 4.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 17455   36765      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      201 1.0 5.8133e-01 1.8 0.00e+00 0.0 1.1e+04 8.3e+04 2.0e+00  0  0 31 27  0   0  0 81 59  0     0       0      1 2.96e-01  400 2.37e+02  0
VecScatterEnd        201 1.0 3.2821e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    400 2.37e+02    0 0.00e+00  0
DualSpaceSetUp         2 1.0 4.0375e-03 1.2 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 4.2975e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 1.1332e-05 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 4.2768e-01 1.0 4.22e+08 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  1  0  0  0  7793   34871      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 1.8768e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  65 91100100  0 102324   133771    800 4.74e+02  800 4.74e+02 100
MatView                2 1.0 1.1267e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  2 60 61 54 60 100100100100100 73214   113908    800 4.74e+02  800 4.74e+02 100
SFPack               400 1.0 1.7243e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 1.5637e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  15  3  0  0 67 12907   25655      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 14018   96704      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 4.4442e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 3.4707e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 33219   65843      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 16352   21253      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 17862   38464      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   9  0100100  0     0       0      0 0.00e+00  800 4.74e+02  0
VecScatterEnd        400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0       0    800 4.74e+02    0 0.00e+00  0
PCApply              402 1.0 3.7337e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 17853   38464      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    33             33        19008     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
     DMKSP interface     1              1          656     0.
              Matrix    76             76   1627827176     0.
    Distributed Mesh    72             72     58958528     0.
            DM Label   180            180       113760     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     6              6         4536     0.
           Index Set   665            665      4081364     0.
   IS L to G Mapping     2              2      8588672     0.
             Section   256            256       182272     0.
   Star Forest Graph   179            179       195360     0.
     Discrete System   121            121       116164     0.
           Weak Form   122            122        75152     0.
    GraphPartitioner    34             34        23392     0.
              Vector    55             55    157135208     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp


--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 3.81e-08
Average time for MPI_Barrier(): 8.176e-07
Average time for zero size MPI_Send(): 7.58962e-06
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi false
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-23 14:40:47 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   
Using Fortran compiler: ftn  -fPIC     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      # This code was compiled with GPU support and you've     #
      # created PETSc/GPU objects, but you intentionally used  #
      # -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      # from GPU to CPU for communication. To get meaningfull  #
      # timing results, please use GPU-aware MPI instead.      #
      ##########################################################


#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi false
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01

