[petsc-dev] Kokkos/Crusher performance
Mark Adams
mfadams at lbl.gov
Sun Jan 23 17:37:55 CST 2022
* Perlmutter is roughly 5x faster than Crusher on the one-node, 2M equation
(small) test. This is with 8 processes.
* The next larger version of this test, 16M equations total on 8 processes,
fails with a memory allocation error in the mat-mult setup in the Kokkos Mat.
* If I try to run with 64 processes on Perlmutter I get the error below during
initialization. These nodes have 160 GB of memory.
(I assume this is related to the large per-process memory requirements from
loading packages, etc.; a small per-rank sanity-check sketch follows below.)
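Here is the kind of standalone per-rank check I have in mind (just a sketch, not
part of ex13; the round-robin rank-to-GPU binding and all names here are my
assumptions): each rank reports which device it lands on and how much memory is
free before PETSc allocates anything.

  /* sketch: per-rank GPU binding and free-memory report at startup */
  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <cstdio>

  int main(int argc, char **argv)
  {
    MPI_Init(&argc, &argv);
    int rank, ndev = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);                /* GPUs visible to this rank     */
    int dev = (ndev > 0) ? rank % ndev : 0;   /* assumed round-robin binding   */
    cudaSetDevice(dev);
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);        /* free/total memory on that GPU */
    printf("rank %d -> device %d: %.1f of %.1f GB free\n",
           rank, dev, free_b / 1e9, total_b / 1e9);
    MPI_Finalize();
    return 0;
  }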
Thanks,
Mark
+ srun -n64 -N1 --cpu-bind=cores --ntasks-per-core=1 ../ex13
-dm_plex_box_faces 4,4,4 -petscpartitioner_simple_process_grid 4,4,4
-dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
-dm_refine 6 -dm_view -pc_type jacobi -log_view -ksp_view
-use_gpu_aware_mpi false -dm_mat_type aijkokkos -dm_vec_type kokkos -log_trace
+ tee jac_out_001_kokkos_Perlmutter_6_8.txt
[48]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[48]PETSC ERROR: GPU error
[48]PETSC ERROR: cuda error 2 (cudaErrorMemoryAllocation) : out of memory
[48]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[48]PETSC ERROR: Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8
GIT Date: 2022-01-22 12:18:02 -0600
[48]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tests/data/../ex13 on a
arch-perlmutter-opt-gcc-kokkos-cuda named nid001424 by madams Sun Jan 23
15:19:56 2022
[48]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
--CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
--CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
--with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
--with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
--COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3"
--with-debugging=0 --download-metis --download-parmetis --with-cuda=1
--with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
--with-zlib=1 --download-kokkos --download-kokkos-kernels
--with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
[48]PETSC ERROR: #1 initialize() at
/global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:72
[48]PETSC ERROR: #2 initialize() at
/global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:343
[48]PETSC ERROR: #3 PetscDeviceInitializeTypeFromOptions_Private() at
/global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:319
[48]PETSC ERROR: #4 PetscDeviceInitializeFromOptions_Internal() at
/global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:449
[48]PETSC ERROR: #5 PetscInitialize_Common() at
/global/u2/m/madams/petsc/src/sys/objects/pinit.c:963
[48]PETSC ERROR: #6 PetscInitialize() at
/global/u2/m/madams/petsc/src/sys/objects/pinit.c:1238
On Sun, Jan 23, 2022 at 8:58 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>
> On Sat, Jan 22, 2022 at 6:22 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>> I cleaned up Mark's last run and put it in a fixed-width font. I
>> realize this may be too difficult but it would be great to have identical
>> runs to compare with on Summit.
>>
>
> I was planning on running this on Perlmutter today, as well as doing some
> sanity checks, like verifying that all GPUs are being used. I'll try
> PetscDeviceView.
>
> Junchao modified the timers and all GPU > CPU now, but he seems to have moved
> the timers further outside, and Barry wants them tight on the "kernel".
> I think Junchao is going to work on that, so I will hold off.
> (I removed the Kokkos wait stuff and it seemed to run a little faster, but
> I am not sure how deterministic the timers are, and I did a test with GAMG
> and it was fine.)
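> Something like this is what I mean by timing tight on the kernel (just a
> sketch with a made-up routine, not the actual aijkokkos code): fence so that
> earlier work is not charged to the timer, time only the kernel, and fence
> again before stopping.
>
>   #include <petscsys.h>
>   #include <Kokkos_Core.hpp>
>
>   /* sketch: tight GPU timing around a single Kokkos kernel */
>   PetscErrorCode TimedAxpy(double a, Kokkos::View<double*> x, Kokkos::View<double*> y)
>   {
>     PetscErrorCode ierr;
>     PetscFunctionBegin;
>     Kokkos::fence();                      /* don't charge earlier work here */
>     ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
>     Kokkos::parallel_for("axpy", y.extent(0),
>                          KOKKOS_LAMBDA(const int i) { y(i) += a * x(i); });
>     Kokkos::fence();                      /* wait for the kernel itself     */
>     ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
>     PetscFunctionReturn(0);
>   }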
>
>
>>
>> As Jed noted, Scatter takes a long time but the pack and unpack take no
>> time? Is this not timed when using Kokkos?
>>
>>
>> --- Event Stage 2: KSP Solve only
>>
>> MatMult 400 1.0 8.8003e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 70 91100100 95,058 132,242 0 0.00e+00 0 0.00e+00 100
>> VecScatterBegin 400 1.0 1.3391e+00 2.6 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 7 0100100 0 0 0 0.00e+00 0 0.00e+00 0
>> VecScatterEnd 400 1.0 1.3240e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 9 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> SFPack 400 1.0 1.8276e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> SFUnpack 400 1.0 6.2653e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>
>> KSPSolve 2 1.0 1.2540e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 3 60 61 54 60 100100100 73,592 116,796 0 0.00e+00 0 0.00e+00 100
>> VecTDot 802 1.0 1.3551e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 10 3 0 19,627 52,599 0 0.00e+00 0 0.00e+00 100
>> VecNorm 402 1.0 9.0151e-01 2.2 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 5 1 0 0 14,788 125,477 0 0.00e+00 0 0.00e+00 100
>> VecAXPY 800 1.0 8.2617e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 7 3 0 0 32,112 61,644 0 0.00e+00 0 0.00e+00 100
>> VecAYPX 398 1.0 8.1525e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 16,190 20,689 0 0.00e+00 0 0.00e+00 100
>> VecPointwiseMult 402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 18,675 38,633 0 0.00e+00 0 0.00e+00 100
>>
>>
>>
>> On Jan 22, 2022, at 12:40 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>> And I have a new MR if you want to see what I've done so far.
>>
>>
>>
-------------- next part --------------
DM Object: box 8 MPI processes
type: plex
box in 3 dimensions:
Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625
Number of 1-cells per rank: 811200 811200 811200 811200 811200 811200 811200 811200
Number of 2-cells per rank: 798720 798720 798720 798720 798720 798720 798720 798720
Number of 3-cells per rank: 262144 262144 262144 262144 262144 262144 262144 262144
Labels:
celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
marker: 1 strata with value/size (1 (49530))
Face Sets: 3 strata with value/size (1 (16129), 3 (16129), 6 (16129))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option. #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher-g named crusher017 with 8 processors, by adams Sun Jan 23 16:01:29 2022
Using Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8 GIT Date: 2022-01-22 12:18:02 -0600
Max Max/Min Avg Total
Time (sec): 2.117e+03 1.000 2.117e+03
Objects: 1.990e+03 1.027 1.947e+03
Flop: 1.940e+11 1.027 1.915e+11 1.532e+12
Flop/sec: 9.164e+07 1.027 9.045e+07 7.236e+08
Memory: 5.124e+09 1.000 5.124e+09 4.099e+10
MPI Messages: 4.806e+03 1.066 4.571e+03 3.657e+04
MPI Message Lengths: 4.434e+08 1.015 9.611e+04 3.515e+09
MPI Reductions: 9.076e+03 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 2.0884e+03 98.7% 6.0875e+11 39.7% 1.417e+04 38.7% 1.143e+05 46.1% 3.420e+03 37.7%
1: PCSetUp: 3.2049e-01 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 4.000e+00 0.0%
2: KSP Solve only: 2.7931e+01 1.3% 9.2287e+11 60.3% 2.240e+04 61.3% 8.459e+04 53.9% 5.632e+03 62.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option. #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 6 1.0 1.6825e+01 1.0 0.00e+00 0.0 9.3e+02 3.2e+03 1.1e+02 1 0 3 0 1 1 0 7 0 3 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 42 1.0 2.0473e-02 1.0 0.00e+00 0.0 7.5e+02 4.0e+00 8.4e+01 0 0 2 0 1 0 0 5 0 2 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 6 1.0 1.4666e+005278.3 0.00e+00 0.0 1.5e+02 2.0e+06 1.2e+01 0 0 0 8 0 0 0 1 18 0 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 48589 1.0 5.2831e+00 1.0 5.31e+10 1.0 1.1e+04 8.3e+04 4.0e+00 0 27 31 27 0 0 69 81 59 0 79173 112883 1 2.96e-01 0 0.00e+00 100
MatAssemblyBegin 43 1.0 4.0873e+00 1.9 0.00e+00 0.0 1.5e+02 2.0e+06 2.4e+01 0 0 0 8 0 0 0 1 18 1 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 43 1.0 4.7121e+00 5.0 4.67e+06 0.0 0.0e+00 0.0e+00 4.0e+01 0 0 0 0 0 0 0 0 0 1 4 0 0 0.00e+00 0 0.00e+00 0
MatZeroEntries 3 1.0 8.3230e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatView 1 1.0 3.5321e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 1 1.0 4.6707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 1.4542e+01 1.0 5.85e+10 1.0 1.1e+04 8.4e+04 2.8e+03 1 30 31 27 31 1 76 80 59 82 31730 44788 1 2.96e-01 0 0.00e+00 100
SNESSolve 1 1.0 6.8305e+02 1.0 6.79e+10 1.0 1.1e+04 9.6e+04 2.8e+03 32 35 31 31 31 33 88 81 68 83 785 44692 3 1.70e+01 2 3.32e+01 86
SNESSetUp 1 1.0 7.3260e+02 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 7.4e+01 35 0 1 10 1 35 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
SNESFunctionEval 2 1.0 2.1545e+02 1.0 6.33e+09 1.0 1.1e+02 6.2e+04 1.1e+01 10 3 0 0 0 10 8 1 0 0 235 699 3 3.33e+01 2 3.32e+01 0
SNESJacobianEval 2 1.0 1.1011e+03 1.0 1.21e+10 1.0 1.1e+02 2.6e+06 8.0e+00 52 6 0 9 0 53 16 1 19 0 88 0 0 0.00e+00 2 3.32e+01 0
DMCreateInterp 1 1.0 6.4663e-02 1.0 8.29e+04 1.0 7.6e+01 1.1e+03 8.3e+01 0 0 0 0 1 0 0 1 0 2 10 0 0 0.00e+00 0 0.00e+00 0
DMCreateMat 1 1.0 7.3258e+02 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 7.4e+01 35 0 1 10 1 35 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Partition 1 1.0 9.5106e-02 1.0 0.00e+00 0.0 3.5e+01 1.1e+02 2.0e+01 0 0 0 0 0 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Migration 1 1.0 2.2802e-02 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 5.5e+01 0 0 1 0 1 0 0 1 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartSelf 1 1.0 4.5569e-0412.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblInv 1 1.0 6.5870e-04 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblSF 1 1.0 1.9269e-02 1.0 0.00e+00 0.0 1.4e+01 5.6e+01 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartStrtSF 1 1.0 5.6106e-02 1.0 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPointSF 1 1.0 2.9290e-04 1.0 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 19 1.0 5.0179e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistribute 1 1.0 1.1842e-01 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 7.5e+01 0 0 1 0 1 0 0 2 0 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistCones 1 1.0 2.7983e-04 1.0 0.00e+00 0.0 4.2e+01 1.4e+02 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistLabels 1 1.0 8.2325e-04 1.0 0.00e+00 0.0 1.0e+02 6.6e+01 3.4e+01 0 0 0 0 0 0 0 1 0 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistField 1 1.0 2.0814e-02 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 34 1.0 1.3422e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 34 1.0 1.3228e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPrealloc 1 1.0 7.3154e+02 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 6.8e+01 35 0 1 10 1 35 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexResidualFE 2 1.0 1.1708e+02 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 3 0 0 0 6 8 0 0 0 430 0 0 0.00e+00 0 0.00e+00 0
DMPlexJacobianFE 2 1.0 1.0044e+03 1.0 1.21e+10 1.0 7.6e+01 3.9e+06 8.0e+00 47 6 0 8 0 48 16 1 18 0 96 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterpFE 1 1.0 6.4481e-02 1.0 8.29e+04 1.0 7.6e+01 1.1e+03 7.9e+01 0 0 0 0 1 0 0 1 0 2 10 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 46 1.0 2.5263e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 36 1.0 1.5407e+00 1.3 0.00e+00 0.0 1.3e+03 9.1e+04 7.2e+01 0 0 4 3 1 0 0 9 7 2 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 68 1.0 7.0080e-01 4.6 0.00e+00 0.0 1.0e+03 5.4e+04 0.0e+00 0 0 3 2 0 0 0 7 3 0 0 0 1 9.79e-02 4 6.63e+01 0
SFBcastEnd 68 1.0 9.3238e-0110.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 17 1.0 1.2490e+0129.0 4.19e+06 1.0 3.1e+02 3.9e+05 0.0e+00 0 0 1 3 0 0 0 2 7 0 3 0 2 3.32e+01 0 0.00e+00 100
SFReduceEnd 17 1.0 1.2483e-0110.6 9.91e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3 0 0 0.00e+00 0 0.00e+00 100
SFFetchOpBegin 2 1.0 1.9744e-02405.4 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpEnd 2 1.0 7.6417e-02 1.7 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0
SFCreateEmbed 9 1.0 6.6322e+0011.8 0.00e+00 0.0 1.6e+02 2.9e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFDistSection 9 1.0 7.8531e-02 1.9 0.00e+00 0.0 3.1e+02 2.6e+04 2.2e+01 0 0 1 0 0 0 0 2 0 1 0 0 0 0.00e+00 0 0.00e+00 0
SFSectionSF 17 1.0 4.9843e-01 2.3 0.00e+00 0.0 5.2e+02 7.6e+04 3.4e+01 0 0 1 1 0 0 0 4 2 1 0 0 0 0.00e+00 0 0.00e+00 0
SFRemoteOff 8 1.0 6.6626e+0011.0 0.00e+00 0.0 4.9e+02 5.3e+03 1.0e+01 0 0 1 0 0 0 0 3 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 294 1.0 5.7900e-01 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 3.94e-01 0 0.00e+00 0
SFUnpack 296 1.0 4.0865e-01 3.6 4.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 82 0 0 0.00e+00 0 0.00e+00 100
VecTDot 401 1.0 5.8803e+00 1.0 1.68e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 1 0 0 9 0 2 0 0 23 2262 3364 0 0.00e+00 0 0.00e+00 100
VecNorm 201 1.0 2.5526e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 4 0 1 0 0 12 26114 79276 0 0.00e+00 0 0.00e+00 100
VecCopy 2 1.0 2.9731e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 55 1.0 1.2648e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 400 1.0 3.6560e-01 1.2 1.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 36283 70527 0 0.00e+00 0 0.00e+00 100
VecAYPX 199 1.0 5.4400e-01 1.2 8.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 12131 14047 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 201 1.0 2.2390e+00 1.1 4.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 1489 1545 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 201 1.0 2.5802e-01 1.6 0.00e+00 0.0 1.1e+04 8.3e+04 4.0e+00 0 0 31 27 0 0 0 81 59 0 0 0 1 2.96e-01 0 0.00e+00 0
VecScatterEnd 201 1.0 9.5519e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DualSpaceSetUp 2 1.0 2.4679e-02 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1 0 0 0.00e+00 0 0.00e+00 0
FESetUp 2 1.0 3.2891e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCSetUp 1 1.0 5.6158e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 201 1.0 2.5857e+00 1.0 4.22e+08 1.0 0.0e+00 0.0e+00 7.0e+00 0 0 0 0 0 0 1 0 0 0 1289 1534 0 0.00e+00 0 0.00e+00 100
--- Event Stage 1: PCSetUp
PCSetUp 1 1.0 3.2428e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 100 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: KSP Solve only
MatMult 400 1.0 9.7959e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 0 55 61 54 0 35 91100100 0 85397 120519 0 0.00e+00 0 0.00e+00 100
MatView 2 1.0 6.3953e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 2 1.0 2.7910e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 5.6e+03 1 60 61 54 62 100100100100100 33066 44823 0 0.00e+00 0 0.00e+00 100
SFPack 400 1.0 3.5893e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 400 1.0 3.4430e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecTDot 802 1.0 1.1806e+01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 1.6e+03 1 2 0 0 18 42 3 0 0 28 2253 3335 0 0.00e+00 0 0.00e+00 100
VecNorm 402 1.0 4.8740e-01 1.1 1.69e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 1 0 0 9 2 1 0 0 14 27352 79664 0 0.00e+00 0 0.00e+00 100
VecCopy 4 1.0 5.9192e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 4 1.0 3.5748e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 800 1.0 7.0964e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 2 3 0 0 0 37385 73135 0 0.00e+00 0 0.00e+00 100
VecAYPX 398 1.0 1.2010e+00 1.5 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 3 1 0 0 0 10990 12791 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 402 1.0 4.4733e+00 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 16 1 0 0 0 1490 1547 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 400 1.0 4.3141e-01 1.2 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 1 0100100 0 0 0 0 0.00e+00 0 0.00e+00 0
VecScatterEnd 400 1.0 1.5419e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 402 1.0 4.4744e+00 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 16 1 0 0 0 1490 1547 0 0.00e+00 0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 33 33 19008 0.
SNES 1 1 1540 0.
DMSNES 1 1 688 0.
Krylov Solver 1 1 1664 0.
DMKSP interface 1 1 656 0.
Matrix 76 76 1627827176 0.
Distributed Mesh 72 72 58958528 0.
DM Label 180 180 113760 0.
Quadrature 148 148 87616 0.
Mesh Transform 6 6 4536 0.
Index Set 665 665 4081364 0.
IS L to G Mapping 2 2 8588672 0.
Section 256 256 182272 0.
Star Forest Graph 179 179 195360 0.
Discrete System 121 121 116164 0.
Weak Form 122 122 75152 0.
GraphPartitioner 34 34 23392 0.
Vector 55 55 157135208 0.
Linear Space 5 5 3416 0.
Dual Space 26 26 24336 0.
FE Space 2 2 1576 0.
Viewer 2 1 840 0.
Preconditioner 1 1 872 0.
Field over DM 1 1 704 0.
--- Event Stage 1: PCSetUp
--- Event Stage 2: KSP Solve only
========================================================================================================================
Average time to get PetscTime(): 4.1e-08
Average time for MPI_Barrier(): 7.894e-07
Average time for zero size MPI_Send(): 8.00675e-06
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=1 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 PETSC_ARCH=arch-olcf-crusher-g
-----------------------------------------
Libraries compiled on 2022-01-23 14:28:01 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher-g
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -O0
Using Fortran compiler: ftn -fPIC -g -O0
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher-g/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher-g/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher-g/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher-g/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher-g/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option. #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
-------------- next part --------------
DM Object: box 8 MPI processes
type: plex
box in 3 dimensions:
Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625
Number of 1-cells per rank: 811200 811200 811200 811200 811200 811200 811200 811200
Number of 2-cells per rank: 798720 798720 798720 798720 798720 798720 798720 798720
Number of 3-cells per rank: 262144 262144 262144 262144 262144 262144 262144 262144
Labels:
celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
marker: 1 strata with value/size (1 (49530))
Face Sets: 3 strata with value/size (1 (16129), 3 (16129), 6 (16129))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=16581375, cols=16581375
total: nonzeros=1045678375, allocated nonzeros=1045678375
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with GPU support and you've #
# created PETSc/GPU objects, but you intentionally used #
# -use_gpu_aware_mpi 0, such that PETSc had to copy data #
# from GPU to CPU for communication. To get meaningfull #
# timing results, please use GPU-aware MPI instead. #
##########################################################
/global/u2/m/madams/petsc/src/snes/tests/data/../ex13-kok on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003016 with 8 processors, by madams Sun Jan 23 13:52:55 2022
Using Petsc Development GIT revision: v3.16.3-683-gbc458ed4d8 GIT Date: 2022-01-22 12:18:02 -0600
Max Max/Min Avg Total
Time (sec): 2.478e+02 1.000 2.478e+02
Objects: 1.990e+03 1.027 1.947e+03
Flop: 1.940e+11 1.027 1.915e+11 1.532e+12
Flop/sec: 7.827e+08 1.027 7.725e+08 6.180e+09
MPI Messages: 4.806e+03 1.066 4.571e+03 3.657e+04
MPI Message Lengths: 4.434e+08 1.015 9.611e+04 3.515e+09
MPI Reductions: 1.992e+03 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 2.4318e+02 98.1% 6.0875e+11 39.7% 1.417e+04 38.7% 1.143e+05 46.1% 7.660e+02 38.5%
1: PCSetUp: 1.1254e-01 0.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: KSP Solve only: 4.5429e+00 1.8% 9.2287e+11 60.3% 2.240e+04 61.3% 8.459e+04 53.9% 1.206e+03 60.5%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 6 1.0 1.2129e+00 1.0 0.00e+00 0.0 9.3e+02 3.2e+03 2.1e+01 0 0 3 0 1 0 0 7 0 3 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSided 42 1.0 1.0956e+00 3.2 0.00e+00 0.0 7.5e+02 4.0e+00 4.2e+01 0 0 2 0 2 0 0 5 0 5 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 6 1.0 1.0119e+00 3.0 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00 0 0 0 8 0 0 0 1 18 1 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 48589 1.0 1.3106e+00 1.0 5.31e+10 1.0 1.1e+04 8.3e+04 2.0e+00 1 27 31 27 0 1 69 81 59 0 319165 348897 401 2.37e+02 400 2.37e+02 100
MatAssemblyBegin 43 1.0 1.0130e+00 1.5 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00 0 0 0 8 0 0 0 1 18 1 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 43 1.0 9.7129e-01 3.9 4.67e+06 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 0 0 0 1 19 0 0 0.00e+00 0 0.00e+00 0
MatZeroEntries 3 1.0 5.3763e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatView 1 1.0 2.7296e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSetUp 1 1.0 6.9012e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 2.7568e+00 1.2 5.85e+10 1.0 1.1e+04 8.4e+04 6.0e+02 1 30 31 27 30 1 76 80 59 79 167381 248190 401 2.37e+02 400 2.37e+02 100
SNESSolve 1 1.0 9.5932e+01 1.0 6.79e+10 1.0 1.1e+04 9.6e+04 6.1e+02 39 35 31 31 31 39 88 81 68 80 5592 247975 405 2.54e+02 406 2.71e+02 86
SNESSetUp 1 1.0 5.4541e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 22 0 1 10 1 22 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
SNESFunctionEval 2 1.0 1.4407e+01 1.0 6.33e+09 1.0 1.1e+02 6.2e+04 3.0e+00 6 3 0 0 0 6 8 1 0 0 3515 10267 6 3.40e+01 6 3.39e+01 0
SNESJacobianEval 2 1.0 1.7022e+02 1.0 1.21e+10 1.0 1.1e+02 2.6e+06 2.0e+00 69 6 0 9 0 70 16 1 19 0 568 0 0 0.00e+00 6 3.39e+01 0
DMCreateInterp 1 1.0 1.0255e-02 1.0 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01 0 0 0 0 1 0 0 1 0 2 65 0 0 0.00e+00 0 0.00e+00 0
DMCreateMat 1 1.0 5.4538e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 22 0 1 10 1 22 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Partition 1 1.0 5.4900e-03 1.0 0.00e+00 0.0 3.5e+01 1.1e+02 8.0e+00 0 0 0 0 0 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0
Mesh Migration 1 1.0 2.4895e-02 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 2.9e+01 0 0 1 0 1 0 0 1 0 4 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartSelf 1 1.0 2.1502e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblInv 1 1.0 3.7241e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartLblSF 1 1.0 6.5123e-04 1.5 0.00e+00 0.0 1.4e+01 5.6e+01 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPartStrtSF 1 1.0 2.7649e-03 1.0 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPointSF 1 1.0 1.0099e-04 1.4 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterp 19 1.0 4.7841e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistribute 1 1.0 3.0598e-02 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 3.7e+01 0 0 1 0 2 0 0 2 0 5 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistCones 1 1.0 2.3352e-04 1.0 0.00e+00 0.0 4.2e+01 1.4e+02 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistLabels 1 1.0 3.8594e-04 1.1 0.00e+00 0.0 1.0e+02 6.6e+01 2.4e+01 0 0 0 0 1 0 0 1 0 3 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexDistField 1 1.0 2.2224e-02 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexStratify 34 1.0 4.7792e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexSymmetrize 34 1.0 1.0723e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexPrealloc 1 1.0 5.4490e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.6e+01 22 0 1 10 1 22 0 3 21 2 0 0 0 0.00e+00 0 0.00e+00 0
DMPlexResidualFE 2 1.0 1.3754e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 3 0 0 0 6 8 0 0 0 3660 0 0 0.00e+00 0 0.00e+00 0
DMPlexJacobianFE 2 1.0 1.6975e+02 1.0 1.21e+10 1.0 7.6e+01 3.9e+06 2.0e+00 68 6 0 8 0 70 16 1 18 0 568 0 0 0.00e+00 0 0.00e+00 0
DMPlexInterpFE 1 1.0 9.2552e-03 1.0 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01 0 0 0 0 1 0 0 1 0 2 72 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 46 1.0 9.5831e-04 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 36 1.0 2.5251e-01 1.2 0.00e+00 0.0 1.3e+03 9.1e+04 3.6e+01 0 0 4 3 2 0 0 9 7 5 0 0 0 0.00e+00 0 0.00e+00 0
SFBcastBegin 68 1.0 2.5963e-02 1.3 0.00e+00 0.0 1.0e+03 5.4e+04 0.0e+00 0 0 3 2 0 0 0 7 3 0 0 0 1 9.79e-02 11 6.79e+01 0
SFBcastEnd 68 1.0 4.7651e-0113.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFReduceBegin 17 1.0 1.3447e-02 2.1 4.19e+06 1.0 3.1e+02 3.9e+05 0.0e+00 0 0 1 3 0 0 0 2 7 0 2466 0 2 3.32e+01 0 0.00e+00 100
SFReduceEnd 17 1.0 8.9506e-0119.6 9.91e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 4 7.83e-01 0 0.00e+00 100
SFFetchOpBegin 2 1.0 1.6954e-0366.4 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0
SFFetchOpEnd 2 1.0 1.2827e-02 1.5 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00 0 0 0 1 0 0 0 0 2 0 0 0 0 0.00e+00 0 0.00e+00 0
SFCreateEmbed 9 1.0 4.2304e-01100.6 0.00e+00 0.0 1.6e+02 2.9e+03 0.0e+00 0 0 0 0 0 0 0 1 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFDistSection 9 1.0 2.4147e-02 4.8 0.00e+00 0.0 3.1e+02 2.6e+04 1.1e+01 0 0 1 0 1 0 0 2 0 1 0 0 0 0.00e+00 0 0.00e+00 0
SFSectionSF 17 1.0 7.9611e-02 2.1 0.00e+00 0.0 5.2e+02 7.6e+04 1.7e+01 0 0 1 1 1 0 0 4 2 2 0 0 0 0.00e+00 0 0.00e+00 0
SFRemoteOff 8 1.0 4.2911e-0139.9 0.00e+00 0.0 4.9e+02 5.3e+03 5.0e+00 0 0 1 0 0 0 0 3 0 1 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 294 1.0 1.3528e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 3.94e-01 0 0.00e+00 0
SFUnpack 296 1.0 1.9024e-02 1.4 4.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1764 0 0 0.00e+00 0 0.00e+00 100
VecTDot 401 1.0 3.6469e-01 1.3 1.68e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 0 2 0 0 52 36464 86204 0 0.00e+00 0 0.00e+00 100
VecNorm 201 1.0 5.0161e-01 3.1 8.43e+08 1.0 0.0e+00 0.0e+00 2.0e+02 0 0 0 0 10 0 1 0 0 26 13289 92324 0 0.00e+00 0 0.00e+00 100
VecCopy 2 1.0 1.2643e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 55 1.0 1.0852e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 400 1.0 2.7144e-01 1.1 1.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 2 0 0 0 48870 66227 0 0.00e+00 0 0.00e+00 100
VecAYPX 199 1.0 1.3385e-01 1.3 8.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 49304 50397 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 201 1.0 1.3709e-01 1.3 4.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 24311 25426 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 201 1.0 8.0074e-02 1.1 0.00e+00 0.0 1.1e+04 8.3e+04 2.0e+00 0 0 31 27 0 0 0 81 59 0 0 0 1 2.96e-01 400 2.37e+02 0
VecScatterEnd 201 1.0 4.1467e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 400 2.37e+02 0 0.00e+00 0
DualSpaceSetUp 2 1.0 7.7611e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2 0 0 0.00e+00 0 0.00e+00 0
FESetUp 2 1.0 6.2696e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCSetUp 1 1.0 1.0039e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 201 1.0 2.7006e-01 1.1 4.22e+08 1.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 1 0 0 0 12341 24968 0 0.00e+00 0 0.00e+00 100
--- Event Stage 1: PCSetUp
PCSetUp 1 1.0 1.1611e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 100 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: KSP Solve only
MatMult 400 1.0 2.4530e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 1 55 61 54 0 54 91100100 0 341028 374935 800 4.74e+02 800 4.74e+02 100
MatView 2 1.0 9.2448e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 2 1.0 4.7214e+00 1.1 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 2 60 61 54 60 100100100100100 195466 271548 800 4.74e+02 800 4.74e+02 100
SFPack 400 1.0 1.4551e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 400 1.0 8.7549e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecTDot 802 1.0 7.1575e-01 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 14 3 0 0 67 37159 86475 0 0.00e+00 0 0.00e+00 100
VecNorm 402 1.0 6.3610e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 11 1 0 0 33 20958 92829 0 0.00e+00 0 0.00e+00 100
VecCopy 4 1.0 3.1460e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 4 1.0 1.9096e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 800 1.0 5.3516e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 11 3 0 0 0 49575 77040 0 0.00e+00 0 0.00e+00 100
VecAYPX 398 1.0 2.6546e-01 1.3 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 0 49720 65273 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 402 1.0 2.7135e-01 1.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 24565 32236 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 400 1.0 1.5257e-01 1.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 3 0100100 0 0 0 0 0.00e+00 800 4.74e+02 0
VecScatterEnd 400 1.0 7.8066e-02 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 800 4.74e+02 0 0.00e+00 0
PCApply 402 1.0 2.7145e-01 1.3 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 1 0 0 0 24556 32236 0 0.00e+00 0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 33 33 19008 0.
SNES 1 1 1540 0.
DMSNES 1 1 688 0.
Krylov Solver 1 1 1664 0.
DMKSP interface 1 1 656 0.
Matrix 76 76 1627827176 0.
Distributed Mesh 72 72 58958528 0.
DM Label 180 180 113760 0.
Quadrature 148 148 87616 0.
Mesh Transform 6 6 4536 0.
Index Set 665 665 4081364 0.
IS L to G Mapping 2 2 8588672 0.
Section 256 256 182272 0.
Star Forest Graph 179 179 195360 0.
Discrete System 121 121 116164 0.
Weak Form 122 122 75152 0.
GraphPartitioner 34 34 23584 0.
Vector 55 55 157135208 0.
Linear Space 5 5 3416 0.
Dual Space 26 26 24336 0.
FE Space 2 2 1576 0.
Viewer 2 1 840 0.
Preconditioner 1 1 872 0.
Field over DM 1 1 704 0.
--- Event Stage 1: PCSetUp
--- Event Stage 2: KSP Solve only
========================================================================================================================
Average time to get PetscTime(): 3.31e-08
Average time for MPI_Barrier(): 1.8454e-06
Average time for zero size MPI_Send(): 1.26619e-05
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_trace
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi false
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc --COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3" --with-debugging=0 --download-metis --download-parmetis --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
-----------------------------------------
Libraries compiled on 2022-01-23 17:40:44 on login30
Machine characteristics: Linux-5.3.18-24.75_10.0.190-cray_shasta_c-x86_64-with-glibc2.2.5
Using PETSc directory: /global/homes/m/madams/petsc
Using PETSc arch: arch-perlmutter-opt-gcc-kokkos-cuda
-----------------------------------------
Using C compiler: cc -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -fPIC -O3
Using Fortran compiler: ftn -fPIC -O3
-----------------------------------------
Using include paths: -I/global/homes/m/madams/petsc/include -I/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/include -I/global/common/software/nersc/cos1.3/cuda/11.3.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib -L/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib -lpetsc -Wl,-rpath,/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib -L/global/homes/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/lib -Wl,-rpath,/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64 -L/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64 -L/global/common/software/nersc/cos1.3/cuda/11.3.0/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lparmetis -lmetis -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lquadmath -lstdc++ -ldl
-----------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with GPU support and you've #
# created PETSc/GPU objects, but you intentionally used #
# -use_gpu_aware_mpi 0, such that PETSc had to copy data #
# from GPU to CPU for communication. To get meaningfull #
# timing results, please use GPU-aware MPI instead. #
##########################################################
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_trace
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi false
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01