[petsc-dev] bad cpu/MPI performance problem
Mark Adams
mfadams at lbl.gov
Sun Jan 8 11:21:08 CST 2023
I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE.
In going up to 64 nodes, something really catastrophic is happening.
I understand I am not using the machine the way it was intended, but I just
want to see if there are any options that I could try for a quick fix/help.
In a debug build I get a stack trace on many, but not all, of the 4K
processes.
Alas, I am not sure why this job was terminated, but every process that I
checked that had an "ERROR" showed this stack:
11:57 main *+= crusher:/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data$
grep ERROR slurm-245063.out |g 3160
[3160]PETSC ERROR:
------------------------------------------------------------------------
[3160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[3160]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger
[3160]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and
https://petsc.org/release/faq/
[3160]PETSC ERROR: --------------------- Stack Frames
------------------------------------
[3160]PETSC ERROR: The line numbers in the error traceback are not always
exact.
[3160]PETSC ERROR: #1 MPI function
[3160]PETSC ERROR: #2 PetscCommDuplicate() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/tagm.c:248
[3160]PETSC ERROR: #3 PetscHeaderCreate_Private() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/sys/objects/inherit.c:56
[3160]PETSC ERROR: #4 PetscSFCreate() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/vec/is/sf/interface/sf.c:65
[3160]PETSC ERROR: #5 DMLabelGather() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/label/dmlabel.c:1932
[3160]PETSC ERROR: #6 DMPlexLabelComplete_Internal() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:177
[3160]PETSC ERROR: #7 DMPlexLabelComplete() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/impls/plex/plexsubmesh.c:227
[3160]PETSC ERROR: #8 DMCompleteBCLabels_Internal() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:5301
[3160]PETSC ERROR: #9 DMCopyDS() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6117
[3160]PETSC ERROR: #10 DMCopyDisc() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/dm/interface/dm.c:6143
[3160]PETSC ERROR: #11 SetupDiscretization() at
/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/mhd_2field.c:755
Maybe the MPI is just getting overwhelmed.
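For scale, a quick back-of-envelope using numbers from the attached -log_view (4.769e+05 total MPI reductions and a measured average MPI_Barrier of ~3.06e-5 s) — assuming, as a rough lower bound, that each global reduction costs at least one barrier:

```python
# Lower-bound estimate of time spent in global synchronization,
# using the counts reported in the attached -log_view output.
barrier = 3.05634e-5   # average MPI_Barrier time (sec), from the log
reductions = 4.769e5   # total MPI reductions, from the log

# If every reduction costs at least one barrier:
t_reduce = reductions * barrier
print(f"reductions alone: >= {t_reduce:.1f} s")  # prints: reductions alone: >= 14.6 s
```

So even at ideal barrier cost, the reduction count alone accounts for ~15 s of the 160 s run — and real reductions at 4K ranks under load cost far more than an idle barrier.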
I was able to get one run to work (one TS step with beuler), but the
solver performance was horrendous, and I see this (attached):
Time (sec): 1.601e+02 1.001 1.600e+02
VecMDot 111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00
1.1e+05 30 4 0 0 23 30 4 0 0 23 499
VecNorm 163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00
1.6e+05 39 2 0 0 34 39 2 0 0 34 139
VecNormalize 154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00
1.5e+05 38 2 0 0 32 38 2 0 0 32 189
etc.
KSPSolve 3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01
2.8e+05 72 95 45 72 58 72 95 45 72 58 4772
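Summing the max times quoted above, VecMDot plus VecNorm already account for most of the wall clock (a rough upper bound, since max times on different ranks need not overlap):

```python
# Fraction of wall clock spent in the two reduction-heavy events,
# using the max times from the -log_view excerpt above.
total   = 1.600e2   # total time (sec)
vecmdot = 5.1684e1  # VecMDot max time (sec)
vecnorm = 6.6660e1  # VecNorm max time (sec)

frac = (vecmdot + vecnorm) / total
print(f"VecMDot+VecNorm: {frac:.0%} of total")  # prints: VecMDot+VecNorm: 74% of total
```

i.e. roughly three quarters of the run is in dot products and norms (global reductions), versus only a few percent of the flops — the solve is completely reduction-bound at this scale.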
Any ideas would be welcome,
Thanks,
Mark
-------------- next part --------------
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ NG=8
+ NC=8
+ VERSION=
+ export MPICH_OFI_NIC_POLICY=NUMA
+ MPICH_OFI_NIC_POLICY=NUMA
+ export OMP_PROC_BIND=true
+ OMP_PROC_BIND=true
+ NTPN=64
+ ORDER=2
+ DT=0.001
+ MU=0.005
+ ETA=0.0001
+ STEPS=1
+ TYPE=tilt
+ EXTRA1='-pc_type mg -ksp_type fgmres -ksp_converged_reason -pc_mg_type full -mg_levels_ksp_max_it 4 -mg_levels_ksp_type gmres -mg_levels_pc_type jacobi -log_view -ts_adapt_dt_max 0.01 -mg_coarse_pc_type gamg -mg_coarse_ksp_type fgmres -mg_coarse_mg_levels_ksp_type gmres -mg_coarse_ksp_rtol 1e-1'
+ date
Sun 08 Jan 2023 11:56:11 AM EST
+ for REFINE in 3
+ for NPIDX in 8
+ let 'N1 = 8'
+ let 'NODES = 8 * 8'
+ let 'N = 64 * 8 * 8'
+ export FILE=sol_64_3.h5
+ FILE=sol_64_3.h5
+ echo n= 4096 ' NODES=' 64 sol_64_3.h5
n= 4096 NODES= 64 sol_64_3.h5
++ printf %03d 64
+ foo=064
+ srun -n 4096 --ntasks-per-node=64 ../mhd -dm_refine_hierarchy 3 -petscspace_degree 2 -ts_dt 0.001 -mu 0.005 -eta 0.0001 -ts_max_steps 1 -ts_max_time -pc_type mg -ksp_type fgmres -ksp_converged_reason -pc_mg_type full -mg_levels_ksp_max_it 4 -mg_levels_ksp_type gmres -mg_levels_pc_type jacobi -log_view -ts_adapt_dt_max 0.01 -mg_coarse_pc_type gamg -mg_coarse_ksp_type fgmres -mg_coarse_mg_levels_ksp_type gmres -mg_coarse_ksp_rtol 1e-1
+ tee out_064_tilt_3_64
Test Type = tilt
Model Type = two-field
eta = 0.0001
mu = 0.005
DM Object: box 4096 MPI processes
type: plex
box in 2 dimensions:
Min/Max of 0-cells per rank: 81/90
Min/Max of 1-cells per rank: 208/216
Min/Max of 2-cells per rank: 128/128
Labels:
celltype: 3 strata with value/size (1 (208), 3 (128), 0 (81))
depth: 3 strata with value/size (0 (81), 1 (208), 2 (128))
marker: 1 strata with value/size (1 (33))
Face Sets: 1 strata with value/size (1 (30))
Defined by transform from:
DM_0x84000002_1 in 2 dimensions:
Min/Max of 0-cells per rank: 25/30
Min/Max of 1-cells per rank: 56/60
Min/Max of 2-cells per rank: 32/32
Labels:
celltype: 3 strata with value/size (1 (56), 3 (32), 0 (25))
depth: 3 strata with value/size (0 (25), 1 (56), 2 (32))
marker: 1 strata with value/size (1 (17))
Face Sets: 1 strata with value/size (1 (14))
Defined by transform from:
DM_0x84000002_2 in 2 dimensions:
Min/Max of 0-cells per rank: 9/12
Min/Max of 1-cells per rank: 16/18
Min/Max of 2-cells per rank: 8/8
Labels:
celltype: 3 strata with value/size (1 (16), 3 (8), 0 (9))
depth: 3 strata with value/size (0 (9), 1 (16), 2 (8))
marker: 1 strata with value/size (1 (9))
Face Sets: 1 strata with value/size (1 (6))
Defined by transform from:
DM_0x84000002_3 in 2 dimensions:
Min/Max of 0-cells per rank: 4/6
Min/Max of 1-cells per rank: 5/6
Min/Max of 2-cells per rank: 2/2
Labels:
depth: 3 strata with value/size (0 (4), 1 (5), 2 (2))
celltype: 3 strata with value/size (0 (4), 1 (5), 3 (2))
marker: 1 strata with value/size (1 (5))
Face Sets: 1 strata with value/size (1 (2))
0 TS dt 0.001 time 0.
MHD 0) time = 0, Energy= 2.3259668002406e+00 (plot ID 0)
0 SNES Function norm 4.332169496840e-02
Linear solve converged due to CONVERGED_RTOL iterations 2
1 SNES Function norm 1.183091626579e-05
Linear solve converged due to CONVERGED_RTOL iterations 3
2 SNES Function norm 5.616046049129e-09
Linear solve converged due to CONVERGED_RTOL iterations 4
3 SNES Function norm 7.868995994841e-13
Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3
TSAdapt none beuler 0: step 0 accepted t=0 + 1.000e-03 dt=1.000e-03
1 TS dt 0.001 time 0.001
MHD 1) time = 0.001, Energy= 2.3259660955830e+00 (plot ID 1)
****************************************************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data/../mhd on a arch-olcf-crusher named crusher072 with 4096 processors, by adams Sun Jan 8 11:59:06 2023
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.18.3-352-g91c56366cb1 GIT Date: 2023-01-05 17:22:48 +0000
Max Max/Min Avg Total
Time (sec): 1.601e+02 1.001 1.600e+02
Objects: 5.255e+05 1.502 3.611e+05
Flops: 1.352e+09 37.758 1.420e+08 5.816e+11
Flops/sec: 8.460e+06 37.813 8.877e+05 3.636e+09
MPI Msg Count: 5.487e+06 11.915 1.512e+06 6.195e+09
MPI Msg Len (bytes): 3.733e+08 30.110 3.779e+01 2.341e+11
MPI Reductions: 4.769e+05 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 1.5996e+02 100.0% 5.8162e+11 100.0% 6.195e+09 100.0% 3.779e+01 100.0% 4.769e+05 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 86094 1.0 2.2476e+01 1.5 0.00e+00 0.0 2.7e+08 4.0e+00 0.0e+00 12 0 4 0 0 12 0 4 0 0 0
BuildTwoSidedF 125 1.0 6.2748e-02 1.3 0.00e+00 0.0 6.3e+05 5.4e+03 0.0e+00 0 0 0 1 0 0 0 0 1 0 0
DMCoarsen 3 1.0 2.3923e-02 10.2 0.00e+00 0.0 7.1e+04 3.0e+01 1.4e+01 0 0 0 0 0 0 0 0 0 0 0
DMRefine 24 1.0 3.3197e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMCreateInterp 3 1.0 8.2148e+00 1.0 1.41e+05 1.9 1.4e+09 1.5e+01 8.6e+04 5 0 23 9 18 5 0 23 9 18 52
DMCreateInject 3 1.0 1.1611e-02 1.1 8.52e+02 1.0 3.7e+04 3.0e+02 3.0e+01 0 0 0 0 0 0 0 0 0 0 301
DMCreateMat 4 1.0 3.7184e-02 1.0 0.00e+00 0.0 1.2e+06 1.4e+03 5.2e+01 0 0 0 1 0 0 0 0 1 0 0
DMPlexBuFrCeLi 1 1.0 2.8821e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexBuCoFrCeLi 1 1.0 4.2694e-04 24.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
Mesh Partition 1 1.0 4.9001e+00 1.0 0.00e+00 0.0 1.6e+04 6.5e+01 7.0e+00 3 0 0 0 0 3 0 0 0 0 0
Mesh Migration 1 1.0 1.7576e+00 1.0 0.00e+00 0.0 9.0e+04 4.5e+01 2.7e+01 1 0 0 0 0 1 0 0 0 0 0
DMPlexPartSelf 1 1.0 1.5571e+00 46740.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexPartLblInv 1 1.0 2.4942e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 1 0 0 0 0 1 0 0 0 0 0
DMPlexPartLblSF 1 1.0 2.1375e+00 1.5 0.00e+00 0.0 6.1e+03 3.5e+01 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
DMPlexPartStrtSF 1 1.0 4.0332e-03 34.6 0.00e+00 0.0 4.1e+03 1.0e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexPointSF 1 1.0 2.3857e-01 6.1 0.00e+00 0.0 8.2e+03 1.3e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexInterp 25 1.0 2.7422e-01 61.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexDistribute 1 1.0 6.8859e+00 1.0 0.00e+00 0.0 1.1e+05 5.4e+01 3.7e+01 4 0 0 0 0 4 0 0 0 0 0
DMPlexDistCones 1 1.0 3.6314e-01 1.0 0.00e+00 0.0 2.3e+04 5.7e+01 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexDistLabels 1 1.0 1.0041e+00 1.0 0.00e+00 0.0 3.6e+04 4.7e+01 2.1e+01 1 0 0 0 0 1 0 0 0 0 0
DMPlexDistField 1 1.0 2.8359e-01 1.0 0.00e+00 0.0 2.7e+04 3.1e+01 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexStratify 61 1.0 1.4595e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 1 0 0 0 0 1 0 0 0 0 0
DMPlexSymmetrize 61 1.0 1.6725e-03 22.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DMPlexPrealloc 4 1.0 3.4322e-02 1.0 0.00e+00 0.0 1.2e+06 1.4e+03 4.4e+01 0 0 0 1 0 0 0 0 1 0 0
DMPlexResidualFE 4 1.0 1.4056e-02 1.1 1.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 323961
DMPlexJacobianFE 12 1.0 1.4956e-01 1.3 5.25e+06 1.0 3.2e+05 6.8e+03 1.2e+01 0 4 0 1 0 0 4 0 1 0 140688
DMPlexInterpFE 3 1.0 3.6915e-02 1.1 1.38e+04 1.0 1.6e+05 7.0e+03 3.9e+01 0 0 0 0 0 0 0 0 0 0 1533
DMPlexInjectorFE 3 1.0 9.5989e-03 1.1 8.52e+02 1.0 3.7e+04 3.0e+02 1.8e+01 0 0 0 0 0 0 0 0 0 0 364
DMPlexIntegralFEM 4 1.0 8.5999e-03 1.1 1.72e+05 1.3 9.8e+04 2.6e+02 4.0e+00 0 0 0 0 0 0 0 0 0 0 65039
SFSetGraph 86137 1.0 1.7477e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 86094 1.0 2.3369e+01 1.5 0.00e+00 0.0 8.0e+08 1.3e+01 0.0e+00 12 0 13 4 0 12 0 13 4 0 0
SFBcastBegin 172190 1.0 1.2254e+00 6.9 0.00e+00 0.0 2.1e+09 1.7e+01 0.0e+00 0 0 33 15 0 0 0 33 15 0 0
SFBcastEnd 172190 1.0 4.6465e+00 17.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
SFReduceBegin 28707 1.0 3.4174e-01 7.7 0.00e+00 0.0 5.1e+08 3.4e+01 0.0e+00 0 0 8 7 0 0 0 8 7 0 0
SFReduceEnd 28707 1.0 2.1009e+00 185.0 1.87e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
SFFetchOpBegin 8 1.0 3.9807e-04 9.8 0.00e+00 0.0 1.4e+05 1.3e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFFetchOpEnd 8 1.0 8.4017e-02 357.6 0.00e+00 0.0 1.4e+05 1.3e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFCreateEmbed 12 1.0 2.6587e-02 36.4 0.00e+00 0.0 2.0e+05 9.8e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFDistSection 57349 1.0 1.9829e+01 1.4 0.00e+00 0.0 2.8e+09 1.5e+01 5.7e+04 11 0 45 18 12 11 0 45 18 12 0
SFSectionSF 57360 1.0 1.4852e+01 1.8 0.00e+00 0.0 3.2e+07 1.0e+01 0.0e+00 7 0 1 0 0 7 0 1 0 0 0
SFRemoteOff 11 1.0 2.6552e-01 244.6 0.00e+00 0.0 4.5e+05 2.3e+01 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFPack 415943 1.0 3.0997e-01 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 415951 1.0 2.3806e-01 8.3 1.64e+07 4096.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2510
MatMult 164941 1.0 1.7551e+01 38.4 1.17e+09 59.2 2.7e+09 6.0e+01 0.0e+00 2 75 43 68 0 2 75 43 68 0 24851
MatMultAdd 25725 1.0 2.6326e+00 49.5 4.32e+07 44.0 8.1e+07 5.6e+01 0.0e+00 1 3 1 2 0 1 3 1 2 0 5822
MatMultTranspose 25770 1.0 4.0162e+00 86.0 6.07e+07 30.8 8.2e+07 5.8e+01 0.0e+00 0 4 1 2 0 0 4 1 2 0 5311
MatSolve 8557 0.0 1.4780e-02 0.0 3.28e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2216
MatLUFactorSym 2 1.0 1.3721e-04 10.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 3 1.0 2.1736e-04 26.9 1.68e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 771
MatConvert 1 1.0 8.2123e-04 2.5 0.00e+00 0.0 6.5e+04 6.1e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 6 1.0 5.9544e-05 7.8 4.96e+03 155.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21676
MatResidual 25725 1.0 3.4553e+00 51.1 1.90e+08 90.1 3.8e+08 6.0e+01 0.0e+00 0 10 6 10 0 0 10 6 10 0 17058
MatAssemblyBegin 183 1.0 5.9995e-02 1.3 0.00e+00 0.0 6.3e+05 5.4e+03 0.0e+00 0 0 0 1 0 0 0 0 1 0 0
MatAssemblyEnd 183 1.0 6.1860e-02 1.8 1.79e+05 0.0 0.0e+00 0.0e+00 1.9e+02 0 0 0 0 0 0 0 0 0 0 2259
MatGetRowIJ 2 0.0 5.0017e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 6 1.0 1.7156e-02 1.0 0.00e+00 0.0 3.3e+03 6.7e+02 7.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 0.0 1.5178e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 3 1.0 8.4847e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02 0 0 0 0 0 0 0 0 0 0 7
MatZeroEntries 25 1.0 4.5243e-04 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 4.3716e-03 1.4 1.25e+03 62.4 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 106
MatTranspose 20 1.0 2.4047e-02 1.0 0.00e+00 0.0 2.0e+05 1.6e+01 3.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMultSym 18 1.0 8.2997e-03 1.4 0.00e+00 0.0 1.5e+05 1.1e+02 3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatMatMultNum 24 1.0 2.2100e-03 2.3 7.44e+05 612.7 4.8e+04 2.2e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 53891
MatPtAPSymbolic 7 1.0 9.4411e-02 1.0 0.00e+00 0.0 4.8e+05 2.1e+02 4.9e+01 0 0 0 0 0 0 0 0 0 0 0
MatPtAPNumeric 10 1.0 3.1720e-02 1.0 2.07e+06 748.2 1.0e+05 1.0e+03 2.8e+01 0 0 0 0 0 0 0 0 0 0 9180
MatGetLocalMat 18 1.0 3.4709e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 18 1.0 7.4246e-03 4.2 0.00e+00 0.0 6.3e+05 2.2e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecDot 3 1.0 7.5026e-04 3.6 7.34e+03 1.4 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 33461
VecMDot 111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30 4 0 0 23 30 4 0 0 23 499
VecNorm 163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39 2 0 0 34 39 2 0 0 34 139
VecScale 163474 1.0 4.1438e-02 2.3 7.62e+06 20.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 114115
VecCopy 51501 1.0 1.0359e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 154784 1.0 2.2054e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 77515 1.0 1.5456e-02 1.8 7.34e+06 23.6 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 260335
VecAYPX 25725 1.0 4.0271e-03 2.0 1.22e+06 26.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 159416
VecAXPBYCZ 5 1.0 3.0874e-04 15.7 1.84e+04 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 203383
VecWAXPY 273 1.0 4.0046e-04 4.4 1.66e+04 4.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 74858
VecMAXPY 163474 1.0 2.8194e-02 1.6 3.28e+07 12.8 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 1182751
VecAssemblyBegin 54 1.0 7.6876e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 54 1.0 2.9547e-04 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 154605 1.0 2.6318e-02 2.0 7.36e+06 22.1 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 156279
VecScatterBegin 215038 1.0 4.8206e+00 30.5 0.00e+00 0.0 2.8e+09 6.0e+01 0.0e+00 0 0 45 72 0 0 0 45 72 0 0
VecScatterEnd 215038 1.0 1.8449e+01 81.5 1.64e+07 4357.6 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 32
VecReduceArith 6 1.0 5.2762e-05 13.8 1.47e+04 1.4 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 951609
VecReduceComm 3 1.0 5.8082e-04 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38 2 0 0 32 38 2 0 0 32 189
DualSpaceSetUp 30 1.0 1.5583e-01 35.4 1.91e+03 1.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 50
FESetUp 30 1.0 4.3889e-01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
TSStep 1 1.0 1.2428e+02 1.0 1.35e+09 38.0 4.3e+09 4.5e+01 3.6e+05 78 100 69 83 76 78 100 69 83 76 4674
TSFunctionEval 4 1.0 2.2100e-02 1.6 1.19e+06 1.1 2.1e+05 3.2e+02 0.0e+00 0 1 0 0 0 0 1 0 0 0 207155
TSJacobianEval 12 1.0 1.5598e-01 1.2 5.40e+06 1.1 8.3e+05 2.7e+03 1.8e+01 0 4 0 1 0 0 4 0 1 0 135286
KSPSetUp 33 1.0 6.4498e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58 72 95 45 72 58 4772
KSPGMRESOrthog 111712 1.0 5.1726e+01 1.4 4.67e+07 12.3 0.0e+00 0.0e+00 1.1e+05 30 9 0 0 23 30 9 0 0 23 1011
SNESSolve 1 1.0 1.2426e+02 1.0 1.35e+09 38.0 4.3e+09 4.5e+01 3.6e+05 78 100 69 82 76 78 100 69 82 76 4675
SNESSetUp 1 1.0 1.5743e-02 1.0 0.00e+00 0.0 3.0e+05 2.9e+03 1.3e+01 0 0 0 0 0 0 0 0 0 0 0
SNESFunctionEval 4 1.0 2.2331e-02 1.6 1.20e+06 1.1 2.1e+05 3.2e+02 0.0e+00 0 1 0 0 0 0 1 0 0 0 207265
SNESJacobianEval 12 1.0 1.5644e-01 1.2 5.40e+06 1.1 8.3e+05 2.7e+03 2.4e+01 0 4 0 1 0 0 4 0 1 0 134890
SNESLineSearch 3 1.0 1.8880e-02 1.0 1.27e+06 1.1 2.7e+05 3.8e+02 1.2e+01 0 1 0 0 0 0 1 0 0 0 250629
PCSetUp_GAMG+ 3 1.0 2.9065e-01 1.0 2.41e+06 368.0 3.1e+06 9.0e+01 6.6e+02 0 0 0 0 0 0 0 0 0 0 1370
PCGAMGCreateG 3 1.0 1.2053e-02 1.0 6.87e+02 171.8 1.8e+05 1.6e+01 5.4e+01 0 0 0 0 0 0 0 0 0 0 16
GAMG Coarsen 3 1.0 8.4956e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02 0 0 0 0 0 0 0 0 0 0 7
GAMG MIS/Agg 3 1.0 8.4865e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02 0 0 0 0 0 0 0 0 0 0 7
PCGAMGProl 3 1.0 2.2059e-02 1.0 0.00e+00 0.0 1.1e+05 5.0e+01 1.3e+02 0 0 0 0 0 0 0 0 0 0 0
GAMG Prol-col 3 1.0 1.8696e-02 1.0 0.00e+00 0.0 1.1e+05 1.9e+01 1.0e+02 0 0 0 0 0 0 0 0 0 0 0
GAMG Prol-lift 3 1.0 2.0733e-03 1.1 0.00e+00 0.0 8.5e+03 4.3e+02 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
PCGAMGOptProl 3 1.0 1.6406e-02 1.0 3.51e+05 120.5 6.8e+05 8.0e+01 1.1e+02 0 0 0 0 0 0 0 0 0 0 6513
GAMG smooth 3 1.0 8.6092e-03 1.4 9.23e+04 156.0 1.8e+05 1.5e+02 3.0e+01 0 0 0 0 0 0 0 0 0 0 3065
PCGAMGCreateL 3 1.0 4.9664e-02 1.0 6.52e+05 714.4 2.0e+05 3.7e+02 1.8e+02 0 0 0 0 0 0 0 0 0 0 1949
GAMG PtAP 3 1.0 1.8299e-02 1.0 6.52e+05 714.4 2.0e+05 3.7e+02 3.3e+01 0 0 0 0 0 0 0 0 0 0 5290
GAMG Reduce 3 1.0 3.1419e-02 1.0 0.00e+00 0.0 7.0e+03 3.5e+02 1.5e+02 0 0 0 0 0 0 0 0 0 0 0
PCGAMG Gal l00 3 1.0 9.4588e-02 1.0 3.25e+05 118.7 4.2e+05 4.4e+02 2.2e+01 0 0 0 0 0 0 0 0 0 0 2688
PCGAMG Opt l00 1 1.0 3.5215e-03 2.0 2.19e+04 42.8 1.7e+05 1.5e+02 8.0e+00 0 0 0 0 0 0 0 0 0 0 6627
PCGAMG Gal l01 3 1.0 1.9694e-02 1.3 9.33e+05 0.0 8.2e+03 1.8e+03 2.2e+01 0 0 0 0 0 0 0 0 0 0 1505
PCGAMG Opt l01 1 1.0 1.5205e-03 1.1 2.78e+04 0.0 2.4e+03 3.2e+02 8.0e+00 0 0 0 0 0 0 0 0 0 0 745
PCGAMG Gal l02 3 1.0 1.0422e-02 1.4 1.27e+06 0.0 6.1e+02 2.2e+03 2.2e+01 0 0 0 0 0 0 0 0 0 0 650
PCGAMG Opt l02 1 1.0 1.5479e-03 1.0 6.39e+04 0.0 2.1e+02 5.9e+02 8.0e+00 0 0 0 0 0 0 0 0 0 0 232
PCSetUp 6 1.0 8.6265e+00 1.0 4.24e+06 2.5 1.5e+09 1.6e+01 8.7e+04 5 1 23 10 18 5 1 23 10 18 904
PCSetUpOnBlocks 8557 1.0 4.3035e-03 4.2 1.68e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39
PCApply 9 1.0 1.1553e+02 1.0 1.34e+09 48.4 2.8e+09 6.0e+01 2.8e+05 72 94 45 72 58 72 94 45 72 58 4738
--- Event Stage 1: Unknown
------------------------------------------------------------------------------------------------------------------------
Object Type Creations Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage
Container 107 107
Distributed Mesh 138 138
DM Label 57648 57648
Quadrature 512 512
Mesh Transform 177 177
Index Set 190065 190065
IS L to G Mapping 5 5
Section 115247 115247
Star Forest Graph 86394 86394
Discrete System 261 261
Weak Form 277 277
GraphPartitioner 62 62
Matrix 314 314
Matrix Coarsen 3 3
Vector 497 497
Linear Space 8 8
Dual Space 58 58
FE Space 30 30
Viewer 2 1
TSAdapt 1 1
TS 1 1
DMTS 1 1
SNES 1 1
DMSNES 6 6
SNESLineSearch 1 1
Krylov Solver 13 13
DMKSP interface 4 4
Preconditioner 13 13
Field over DM 4 4
PetscRandom 3 3
--- Event Stage 1: Unknown
========================================================================================================================
Average time to get PetscTime(): 4.61e-08
Average time for MPI_Barrier(): 3.05634e-05
Average time for zero size MPI_Send(): 9.29304e-06
#PETSc Option Table entries:
-dm_plex_box_faces 64,64 # (source: code)
-dm_plex_box_lower -2,-2 # (source: code)
-dm_plex_box_upper 2,2 # (source: code)
-dm_plex_simplex 1 # (source: file)
-dm_refine_hierarchy 3 # (source: command line)
-eta 0.0001 # (source: command line)
-ksp_converged_reason # (source: command line)
-ksp_max_it 50 # (source: file)
-ksp_rtol 1e-3 # (source: file)
-ksp_type fgmres # (source: command line)
-log_view # (source: command line)
-mg_coarse_ksp_rtol 1e-1 # (source: command line)
-mg_coarse_ksp_type fgmres # (source: command line)
-mg_coarse_mg_levels_ksp_type gmres # (source: command line)
-mg_coarse_pc_type gamg # (source: command line)
-mg_levels_ksp_max_it 4 # (source: command line)
-mg_levels_ksp_type gmres # (source: command line)
-mg_levels_pc_type jacobi # (source: command line)
-mu 0.005 # (source: command line)
-options_left # (source: file)
-pc_mg_type full # (source: command line)
-pc_type mg # (source: command line)
-petscpartitioner_type simple # (source: code)
-petscspace_degree 2 # (source: command line)
-snes_converged_reason # (source: file)
-snes_max_it 10 # (source: file)
-snes_monitor # (source: file)
-snes_rtol 1.e-9 # (source: file)
-snes_stol 1.e-9 # (source: file)
-ts_adapt_dt_max 0.01 # (source: command line)
-ts_adapt_monitor # (source: file)
-ts_arkimex_type 1bee # (source: file)
-ts_dt 0.001 # (source: command line)
-ts_max_reject 10 # (source: file)
-ts_max_snes_failures -1 # (source: file)
-ts_max_steps 1 # (source: command line)
-ts_max_time # (source: command line)
-ts_monitor # (source: file)
-ts_type beuler # (source: file)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn LIBS="-L/opt/cray/pe/mpich/8.1.17/gtl/lib -lmpi_gtl_hsa" --with-openmp=1 --with-debugging=0 --with-64-bit-indices=0 --with-mpiexec=srun --download-superlu --download-superlu_dist --download-mumps --download-scalapack --download-hdf5=1 --download-triangle PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2023-01-07 16:22:45 on login2
Machine characteristics: Linux-5.3.18-150300.59.87_11.0.78-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3 -fopenmp
Using Fortran compiler: ftn -fPIC -fopenmp -fopenmp
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/opt/cray/pe/mpich/8.1.17/gtl/lib -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu -lsuperlu_dist -lhdf5_hl -lhdf5 -ltriangle -lquadmath -lmpifort_cray -lmpi_gtl_hsa
-----------------------------------------
#PETSc Option Table entries:
-dm_plex_box_faces 64,64 # (source: code)
-dm_plex_box_lower -2,-2 # (source: code)
-dm_plex_box_upper 2,2 # (source: code)
-dm_plex_simplex 1 # (source: file)
-dm_refine_hierarchy 3 # (source: command line)
-eta 0.0001 # (source: command line)
-ksp_converged_reason # (source: command line)
-ksp_max_it 50 # (source: file)
-ksp_rtol 1e-3 # (source: file)
-ksp_type fgmres # (source: command line)
-log_view # (source: command line)
-mg_coarse_ksp_rtol 1e-1 # (source: command line)
-mg_coarse_ksp_type fgmres # (source: command line)
-mg_coarse_mg_levels_ksp_type gmres # (source: command line)
-mg_coarse_pc_type gamg # (source: command line)
-mg_levels_ksp_max_it 4 # (source: command line)
-mg_levels_ksp_type gmres # (source: command line)
-mg_levels_pc_type jacobi # (source: command line)
-mu 0.005 # (source: command line)
-options_left # (source: file)
-pc_mg_type full # (source: command line)
-pc_type mg # (source: command line)
-petscpartitioner_type simple # (source: code)
-petscspace_degree 2 # (source: command line)
-snes_converged_reason # (source: file)
-snes_max_it 10 # (source: file)
-snes_monitor # (source: file)
-snes_rtol 1.e-9 # (source: file)
-snes_stol 1.e-9 # (source: file)
-ts_adapt_dt_max 0.01 # (source: command line)
-ts_adapt_monitor # (source: file)
-ts_arkimex_type 1bee # (source: file)
-ts_dt 0.001 # (source: command line)
-ts_max_reject 10 # (source: file)
-ts_max_snes_failures -1 # (source: file)
-ts_max_steps 1 # (source: command line)
-ts_max_time # (source: command line)
-ts_monitor # (source: file)
-ts_type beuler # (source: file)
#End of PETSc Option Table entries
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There is one unused database option. It is:
Option left: name:-ts_arkimex_type value: 1bee source: file
+ date
Sun 08 Jan 2023 11:59:07 AM EST