[petsc-dev] bad cpu/MPI performance problem

Mark Adams mfadams at lbl.gov
Sun Jan 8 11:21:08 CST 2023

I am running on Crusher, CPU only, 64 cores per node with Plex/PetscFE.
In going up to 64 nodes, something really catastrophic is happening.
I understand I am not using the machine the way it was intended, but I just
want to see if there are any options that I could try for a quick fix/help.

In a debug build I get a stack trace on many but not all of the 4K processes.
Alas, I am not sure why this job was terminated, but every process that I
checked that had an "ERROR" had this stack:

11:57 main *+= crusher:/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data$
grep ERROR slurm-245063.out |g 3160
[3160]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[3160]PETSC ERROR: Try option -start_in_debugger or
[3160]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and
[3160]PETSC ERROR: ---------------------  Stack Frames
[3160]PETSC ERROR: The line numbers in the error traceback are not always
[3160]PETSC ERROR: #1 MPI function
[3160]PETSC ERROR: #2 PetscCommDuplicate() at
[3160]PETSC ERROR: #3 PetscHeaderCreate_Private() at
[3160]PETSC ERROR: #4 PetscSFCreate() at
[3160]PETSC ERROR: #5 DMLabelGather() at
[3160]PETSC ERROR: #6 DMPlexLabelComplete_Internal() at
[3160]PETSC ERROR: #7 DMPlexLabelComplete() at
[3160]PETSC ERROR: #8 DMCompleteBCLabels_Internal() at
[3160]PETSC ERROR: #9 DMCopyDS() at
[3160]PETSC ERROR: #10 DMCopyDisc() at
[3160]PETSC ERROR: #11 SetupDiscretization() at

Maybe the MPI is just getting overwhelmed.

I was able to get one run to work (one TS step with beuler), but the
solver performance was horrendous, and I see this (full log attached):

Time (sec):           1.601e+02     1.001   1.600e+02
VecMDot           111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30  4  0  0 23  30  4  0  0 23   499
VecNorm           163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39  2  0  0 34  39  2  0  0 34   139
VecNormalize      154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38  2  0  0 32  38  2  0  0 32   189
KSPSolve               3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58  72 95 45 72 58  4772
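
As a rough sanity check on the excerpt above, here is a small Python sketch
(not part of the run itself) that divides the max time of each reduction-heavy
event by its call count and compares the result with the average MPI_Barrier
time reported at the end of the attached log (3.05634e-05 s). The helper name
per_call_seconds is made up for illustration. VecNormalize presumably includes
the VecNorm calls it makes, so the rows should be read separately, not summed.

```python
# Figures copied from the -log_view excerpt: (call count, max time in seconds).
events = {
    "VecMDot": (111712, 5.1684e+01),
    "VecNorm": (163478, 6.6660e+01),
    "VecNormalize": (154599, 6.3942e+01),
}

# "Average time for MPI_Barrier()" as reported in the attached log.
mpi_barrier_s = 3.05634e-05


def per_call_seconds(count, total_seconds):
    """Average wall time per call (hypothetical helper for this sketch)."""
    return total_seconds / count


for name, (count, total_seconds) in events.items():
    t = per_call_seconds(count, total_seconds)
    # Report microseconds per call and the ratio to one measured barrier.
    print(f"{name:14s} {t * 1e6:8.1f} us/call  {t / mpi_barrier_s:5.1f}x MPI_Barrier")
```

Each of these calls comes out at roughly 0.4 ms, more than ten times the
measured barrier time, which suggests the time is going into the global
reductions themselves and not into local vector work.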

Any ideas would be welcome,
-------------- next part --------------
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_vx=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted
+ unset __lmod_vx
+ NG=8
+ NC=8
+ export OMP_PROC_BIND=true
+ NTPN=64
+ DT=0.001
+ MU=0.005
+ ETA=0.0001
+ TYPE=tilt
+ EXTRA1='-pc_type mg -ksp_type fgmres -ksp_converged_reason -pc_mg_type full -mg_levels_ksp_max_it 4 -mg_levels_ksp_type gmres -mg_levels_pc_type jacobi -log_view -ts_adapt_dt_max 0.01 -mg_coarse_pc_type gamg -mg_coarse_ksp_type fgmres -mg_coarse_mg_levels_ksp_type gmres -mg_coarse_ksp_rtol 1e-1'
+ date
Sun 08 Jan 2023 11:56:11 AM EST
+ for REFINE in 3
+ for NPIDX in 8
+ let 'N1 = 8'
+ let 'NODES = 8 * 8'
+ let 'N = 64 * 8 * 8'
+ export FILE=sol_64_3.h5
+ FILE=sol_64_3.h5
+ echo n= 4096 ' NODES=' 64 sol_64_3.h5
n= 4096  NODES= 64 sol_64_3.h5
++ printf %03d 64
+ foo=064
+ srun -n 4096 --ntasks-per-node=64 ../mhd -dm_refine_hierarchy 3 -petscspace_degree 2 -ts_dt 0.001 -mu 0.005 -eta 0.0001 -ts_max_steps 1 -ts_max_time -pc_type mg -ksp_type fgmres -ksp_converged_reason -pc_mg_type full -mg_levels_ksp_max_it 4 -mg_levels_ksp_type gmres -mg_levels_pc_type jacobi -log_view -ts_adapt_dt_max 0.01 -mg_coarse_pc_type gamg -mg_coarse_ksp_type fgmres -mg_coarse_mg_levels_ksp_type gmres -mg_coarse_ksp_rtol 1e-1
+ tee out_064_tilt_3_64
Test Type = tilt
Model Type = two-field
eta = 0.0001
mu = 0.005
DM Object: box 4096 MPI processes
  type: plex
box in 2 dimensions:
  Min/Max of 0-cells per rank: 81/90
  Min/Max of 1-cells per rank: 208/216
  Min/Max of 2-cells per rank: 128/128
  celltype: 3 strata with value/size (1 (208), 3 (128), 0 (81))
  depth: 3 strata with value/size (0 (81), 1 (208), 2 (128))
  marker: 1 strata with value/size (1 (33))
  Face Sets: 1 strata with value/size (1 (30))
  Defined by transform from:
  DM_0x84000002_1 in 2 dimensions:
    Min/Max of 0-cells per rank:   25/30  
    Min/Max of 1-cells per rank:   56/60  
    Min/Max of 2-cells per rank:   32/32  
    celltype: 3 strata with value/size (1 (56), 3 (32), 0 (25))
    depth: 3 strata with value/size (0 (25), 1 (56), 2 (32))
    marker: 1 strata with value/size (1 (17))
    Face Sets: 1 strata with value/size (1 (14))
    Defined by transform from:
    DM_0x84000002_2 in 2 dimensions:
      Min/Max of 0-cells per rank:     9/12    
      Min/Max of 1-cells per rank:     16/18    
      Min/Max of 2-cells per rank:     8/8    
      celltype: 3 strata with value/size (1 (16), 3 (8), 0 (9))
      depth: 3 strata with value/size (0 (9), 1 (16), 2 (8))
      marker: 1 strata with value/size (1 (9))
      Face Sets: 1 strata with value/size (1 (6))
      Defined by transform from:
      DM_0x84000002_3 in 2 dimensions:
        Min/Max of 0-cells per rank:       4/6      
        Min/Max of 1-cells per rank:       5/6      
        Min/Max of 2-cells per rank:       2/2      
        depth: 3 strata with value/size (0 (4), 1 (5), 2 (2))
        celltype: 3 strata with value/size (0 (4), 1 (5), 3 (2))
        marker: 1 strata with value/size (1 (5))
        Face Sets: 1 strata with value/size (1 (2))
0 TS dt 0.001 time 0.
MHD    0) time =         0, Eergy=  2.3259668002406e+00 (plot ID 0)
    0 SNES Function norm 4.332169496840e-02 
    Linear solve converged due to CONVERGED_RTOL iterations 2
    1 SNES Function norm 1.183091626579e-05 
    Linear solve converged due to CONVERGED_RTOL iterations 3
    2 SNES Function norm 5.616046049129e-09 
    Linear solve converged due to CONVERGED_RTOL iterations 4
    3 SNES Function norm 7.868995994841e-13 
  Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3
      TSAdapt none beuler 0: step   0 accepted t=0          + 1.000e-03 dt=1.000e-03 
1 TS dt 0.001 time 0.001
MHD    1) time =     0.001, Eergy=  2.3259660955830e+00 (plot ID 1)
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

/gpfs/alpine/csc314/scratch/adams/mg-m3dc1/src/data/../mhd on a arch-olcf-crusher named crusher072 with 4096 processors, by adams Sun Jan  8 11:59:06 2023
Using 1 OpenMP threads
Using Petsc Development GIT revision: v3.18.3-352-g91c56366cb1  GIT Date: 2023-01-05 17:22:48 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           1.601e+02     1.001   1.600e+02
Objects:              5.255e+05     1.502   3.611e+05
Flops:                1.352e+09    37.758   1.420e+08  5.816e+11
Flops/sec:            8.460e+06    37.813   8.877e+05  3.636e+09
MPI Msg Count:        5.487e+06    11.915   1.512e+06  6.195e+09
MPI Msg Len (bytes):  3.733e+08    30.110   3.779e+01  2.341e+11
MPI Reductions:       4.769e+05     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.5996e+02 100.0%  5.8162e+11 100.0%  6.195e+09 100.0%  3.779e+01      100.0%  4.769e+05 100.0%

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

BuildTwoSided      86094 1.0 2.2476e+01 1.5 0.00e+00 0.0 2.7e+08 4.0e+00 0.0e+00 12  0  4  0  0  12  0  4  0  0     0
BuildTwoSidedF       125 1.0 6.2748e-02 1.3 0.00e+00 0.0 6.3e+05 5.4e+03 0.0e+00  0  0  0  1  0   0  0  0  1  0     0
DMCoarsen              3 1.0 2.3923e-02 10.2 0.00e+00 0.0 7.1e+04 3.0e+01 1.4e+01  0  0  0  0  0   0  0  0  0  0     0
DMRefine              24 1.0 3.3197e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMCreateInterp         3 1.0 8.2148e+00 1.0 1.41e+05 1.9 1.4e+09 1.5e+01 8.6e+04  5  0 23  9 18   5  0 23  9 18    52
DMCreateInject         3 1.0 1.1611e-02 1.1 8.52e+02 1.0 3.7e+04 3.0e+02 3.0e+01  0  0  0  0  0   0  0  0  0  0   301
DMCreateMat            4 1.0 3.7184e-02 1.0 0.00e+00 0.0 1.2e+06 1.4e+03 5.2e+01  0  0  0  1  0   0  0  0  1  0     0
DMPlexBuFrCeLi         1 1.0 2.8821e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexBuCoFrCeLi       1 1.0 4.2694e-04 24.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
Mesh Partition         1 1.0 4.9001e+00 1.0 0.00e+00 0.0 1.6e+04 6.5e+01 7.0e+00  3  0  0  0  0   3  0  0  0  0     0
Mesh Migration         1 1.0 1.7576e+00 1.0 0.00e+00 0.0 9.0e+04 4.5e+01 2.7e+01  1  0  0  0  0   1  0  0  0  0     0
DMPlexPartSelf         1 1.0 1.5571e+00 46740.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexPartLblInv       1 1.0 2.4942e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  1  0  0  0  0   1  0  0  0  0     0
DMPlexPartLblSF        1 1.0 2.1375e+00 1.5 0.00e+00 0.0 6.1e+03 3.5e+01 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
DMPlexPartStrtSF       1 1.0 4.0332e-03 34.6 0.00e+00 0.0 4.1e+03 1.0e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexPointSF          1 1.0 2.3857e-01 6.1 0.00e+00 0.0 8.2e+03 1.3e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexInterp          25 1.0 2.7422e-01 61.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexDistribute       1 1.0 6.8859e+00 1.0 0.00e+00 0.0 1.1e+05 5.4e+01 3.7e+01  4  0  0  0  0   4  0  0  0  0     0
DMPlexDistCones        1 1.0 3.6314e-01 1.0 0.00e+00 0.0 2.3e+04 5.7e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexDistLabels       1 1.0 1.0041e+00 1.0 0.00e+00 0.0 3.6e+04 4.7e+01 2.1e+01  1  0  0  0  0   1  0  0  0  0     0
DMPlexDistField        1 1.0 2.8359e-01 1.0 0.00e+00 0.0 2.7e+04 3.1e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexStratify        61 1.0 1.4595e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  1  0  0  0  0   1  0  0  0  0     0
DMPlexSymmetrize      61 1.0 1.6725e-03 22.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexPrealloc         4 1.0 3.4322e-02 1.0 0.00e+00 0.0 1.2e+06 1.4e+03 4.4e+01  0  0  0  1  0   0  0  0  1  0     0
DMPlexResidualFE       4 1.0 1.4056e-02 1.1 1.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 323961
DMPlexJacobianFE      12 1.0 1.4956e-01 1.3 5.25e+06 1.0 3.2e+05 6.8e+03 1.2e+01  0  4  0  1  0   0  4  0  1  0 140688
DMPlexInterpFE         3 1.0 3.6915e-02 1.1 1.38e+04 1.0 1.6e+05 7.0e+03 3.9e+01  0  0  0  0  0   0  0  0  0  0  1533
DMPlexInjectorFE       3 1.0 9.5989e-03 1.1 8.52e+02 1.0 3.7e+04 3.0e+02 1.8e+01  0  0  0  0  0   0  0  0  0  0   364
DMPlexIntegralFEM       4 1.0 8.5999e-03 1.1 1.72e+05 1.3 9.8e+04 2.6e+02 4.0e+00  0  0  0  0  0   0  0  0  0  0 65039
SFSetGraph         86137 1.0 1.7477e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp            86094 1.0 2.3369e+01 1.5 0.00e+00 0.0 8.0e+08 1.3e+01 0.0e+00 12  0 13  4  0  12  0 13  4  0     0
SFBcastBegin      172190 1.0 1.2254e+00 6.9 0.00e+00 0.0 2.1e+09 1.7e+01 0.0e+00  0  0 33 15  0   0  0 33 15  0     0
SFBcastEnd        172190 1.0 4.6465e+00 17.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
SFReduceBegin      28707 1.0 3.4174e-01 7.7 0.00e+00 0.0 5.1e+08 3.4e+01 0.0e+00  0  0  8  7  0   0  0  8  7  0     0
SFReduceEnd        28707 1.0 2.1009e+00 185.0 1.87e+03 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
SFFetchOpBegin         8 1.0 3.9807e-04 9.8 0.00e+00 0.0 1.4e+05 1.3e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFFetchOpEnd           8 1.0 8.4017e-02 357.6 0.00e+00 0.0 1.4e+05 1.3e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFCreateEmbed         12 1.0 2.6587e-02 36.4 0.00e+00 0.0 2.0e+05 9.8e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFDistSection      57349 1.0 1.9829e+01 1.4 0.00e+00 0.0 2.8e+09 1.5e+01 5.7e+04 11  0 45 18 12  11  0 45 18 12     0
SFSectionSF        57360 1.0 1.4852e+01 1.8 0.00e+00 0.0 3.2e+07 1.0e+01 0.0e+00  7  0  1  0  0   7  0  1  0  0     0
SFRemoteOff           11 1.0 2.6552e-01 244.6 0.00e+00 0.0 4.5e+05 2.3e+01 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFPack            415943 1.0 3.0997e-01 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack          415951 1.0 2.3806e-01 8.3 1.64e+07 4096.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2510
MatMult           164941 1.0 1.7551e+01 38.4 1.17e+09 59.2 2.7e+09 6.0e+01 0.0e+00  2 75 43 68  0   2 75 43 68  0 24851
MatMultAdd         25725 1.0 2.6326e+00 49.5 4.32e+07 44.0 8.1e+07 5.6e+01 0.0e+00  1  3  1  2  0   1  3  1  2  0  5822
MatMultTranspose   25770 1.0 4.0162e+00 86.0 6.07e+07 30.8 8.2e+07 5.8e+01 0.0e+00  0  4  1  2  0   0  4  1  2  0  5311
MatSolve            8557 0.0 1.4780e-02 0.0 3.28e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2216
MatLUFactorSym         2 1.0 1.3721e-04 10.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         3 1.0 2.1736e-04 26.9 1.68e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   771
MatConvert             1 1.0 8.2123e-04 2.5 0.00e+00 0.0 6.5e+04 6.1e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale               6 1.0 5.9544e-05 7.8 4.96e+03 155.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 21676
MatResidual        25725 1.0 3.4553e+00 51.1 1.90e+08 90.1 3.8e+08 6.0e+01 0.0e+00  0 10  6 10  0   0 10  6 10  0 17058
MatAssemblyBegin     183 1.0 5.9995e-02 1.3 0.00e+00 0.0 6.3e+05 5.4e+03 0.0e+00  0  0  0  1  0   0  0  0  1  0     0
MatAssemblyEnd       183 1.0 6.1860e-02 1.8 1.79e+05 0.0 0.0e+00 0.0e+00 1.9e+02  0  0  0  0  0   0  0  0  0  0  2259
MatGetRowIJ            2 0.0 5.0017e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMat        6 1.0 1.7156e-02 1.0 0.00e+00 0.0 3.3e+03 6.7e+02 7.8e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 0.0 1.5178e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             3 1.0 8.4847e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02  0  0  0  0  0   0  0  0  0  0     7
MatZeroEntries        25 1.0 4.5243e-04 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                6 1.0 4.3716e-03 1.4 1.25e+03 62.4 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0   106
MatTranspose          20 1.0 2.4047e-02 1.0 0.00e+00 0.0 2.0e+05 1.6e+01 3.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatMatMultSym         18 1.0 8.2997e-03 1.4 0.00e+00 0.0 1.5e+05 1.1e+02 3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatMatMultNum         24 1.0 2.2100e-03 2.3 7.44e+05 612.7 4.8e+04 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0 53891
MatPtAPSymbolic        7 1.0 9.4411e-02 1.0 0.00e+00 0.0 4.8e+05 2.1e+02 4.9e+01  0  0  0  0  0   0  0  0  0  0     0
MatPtAPNumeric        10 1.0 3.1720e-02 1.0 2.07e+06 748.2 1.0e+05 1.0e+03 2.8e+01  0  0  0  0  0   0  0  0  0  0  9180
MatGetLocalMat        18 1.0 3.4709e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         18 1.0 7.4246e-03 4.2 0.00e+00 0.0 6.3e+05 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                 3 1.0 7.5026e-04 3.6 7.34e+03 1.4 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0 33461
VecMDot           111712 1.0 5.1684e+01 1.4 2.32e+07 12.8 0.0e+00 0.0e+00 1.1e+05 30  4  0  0 23  30  4  0  0 23   499
VecNorm           163478 1.0 6.6660e+01 1.2 1.51e+07 21.5 0.0e+00 0.0e+00 1.6e+05 39  2  0  0 34  39  2  0  0 34   139
VecScale          163474 1.0 4.1438e-02 2.3 7.62e+06 20.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 114115
VecCopy            51501 1.0 1.0359e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            154784 1.0 2.2054e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            77515 1.0 1.5456e-02 1.8 7.34e+06 23.6 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 260335
VecAYPX            25725 1.0 4.0271e-03 2.0 1.22e+06 26.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 159416
VecAXPBYCZ             5 1.0 3.0874e-04 15.7 1.84e+04 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 203383
VecWAXPY             273 1.0 4.0046e-04 4.4 1.66e+04 4.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 74858
VecMAXPY          163474 1.0 2.8194e-02 1.6 3.28e+07 12.8 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0 1182751
VecAssemblyBegin      54 1.0 7.6876e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        54 1.0 2.9547e-04 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult  154605 1.0 2.6318e-02 2.0 7.36e+06 22.1 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 156279
VecScatterBegin   215038 1.0 4.8206e+00 30.5 0.00e+00 0.0 2.8e+09 6.0e+01 0.0e+00  0  0 45 72  0   0  0 45 72  0     0
VecScatterEnd     215038 1.0 1.8449e+01 81.5 1.64e+07 4357.6 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0    32
VecReduceArith         6 1.0 5.2762e-05 13.8 1.47e+04 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 951609
VecReduceComm          3 1.0 5.8082e-04 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize      154599 1.0 6.3942e+01 1.2 2.19e+07 23.3 0.0e+00 0.0e+00 1.5e+05 38  2  0  0 32  38  2  0  0 32   189
DualSpaceSetUp        30 1.0 1.5583e-01 35.4 1.91e+03 1.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0    50
FESetUp               30 1.0 4.3889e-01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
TSStep                 1 1.0 1.2428e+02 1.0 1.35e+09 38.0 4.3e+09 4.5e+01 3.6e+05 78 100 69 83 76  78 100 69 83 76  4674
TSFunctionEval         4 1.0 2.2100e-02 1.6 1.19e+06 1.1 2.1e+05 3.2e+02 0.0e+00  0  1  0  0  0   0  1  0  0  0 207155
TSJacobianEval        12 1.0 1.5598e-01 1.2 5.40e+06 1.1 8.3e+05 2.7e+03 1.8e+01  0  4  0  1  0   0  4  0  1  0 135286
KSPSetUp              33 1.0 6.4498e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               3 1.0 1.1553e+02 1.0 1.34e+09 47.1 2.8e+09 6.0e+01 2.8e+05 72 95 45 72 58  72 95 45 72 58  4772
KSPGMRESOrthog    111712 1.0 5.1726e+01 1.4 4.67e+07 12.3 0.0e+00 0.0e+00 1.1e+05 30  9  0  0 23  30  9  0  0 23  1011
SNESSolve              1 1.0 1.2426e+02 1.0 1.35e+09 38.0 4.3e+09 4.5e+01 3.6e+05 78 100 69 82 76  78 100 69 82 76  4675
SNESSetUp              1 1.0 1.5743e-02 1.0 0.00e+00 0.0 3.0e+05 2.9e+03 1.3e+01  0  0  0  0  0   0  0  0  0  0     0
SNESFunctionEval       4 1.0 2.2331e-02 1.6 1.20e+06 1.1 2.1e+05 3.2e+02 0.0e+00  0  1  0  0  0   0  1  0  0  0 207265
SNESJacobianEval      12 1.0 1.5644e-01 1.2 5.40e+06 1.1 8.3e+05 2.7e+03 2.4e+01  0  4  0  1  0   0  4  0  1  0 134890
SNESLineSearch         3 1.0 1.8880e-02 1.0 1.27e+06 1.1 2.7e+05 3.8e+02 1.2e+01  0  1  0  0  0   0  1  0  0  0 250629
PCSetUp_GAMG+          3 1.0 2.9065e-01 1.0 2.41e+06 368.0 3.1e+06 9.0e+01 6.6e+02  0  0  0  0  0   0  0  0  0  0  1370
 PCGAMGCreateG         3 1.0 1.2053e-02 1.0 6.87e+02 171.8 1.8e+05 1.6e+01 5.4e+01  0  0  0  0  0   0  0  0  0  0    16
 GAMG Coarsen          3 1.0 8.4956e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02  0  0  0  0  0   0  0  0  0  0     7
  GAMG MIS/Agg         3 1.0 8.4865e-02 1.0 5.50e+02 34.4 1.7e+06 6.8e+00 1.4e+02  0  0  0  0  0   0  0  0  0  0     7
 PCGAMGProl            3 1.0 2.2059e-02 1.0 0.00e+00 0.0 1.1e+05 5.0e+01 1.3e+02  0  0  0  0  0   0  0  0  0  0     0
  GAMG Prol-col        3 1.0 1.8696e-02 1.0 0.00e+00 0.0 1.1e+05 1.9e+01 1.0e+02  0  0  0  0  0   0  0  0  0  0     0
  GAMG Prol-lift       3 1.0 2.0733e-03 1.1 0.00e+00 0.0 8.5e+03 4.3e+02 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
 PCGAMGOptProl         3 1.0 1.6406e-02 1.0 3.51e+05 120.5 6.8e+05 8.0e+01 1.1e+02  0  0  0  0  0   0  0  0  0  0  6513
  GAMG smooth          3 1.0 8.6092e-03 1.4 9.23e+04 156.0 1.8e+05 1.5e+02 3.0e+01  0  0  0  0  0   0  0  0  0  0  3065
 PCGAMGCreateL         3 1.0 4.9664e-02 1.0 6.52e+05 714.4 2.0e+05 3.7e+02 1.8e+02  0  0  0  0  0   0  0  0  0  0  1949
  GAMG PtAP            3 1.0 1.8299e-02 1.0 6.52e+05 714.4 2.0e+05 3.7e+02 3.3e+01  0  0  0  0  0   0  0  0  0  0  5290
  GAMG Reduce          3 1.0 3.1419e-02 1.0 0.00e+00 0.0 7.0e+03 3.5e+02 1.5e+02  0  0  0  0  0   0  0  0  0  0     0
PCGAMG Gal l00         3 1.0 9.4588e-02 1.0 3.25e+05 118.7 4.2e+05 4.4e+02 2.2e+01  0  0  0  0  0   0  0  0  0  0  2688
PCGAMG Opt l00         1 1.0 3.5215e-03 2.0 2.19e+04 42.8 1.7e+05 1.5e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0  6627
PCGAMG Gal l01         3 1.0 1.9694e-02 1.3 9.33e+05 0.0 8.2e+03 1.8e+03 2.2e+01  0  0  0  0  0   0  0  0  0  0  1505
PCGAMG Opt l01         1 1.0 1.5205e-03 1.1 2.78e+04 0.0 2.4e+03 3.2e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0   745
PCGAMG Gal l02         3 1.0 1.0422e-02 1.4 1.27e+06 0.0 6.1e+02 2.2e+03 2.2e+01  0  0  0  0  0   0  0  0  0  0   650
PCGAMG Opt l02         1 1.0 1.5479e-03 1.0 6.39e+04 0.0 2.1e+02 5.9e+02 8.0e+00  0  0  0  0  0   0  0  0  0  0   232
PCSetUp                6 1.0 8.6265e+00 1.0 4.24e+06 2.5 1.5e+09 1.6e+01 8.7e+04  5  1 23 10 18   5  1 23 10 18   904
PCSetUpOnBlocks     8557 1.0 4.3035e-03 4.2 1.68e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    39
PCApply                9 1.0 1.1553e+02 1.0 1.34e+09 48.4 2.8e+09 6.0e+01 2.8e+05 72 94 45 72 58  72 94 45 72 58  4738

--- Event Stage 1: Unknown


Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container   107            107
    Distributed Mesh   138            138
            DM Label 57648          57648
          Quadrature   512            512
      Mesh Transform   177            177
           Index Set 190065          190065
   IS L to G Mapping     5              5
             Section 115247          115247
   Star Forest Graph 86394          86394
     Discrete System   261            261
           Weak Form   277            277
    GraphPartitioner    62             62
              Matrix   314            314
      Matrix Coarsen     3              3
              Vector   497            497
        Linear Space     8              8
          Dual Space    58             58
            FE Space    30             30
              Viewer     2              1
             TSAdapt     1              1
                  TS     1              1
                DMTS     1              1
                SNES     1              1
              DMSNES     6              6
      SNESLineSearch     1              1
       Krylov Solver    13             13
     DMKSP interface     4              4
      Preconditioner    13             13
       Field over DM     4              4
         PetscRandom     3              3

--- Event Stage 1: Unknown

Average time to get PetscTime(): 4.61e-08
Average time for MPI_Barrier(): 3.05634e-05
Average time for zero size MPI_Send(): 9.29304e-06
#PETSc Option Table entries:
-dm_plex_box_faces 64,64 # (source: code)
-dm_plex_box_lower -2,-2 # (source: code)
-dm_plex_box_upper 2,2 # (source: code)
-dm_plex_simplex 1 # (source: file)
-dm_refine_hierarchy 3 # (source: command line)
-eta 0.0001 # (source: command line)
-ksp_converged_reason # (source: command line)
-ksp_max_it 50 # (source: file)
-ksp_rtol 1e-3 # (source: file)
-ksp_type fgmres # (source: command line)
-log_view # (source: command line)
-mg_coarse_ksp_rtol 1e-1 # (source: command line)
-mg_coarse_ksp_type fgmres # (source: command line)
-mg_coarse_mg_levels_ksp_type gmres # (source: command line)
-mg_coarse_pc_type gamg # (source: command line)
-mg_levels_ksp_max_it 4 # (source: command line)
-mg_levels_ksp_type gmres # (source: command line)
-mg_levels_pc_type jacobi # (source: command line)
-mu 0.005 # (source: command line)
-options_left # (source: file)
-pc_mg_type full # (source: command line)
-pc_type mg # (source: command line)
-petscpartitioner_type simple # (source: code)
-petscspace_degree 2 # (source: command line)
-snes_converged_reason # (source: file)
-snes_max_it 10 # (source: file)
-snes_monitor # (source: file)
-snes_rtol 1.e-9 # (source: file)
-snes_stol 1.e-9 # (source: file)
-ts_adapt_dt_max 0.01 # (source: command line)
-ts_adapt_monitor # (source: file)
-ts_arkimex_type 1bee # (source: file)
-ts_dt 0.001 # (source: command line)
-ts_max_reject 10 # (source: file)
-ts_max_snes_failures -1 # (source: file)
-ts_max_steps 1 # (source: command line)
-ts_max_time # (source: command line)
-ts_monitor # (source: file)
-ts_type beuler # (source: file)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn LIBS="-L/opt/cray/pe/mpich/8.1.17/gtl/lib -lmpi_gtl_hsa" --with-openmp=1 --with-debugging=0 --with-64-bit-indices=0 --with-mpiexec=srun --download-superlu --download-superlu_dist --download-mumps --download-scalapack --download-hdf5=1 --download-triangle PETSC_ARCH=arch-olcf-crusher
Libraries compiled on 2023-01-07 16:22:45 on login2 
Machine characteristics: Linux-5.3.18-150300.59.87_11.0.78-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3   -fopenmp 
Using Fortran compiler: ftn  -fPIC   -fopenmp    -fopenmp

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/opt/cray/pe/mpich/8.1.17/gtl/lib -ldmumps -lmumps_common -lpord -lpthread -lscalapack -lsuperlu -lsuperlu_dist -lhdf5_hl -lhdf5 -ltriangle -lquadmath -lmpifort_cray -lmpi_gtl_hsa

+ date
Sun 08 Jan 2023 11:59:07 AM EST

