  0 KSP Residual norm 7.451923898937e+05
  1 KSP Residual norm 4.190843782941e+04
  2 KSP Residual norm 1.798945700462e+04
  3 KSP Residual norm 1.223044424436e+04
  4 KSP Residual norm 4.174475088375e+03
  5 KSP Residual norm 2.268425004557e+03
  6 KSP Residual norm 1.421063929846e+03
  7 KSP Residual norm 8.696045866455e+02
  8 KSP Residual norm 6.215492727051e+02
  9 KSP Residual norm 3.584359100324e+02
 10 KSP Residual norm 2.222233124404e+02
 11 KSP Residual norm 1.453955901066e+02
 12 KSP Residual norm 1.147316865075e+02
 13 KSP Residual norm 6.529686801408e+01
 14 KSP Residual norm 3.243905036721e+01
 15 KSP Residual norm 2.576701637974e+01
 16 KSP Residual norm 2.016990220700e+01
 17 KSP Residual norm 1.413112299877e+01
 18 KSP Residual norm 9.367231136940e+00
 19 KSP Residual norm 4.844577180017e+00
 20 KSP Residual norm 3.906548874809e+00
 21 KSP Residual norm 2.975179464759e+00
 22 KSP Residual norm 2.127389637593e+00
 23 KSP Residual norm 1.031794005392e+00
 24 KSP Residual norm 8.259219463880e-01
 25 KSP Residual norm 6.766239785615e-01
 26 KSP Residual norm 4.583503891554e-01
 27 KSP Residual norm 2.462126830750e-01
 28 KSP Residual norm 1.925471110482e-01
 29 KSP Residual norm 9.494938545903e-02
 30 KSP Residual norm 6.330021551417e-02
KSP Object: 1 MPI processes
  type: dgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
    Adaptive strategy is used: TRUE
    Frequency of extracted eigenvalues = 1 using Ritz values
    Total number of extracted eigenvalues = 0
    Maximum number of eigenvalues set to be extracted = 9
    relaxation parameter for the adaptive strategy(smv) = 1.
    Number of matvecs : 31
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.2
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 4
      Number of paths for aggressive coarsening 4
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        Falgout
      Interpolation type  ext+i
      SpGEMM type         cusparse
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=1468928, cols=1468928
    total: nonzeros=13176768, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
 solver time    2.97172329100000043
****************************************************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                                   ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

./test_miniapp on a  named r228n03 with 1 processor, by nvarini1 Thu Nov 24 16:25:34 2022
Using Petsc Release Version 3.17.5, unknown

                         Max       Max/Min     Avg       Total
Time (sec):           4.042e+00     1.000   4.042e+00
Objects:              4.300e+01     1.000   4.300e+01
Flops:                3.737e+09     1.000   3.737e+09  3.737e+09
Flops/sec:            9.245e+08     1.000   9.245e+08  9.245e+08
Memory (bytes):       2.960e+07     1.000   2.960e+07  2.960e+07
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 4.0423e+00 100.0%  3.7372e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               31 1.0 3.2221e-02 1.0 7.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 21  0  0  0   1 21  0  0  0 23942   66984      2 1.76e+02    0 0.00e+00 100
MatConvert             2 1.0 3.9585e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatAssemblyBegin       2 1.0 3.6720e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatAssemblyEnd         2 1.0 4.1741e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatGetRowIJ            1 1.0 3.0390e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatView                1 1.0 9.9941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatCUSPARSCopyTo       1 1.0 1.9756e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      1 1.64e+02    0 0.00e+00   0
VecMDot               30 1.0 1.6379e-02 1.0 1.37e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 37  0  0  0   0 37  0  0  0 83408  101899      0 0.00e+00    0 0.00e+00 100
VecNorm               32 1.0 3.4793e-03 1.0 9.40e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 27020   34285      0 0.00e+00    0 0.00e+00 100
VecScale              31 1.0 1.7788e-03 1.0 4.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 25600   40041      0 0.00e+00    0 0.00e+00 100
VecCopy                1 1.0 2.6919e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    1 1.18e+01   0
VecSet                33 1.0 1.7317e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecAXPY                2 1.0 4.2070e-04 1.0 5.88e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 13967   37766      1 1.18e+01    0 0.00e+00 100
VecMAXPY              31 1.0 2.3615e-02 1.0 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 39  0  0  0   1 39  0  0  0 61581   63417      0 0.00e+00    0 0.00e+00 100
VecAssemblyBegin       2 1.0 1.1410e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecAssemblyEnd         2 1.0 9.2100e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecNormalize          31 1.0 5.2003e-03 1.0 1.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 26270   36055      0 0.00e+00    0 0.00e+00 100
VecCUDACopyTo          3 1.0 5.7760e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      3 3.53e+01    0 0.00e+00   0
VecCUDACopyFrom        3 1.0 6.9723e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    3 3.53e+01   0
KSPSetUp               1 1.0 2.9976e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
KSPSolve               1 1.0 1.9746e-01 1.0 3.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5100  0  0  0   5100  0  0  0 18927   69104      4 1.99e+02    1 1.18e+01 100
KSPGMRESOrthog        30 1.0 3.9150e-02 1.0 2.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 73  0  0  0   1 73  0  0  0 69790   78168      0 0.00e+00    0 0.00e+00 100
PCSetUp                1 1.0 2.7708e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 69  0  0  0  0  69  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
PCApply               32 1.0 9.8630e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      1 1.18e+01    0 0.00e+00   0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2      5882160     0.
              Vector    37             37     35316728     0.
       Krylov Solver     1              1        19136     0.
      Preconditioner     1              1         1528     0.
              Viewer     2              1          856     0.
========================================================================================================================
Average time to get PetscTime(): 3.69e-08
#PETSc Option Table entries:
-ksp_initial_guess_nonzero yes
-ksp_monitor
-ksp_reuse_preconditioner yes
-ksp_rtol 1e-7
-ksp_type dgmres
-ksp_view
-log_view
-mat_type seqaijcusparse
-pc_hypre_boomeramg_agg_nl 4
-pc_hypre_boomeramg_agg_num_paths 4
-pc_hypre_boomeramg_coarsen_type Falgout
-pc_hypre_boomeramg_interp_type ext+i
-pc_hypre_boomeramg_no_CF false
-pc_hypre_boomeramg_strong_threshold 0.25
-pc_hypre_boomeramg_truncfactor 0.2
-pc_hypre_type boomeramg
-pc_type hypre
-vec_type cuda
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/m100_work/FUAC6_GBS2N/petsc-install-3.17 --with-cxx=mpixlC --with-cc=mpixlc --with-fc=mpixlf --with-cuda=1 --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --download-fblaslapack=1
-----------------------------------------
Libraries compiled on 2022-11-22 15:09:44 on login03
Machine characteristics: Linux-4.18.0-147.51.2.el8_1.ppc64le-ppc64le-with-redhat-8.1-Ootpa
Using PETSc directory: /m100_work/FUAC6_GBS2N/petsc-install-3.17
Using PETSc arch:
-----------------------------------------
Using C compiler: mpixlc -qPIC -g -O0
Using Fortran compiler: mpixlf -qPIC -g -O0
-----------------------------------------
Using include paths: -I/m100_work/FUAC6_GBS2N/petsc-install-3.17/include -I/cineca/prod/opt/compilers/cuda/11.0/none/include
-----------------------------------------
Using C linker: mpixlc
Using Fortran linker: mpixlf
Using libraries: -Wl,-rpath,/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -L/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -lpetsc -Wl,-rpath,/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -L/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -Wl,-rpath,/cineca/prod/opt/compilers/cuda/11.0/none/lib64 -L/cineca/prod/opt/compilers/cuda/11.0/none/lib64 -L/cineca/prod/opt/compilers/cuda/11.0/none/lib64/stubs -Wl,-rpath,/cineca/prod/opt/compilers/spectrum_mpi/10.3.1/binary/lib -L/cineca/prod/opt/compilers/spectrum_mpi/10.3.1/binary/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlsmp/5.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlsmp/5.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlmass/9.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlmass/9.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlf/16.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlf/16.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/lib -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0 -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0 -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc
-Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib64 -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib64 -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlC/16.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlC/16.1.1/lib -lHYPRE -lflapack -lfblas -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
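For reference, here is a minimal sketch of a C driver that picks up the solver configuration recorded in the option table above purely at runtime, through MatSetFromOptions() and KSPSetFromOptions(). This is an illustration only, not the actual test_miniapp source (which is not shown in this log): the small 1D Laplacian stand-in matrix and the vector setup are assumptions made so the example is self-contained, and it assumes PETSc 3.17 or later for PetscCall().

/* Hypothetical stand-in driver; the real application assembles its own
   1468928 x 1468928 operator. Solver choices (-ksp_type dgmres, -pc_type hypre,
   -pc_hypre_type boomeramg, -mat_type seqaijcusparse, -vec_type cuda, tolerances,
   BoomerAMG parameters) all come from the command line, as in the option table. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscInt    i, n = 100, col[3];
  PetscScalar v[3];

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Stand-in operator: a 1D Laplacian, only so the sketch runs end to end. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));              /* honours -mat_type seqaijcusparse */
  PetscCall(MatSetUp(A));
  for (i = 0; i < n; i++) {
    v[0] = -1.0; v[1] = 2.0; v[2] = -1.0;
    col[0] = i - 1; col[1] = i; col[2] = i + 1;
    if (i == 0)          PetscCall(MatSetValues(A, 1, &i, 2, &col[1], &v[1], INSERT_VALUES));
    else if (i == n - 1) PetscCall(MatSetValues(A, 1, &i, 2, col, v, INSERT_VALUES));
    else                 PetscCall(MatSetValues(A, 1, &i, 3, col, v, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Vectors compatible with the matrix type (CUDA vectors for aijcusparse). */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));            /* DGMRES + hypre BoomerAMG from the flags */
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Run with the same flags listed in the option table (for example -ksp_type dgmres -pc_type hypre -pc_hypre_type boomeramg -mat_type seqaijcusparse -vec_type cuda -ksp_monitor -ksp_view -log_view plus the BoomerAMG options), such a driver would exercise the same DGMRES + BoomerAMG + cuSPARSE path profiled in this log.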