[1637602133.843999] [sqg2e4:18844:0]  mxm.c:196  MXM  WARN  The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
  0 KSP Residual norm 6.736906443113e+00
  1 KSP Residual norm 3.924488666810e-01
  2 KSP Residual norm 3.573236154366e-02
  3 KSP Residual norm 5.628368310285e-03
  4 KSP Residual norm 9.795224289872e-04
  5 KSP Residual norm 9.239787115081e-05
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=0.000138889, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: hypre
    rows=56, cols=56
Norm of error 0.000122452 iterations 5
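The -ksp_view report above (a CG Krylov solver preconditioned with hypre BoomerAMG, applied to a 56x56 MATHYPRE operator) corresponds to a fairly standard PETSc driver. The following is only a minimal sketch of how such a configuration is typically assembled, not the actual ex4 source: the 1-D Laplacian stencil and the right-hand side are stand-ins invented for illustration, and only the problem size n = 56 is taken from the rows=56, cols=56 line above.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, n = 56;                     /* matches rows=56, cols=56 above */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in operator: a 1-D Laplacian; -mat_type hypre switches the type to MATHYPRE */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);
  ierr = VecSet(b, 1.0); CHKERRQ(ierr);          /* arbitrary right-hand side */

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPCG); CHKERRQ(ierr);  /* -ksp_type cg */
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);  /* -pc_type hypre; BoomerAMG is the default hypre method */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);  /* picks up -ksp_monitor, -ksp_view, ... */
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Built against PETSc and run with the options listed in the option table further down (-ksp_type cg -pc_type hypre -mat_type hypre -ksp_monitor -ksp_view -log_view), a driver of this shape would produce monitoring and -ksp_view output like the log above.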
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with GPU support and you've   #
      #   created PETSc/GPU objects, but you intentionally used #
      #   -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      #   from GPU to CPU for communication. To get meaningfull #
      #   timing results, please use GPU-aware MPI instead.     #
      ##########################################################

./ex4 on a arch-linux2-c-debug named sqg2e4.bullx with 1 processor, by kxc07-lxm25 Mon Nov 22 17:28:55 2021
Using Petsc Development GIT revision: v3.16.1-353-g887dddf386  GIT Date: 2021-11-19 20:24:41 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           3.084e-01     1.000   3.084e-01
Objects:              1.100e+01     1.000   1.100e+01
Flop:                 3.567e+03     1.000   3.567e+03  3.567e+03
Flop/sec:             1.157e+04     1.000   1.157e+04  1.157e+04
Memory:               2.483e+05     1.000   2.483e+05  2.483e+05
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 3.0838e-01 100.0%  3.5670e+03 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
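Everything in the event table that follows is attributed to the single default stage, '0: Main Stage'. If a finer breakdown were wanted (for example, separating assembly and setup from the solve), the PetscLogStagePush()/PetscLogStagePop() calls mentioned in the legend above would bracket the phases of interest. A minimal sketch, with the stage names invented here for illustration:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogStage  stage_assembly, stage_solve;   /* hypothetical stage names */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = PetscLogStageRegister("Assembly", &stage_assembly); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve", &stage_solve); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_assembly); CHKERRQ(ierr);
  /* ... matrix and vector assembly would go here ... */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_solve); CHKERRQ(ierr);
  /* ... KSPSolve() would go here ... */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = PetscFinalize();                       /* -log_view then reports each stage separately */
  return ierr;
}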
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 1.0332e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         1 1.0 2.3869e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult                6 1.0 2.5697e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 83  0  0  0  0  83  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin       1 1.0 5.2592e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd         1 1.0 6.1558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 4.9629e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot               10 1.0 3.6972e-04 1.0 1.11e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 31  0  0  0   0 31  0  0  0     3       5      0 0.00e+00    0 0.00e+00 100
VecNorm                7 1.0 4.3831e-04 1.0 7.77e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 22  0  0  0   0 22  0  0  0     2       2      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 4.9198e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 8 1.0 2.1870e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY               11 1.0 2.0920e-04 1.0 1.23e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 35  0  0  0   0 35  0  0  0     6      18      0 0.00e+00    0 0.00e+00 100
VecAYPX                4 1.0 8.7857e-05 1.0 4.48e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 13  0  0  0   0 13  0  0  0     5      11      0 0.00e+00    0 0.00e+00 100
KSPSetUp               1 1.0 1.8801e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 2.6545e-03 1.0 3.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 94  0  0  0   1 94  0  0  0     1       5      0 0.00e+00    0 0.00e+00 100
PCSetUp                1 1.0 4.3471e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply                6 1.0 1.1224e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
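As a quick consistency check on the flop-counting convention stated in the legend above: the vectors in this solve have length N = 56 (rows=56, cols=56), so a real VecAXPY() counts 2N = 112 flop and the 11 calls in the table give 11 * 112 = 1232, i.e. the 1.23e+03 shown in the VecAXPY row; counting a norm as 2N - 1 = 111 operations, the 7 VecNorm() calls likewise give 7 * 111 = 777 = 7.77e+02.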

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              1         3008     0.
              Vector     6              6         9984     0.
       Krylov Solver     1              1         1672     0.
      Preconditioner     1              1         1512     0.
              Viewer     2              1          848     0.
========================================================================================================================
Average time to get PetscTime(): 2.99e-08
#PETSc Option Table entries:
-ksp_monitor
-ksp_type cg
-ksp_view
-log_view
-mat_type hypre
-pc_type hypre
-use_gpu_aware_mpi 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl --with-cuda=1 --with-cuda-arch=70 --download-hypre=yes --download-hypre-configure-arguments=HYPRE_CUDA_SM=70 --download-hypre-commit=origin/hypre_petsc --with-shared-libraries=1 --known-mpi-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx -with-fc=mpif90
-----------------------------------------
Libraries compiled on 2021-11-22 17:18:17 on hcxlogin2
Machine characteristics: Linux-3.10.0-1127.el7.x86_64-x86_64-with-redhat-7.8-Maipo
Using PETSc directory: /lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3 -O0
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0
-----------------------------------------
Using include paths: -I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/include -I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/include -I/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/include -I/lustre/scafellpike/local/apps/cuda/11.2/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64 -L/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64 -Wl,-rpath,/lustre/scafellpike/local/apps/cuda/11.2/lib64 -L/lustre/scafellpike/local/apps/cuda/11.2/lib64 -L/lustre/scafellpike/local/apps/cuda/11.2/lib64/stubs -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib -L/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib -Wl,-rpath,/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib -L/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib -lHYPRE -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -ldl -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
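Given the option table above and the single-process run reported in the summary header, this log could presumably be reproduced with a command along the lines of

  mpirun -n 1 ./ex4 -ksp_type cg -pc_type hypre -mat_type hypre -ksp_monitor -ksp_view -log_view -use_gpu_aware_mpi 0

where the launcher (mpirun -n 1 here) is an assumption; whatever MPI launcher the site provides would serve equally well.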

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with GPU support and you've   #
      #   created PETSc/GPU objects, but you intentionally used #
      #   -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      #   from GPU to CPU for communication. To get meaningfull #
      #   timing results, please use GPU-aware MPI instead.     #
      ##########################################################

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
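Both closing warnings map onto the build information listed earlier: re-running configure with the same options plus --with-debugging=no addresses the first, and dropping -use_gpu_aware_mpi 0, so that the CUDA-enabled OpenMPI build linked above (4.0.4-cuda11.2) can presumably pass GPU buffers directly, addresses the second.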