[1637602133.843999] [sqg2e4:18844:0]  mxm.c:196  MXM  WARN  The 'ulimit -s' on the system is set to 'unlimited'. This may have negative performance implications. Please set the stack size to the default value (10240)
  0 KSP Residual norm 6.736906443113e+00
  1 KSP Residual norm 3.924488666810e-01
  2 KSP Residual norm 3.573236154366e-02
  3 KSP Residual norm 5.628368310285e-03
  4 KSP Residual norm 9.795224289872e-04
  5 KSP Residual norm 9.239787115081e-05
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=0.000138889, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: hypre
    rows=56, cols=56
Norm of error 0.000122452 iterations 5
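The -ksp_view report above (a CG Krylov solver preconditioned with hypre BoomerAMG, applied to a 56x56 MATHYPRE operator) corresponds to a fairly standard PETSc driver. The following is only a minimal sketch of how such a configuration is typically assembled, not the actual ex4 source: the 1-D Laplacian stencil and the right-hand side are stand-ins invented for illustration, and only the problem size n = 56 is taken from the rows=56, cols=56 line above.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PC             pc;
  PetscInt       i, n = 56;                     /* matches rows=56, cols=56 above */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in operator: a 1-D Laplacian; -mat_type hypre switches the type to MATHYPRE */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);
  ierr = VecSet(b, 1.0); CHKERRQ(ierr);          /* arbitrary right-hand side */

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPCG); CHKERRQ(ierr);  /* -ksp_type cg */
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE); CHKERRQ(ierr);  /* -pc_type hypre; BoomerAMG is the default hypre method */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);  /* picks up -ksp_monitor, -ksp_view, ... */
  ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  ierr = VecDestroy(&x); CHKERRQ(ierr);
  ierr = VecDestroy(&b); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Built against PETSc and run with the options listed in the option table further down (-ksp_type cg -pc_type hypre -mat_type hypre -ksp_monitor -ksp_view -log_view), a driver of this shape would produce monitoring and -ksp_view output like the log above.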
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with GPU support and you've   #
      #   created PETSc/GPU objects, but you intentionally used #
      #   -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      #   from GPU to CPU for communication. To get meaningfull #
      #   timing results, please use GPU-aware MPI instead.     #
      ##########################################################

./ex4 on a arch-linux2-c-debug named sqg2e4.bullx with 1 processor, by kxc07-lxm25 Mon Nov 22 17:28:55 2021
Using Petsc Development GIT revision: v3.16.1-353-g887dddf386  GIT Date: 2021-11-19 20:24:41 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           3.084e-01     1.000   3.084e-01
Objects:              1.100e+01     1.000   1.100e+01
Flop:                 3.567e+03     1.000   3.567e+03  3.567e+03
Flop/sec:             1.157e+04     1.000   1.157e+04  1.157e+04
Memory:               2.483e+05     1.000   2.483e+05  2.483e+05
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 3.0838e-01 100.0%  3.5670e+03 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
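Everything in the event table that follows is attributed to the single default stage, '0: Main Stage'. If a finer breakdown were wanted (for example, separating assembly and setup from the solve), the PetscLogStagePush()/PetscLogStagePop() calls mentioned in the legend above would bracket the phases of interest. A minimal sketch, with the stage names invented here for illustration:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogStage  stage_assembly, stage_solve;   /* hypothetical stage names */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  ierr = PetscLogStageRegister("Assembly", &stage_assembly); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve", &stage_solve); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_assembly); CHKERRQ(ierr);
  /* ... matrix and vector assembly would go here ... */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_solve); CHKERRQ(ierr);
  /* ... KSPSolve() would go here ... */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = PetscFinalize();                       /* -log_view then reports each stage separately */
  return ierr;
}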
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 1.0332e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         1 1.0 2.3869e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult                6 1.0 2.5697e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 83  0  0  0  0  83  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin       1 1.0 5.2592e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd         1 1.0 6.1558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 4.9629e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot               10 1.0 3.6972e-04 1.0 1.11e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 31  0  0  0   0 31  0  0  0     3       5      0 0.00e+00    0 0.00e+00 100
VecNorm                7 1.0 4.3831e-04 1.0 7.77e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 22  0  0  0   0 22  0  0  0     2       2      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 4.9198e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 8 1.0 2.1870e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY               11 1.0 2.0920e-04 1.0 1.23e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0 35  0  0  0   0 35  0  0  0     6      18      0 0.00e+00    0 0.00e+00 100
VecAYPX                4 1.0 8.7857e-05 1.0 4.48e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0 13  0  0  0   0 13  0  0  0     5      11      0 0.00e+00    0 0.00e+00 100
KSPSetUp               1 1.0 1.8801e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 2.6545e-03 1.0 3.34e+03 1.0 0.0e+00 0.0e+00 0.0e+00  1 94  0  0  0   1 94  0  0  0     1       5      0 0.00e+00    0 0.00e+00 100
PCSetUp                1 1.0 4.3471e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply                6 1.0 1.1224e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
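As a quick consistency check on the flop-counting convention stated in the legend above: the vectors in this solve have length N = 56 (rows=56, cols=56), so a real VecAXPY() counts 2N = 112 flop and the 11 calls in the table give 11 * 112 = 1232, i.e. the 1.23e+03 shown in the VecAXPY row; counting a norm as 2N - 1 = 111 operations, the 7 VecNorm() calls likewise give 7 * 111 = 777 = 7.77e+02.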

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              1         3008     0.
              Vector     6              6         9984     0.
       Krylov Solver     1              1         1672     0.
      Preconditioner     1              1         1512     0.
              Viewer     2              1          848     0.
========================================================================================================================
Average time to get PetscTime(): 2.99e-08
#PETSc Option Table entries:
-ksp_monitor
-ksp_type cg
-ksp_view
-log_view
-mat_type hypre
-pc_type hypre
-use_gpu_aware_mpi 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl --with-cuda=1 --with-cuda-arch=70 --download-hypre=yes --download-hypre-configure-arguments=HYPRE_CUDA_SM=70 --download-hypre-commit=origin/hypre_petsc --with-shared-libraries=1 --known-mpi-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx -with-fc=mpif90
-----------------------------------------
Libraries compiled on 2021-11-22 17:18:17 on hcxlogin2
Machine characteristics: Linux-3.10.0-1127.el7.x86_64-x86_64-with-redhat-7.8-Maipo
Using PETSc directory: /lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3 -O0
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0
-----------------------------------------
Using include paths: -I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/include -I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/include -I/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/include -I/lustre/scafellpike/local/apps/cuda/11.2/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib -Wl,-rpath,/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64 -L/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64 -Wl,-rpath,/lustre/scafellpike/local/apps/cuda/11.2/lib64 -L/lustre/scafellpike/local/apps/cuda/11.2/lib64 -L/lustre/scafellpike/local/apps/cuda/11.2/lib64/stubs -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib -L/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib -Wl,-rpath,/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib -L/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib -L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib -lHYPRE -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -lstdc++ -ldl -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
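Given the option table above and the single-process run reported in the summary header, this log could presumably be reproduced with a command along the lines of

  mpirun -n 1 ./ex4 -ksp_type cg -pc_type hypre -mat_type hypre -ksp_monitor -ksp_view -log_view -use_gpu_aware_mpi 0

where the launcher (mpirun -n 1 here) is an assumption; whatever MPI launcher the site provides would serve equally well.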

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with GPU support and you've   #
      #   created PETSc/GPU objects, but you intentionally used #
      #   -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      #   from GPU to CPU for communication. To get meaningfull #
      #   timing results, please use GPU-aware MPI instead.     #
      ##########################################################

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
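Both closing warnings map onto the build information listed earlier: re-running configure with the same options plus --with-debugging=no addresses the first, and dropping -use_gpu_aware_mpi 0, so that the CUDA-enabled OpenMPI build linked above (4.0.4-cuda11.2) can presumably pass GPU buffers directly, addresses the second.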