  0 KSP Residual norm 1.378059190390e+03
  1 KSP Residual norm 9.646379240534e+01
  2 KSP Residual norm 3.684931844720e+00
  3 KSP Residual norm 7.575104942051e-02
  4 KSP Residual norm 1.089997831013e-03
Residual norm 1.08339e-06
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/kchockalingam/tools/petsc-3.15.3/src_gpu/ksp/ksp/tutorials/./ex45 on a named glados.dl.ac.uk with 1 processor, by kchockalingam Thu Nov 18 14:53:06 2021
Using Petsc Release Version 3.15.3, Aug 06, 2021

                         Max       Max/Min     Avg       Total
Time (sec):           1.335e+01     1.000   1.335e+01
Objects:              2.600e+01     1.000   2.600e+01
Flop:                 1.091e+08     1.000   1.091e+08  1.091e+08
Flop/sec:             8.171e+06     1.000   8.171e+06  8.171e+06
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:      Main Stage: 1.3346e+01 100.0%  1.0905e+08 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase            %F - percent flop in this phase
      %M - percent messages in this phase        %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 4.3973e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         1 1.0 5.1856e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult                5 1.0 1.1157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    5 8.39e+01  0
MatAssemblyBegin       1 1.0 8.8774e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd         1 1.0 6.7200e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 1.4946e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 4.0358e+00 1.0 1.01e+08 1.0 0.0e+00 0.0e+00 0.0e+00 30 92  0  0  0  30 92  0  0  0    25   31884     21 1.68e+02   21 1.34e+02 100
DMCreateMat            1 1.0 2.3935e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             1 1.0 1.0941e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot                8 1.0 7.9997e-03 1.0 3.36e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 31  0  0  0   0 31  0  0  0  4194   35351      5 8.39e+01    8 6.40e-05 100
VecNorm                6 1.0 7.9422e-03 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 23  0  0  0   0 23  0  0  0  3169   29324      5 8.39e+01    6 4.80e-05 100
VecCopy                2 1.0 2.0379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                12 1.0 8.2206e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY                9 1.0 3.6628e-03 1.0 3.77e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 35  0  0  0   0 35  0  0  0 10306   43814     11 3.36e+01    0 0.00e+00 100
VecAYPX                3 1.0 4.6369e-04 1.0 1.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 12  0  0  0   0 12  0  0  0 27137   27244      3 2.40e-05    0 0.00e+00 100
VecCUDACopyTo         12 1.0 1.6889e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0     12 2.01e+02    0 0.00e+00  0
VecCUDACopyFrom        9 1.0 1.1696e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    9 1.51e+02  0
PCSetUp                1 1.0 7.4422e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  56  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply                5 1.0 3.9271e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 29  0  0  0  0  29  0  0  0  0     0       0      0 0.00e+00    4 6.71e+01  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              1         1672     0.
     DMKSP interface     1              1          664     0.
              Matrix     1              1         2936     0.
    Distributed Mesh     1              1         5560     0.
           Index Set     2              2      8390416     0.
   IS L to G Mapping     1              1      8389288     0.
   Star Forest Graph     3              3         3672     0.
     Discrete System     1              1          904     0.
           Weak Form     1              1          824     0.
              Vector    11             11    100682024     0.
      Preconditioner     1              1         1496     0.
              Viewer     2              1          848     0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
#PETSc Option Table entries:
-da_grid_x 128
-da_grid_y 128
-da_grid_z 128
-dm_mat_type hypre
-dm_vec_type cuda
-ksp_monitor
-ksp_type cg
-log_view
-pc_hypre_type boomeramg
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --package-prefix-hash=/home/kchockalingam/petsc-hash-pkgs --with-make-test-np=2 COPTFLAGS="-g -O3 -fno-omit-frame-pointer" FOPTFLAGS="-g -O3 -fno-omit-frame-pointer" CXXOPTFLAGS="-g -O3 -fno-omit-frame-pointer" --with-cuda=1 --with-cuda-arch=70 --with-blaslapack=1 --with-cuda-dir=/apps/packages/cuda/10.1/ --with-mpi-dir=/apps/packages/gcc/7.3.0/openmpi/3.1.2 --download-hypre=1 --download-hypre-configure-arguments=--enable-gpu-profiling=yes,--enable-cusparse=yes,--enable-cublas=yes,--enable-curand=yes,HYPRE_CUDA_SM=70 --with-debugging=no PETSC_ARCH=arch-ci-linux-cuda11-hypre-double
-----------------------------------------
Libraries compiled on 2021-11-18 14:19:41 on glados.dl.ac.uk
Machine characteristics: Linux-4.18.0-193.6.3.el8_2.x86_64-x86_64-with-centos-8.2.2004-Core
Using PETSc directory: /home/kchockalingam/tools/petsc-3.15.3
Using PETSc arch:
-----------------------------------------

Using C compiler: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fno-omit-frame-pointer
Using Fortran compiler: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -fno-omit-frame-pointer
-----------------------------------------

Using include paths: -I/home/kchockalingam/tools/petsc-3.15.3/include -I/home/kchockalingam/tools/petsc-3.15.3/arch-ci-linux-cuda11-hypre-double/include -I/home/kchockalingam/petsc-hash-pkgs/194329/include -I/apps/packages/gcc/7.3.0/openmpi/3.1.2/include -I/apps/packages/cuda/10.1/include
-----------------------------------------

Using C linker: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpicc
Using Fortran linker: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpif90
Using libraries: -Wl,-rpath,/home/kchockalingam/tools/petsc-3.15.3/lib -L/home/kchockalingam/tools/petsc-3.15.3/lib -lpetsc -Wl,-rpath,/home/kchockalingam/petsc-hash-pkgs/194329/lib -L/home/kchockalingam/petsc-hash-pkgs/194329/lib -Wl,-rpath,/apps/packages/cuda/10.1/lib64 -L/apps/packages/cuda/10.1/lib64 -Wl,-rpath,/apps/packages/gcc/7.3.0/openmpi/3.1.2/lib -L/apps/packages/gcc/7.3.0/openmpi/3.1.2/lib -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib/gcc/x86_64-pc-linux-gnu/7.3.0 -L/apps/packages/compilers/gcc/7.3.0/lib/gcc/x86_64-pc-linux-gnu/7.3.0 -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib64 -L/apps/packages/compilers/gcc/7.3.0/lib64 -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib -L/apps/packages/compilers/gcc/7.3.0/lib -lHYPRE -llapack -lblas -lcufft -lcublas -lcudart -lcusparse -lcusolver -lcurand -lX11 -lstdc++ -ldl -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lutil -lrt -lz -lgfortran -lm -lgfortran -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------
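
For reference, this log corresponds to a single-process run of the ex45 tutorial with the options listed in the option table above, i.e.:

   ./ex45 -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -dm_mat_type hypre -dm_vec_type cuda -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor -log_view

Every event in this report is attributed to the default "Main Stage". To split a -log_view report into user-defined stages (for example, to separate setup cost such as PCSetUp from the solve itself), register a stage with PetscLogStageRegister() and bracket the code of interest with PetscLogStagePush()/PetscLogStagePop(), as the phase summary notes above describe. A minimal sketch, assuming PETSc 3.15-era CHKERRQ error handling (the stage name "Solve" is purely illustrative):

   #include <petscsys.h>

   int main(int argc, char **argv)
   {
     PetscErrorCode ierr;
     PetscLogStage  solve_stage;

     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
     /* Register a named stage; it shows up as its own section in -log_view */
     ierr = PetscLogStageRegister("Solve", &solve_stage);CHKERRQ(ierr);
     ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
     /* ... work to be attributed to the "Solve" stage, e.g. KSPSolve() ... */
     ierr = PetscLogStagePop();CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }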