  0 KSP Residual norm 1.658481321480e+03
  1 KSP Residual norm 3.270999311989e+02
  2 KSP Residual norm 3.129531485499e+01
  3 KSP Residual norm 2.351754477084e+00
  4 KSP Residual norm 1.898053977239e-01
  5 KSP Residual norm 1.611209883991e-02
Residual norm 0.000135673
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

./ex45 on a  named glados.dl.ac.uk with 2 processors, by kchockalingam Fri Oct  8 11:37:51 2021
Using Petsc Release Version 3.15.3, Aug 06, 2021

                         Max       Max/Min     Avg       Total
Time (sec):           4.016e+01     1.000   4.016e+01
Objects:              4.600e+01     1.000   4.600e+01
Flop:                 4.165e+08     1.012   4.141e+08  8.282e+08
Flop/sec:             1.037e+07     1.012   1.031e+07  2.062e+07
Memory:               3.595e+08     1.011   3.576e+08  7.151e+08
MPI Messages:         8.000e+00     1.000   8.000e+00  1.600e+01
MPI Message Lengths:  1.485e+06     1.000   1.856e+05  2.970e+06
MPI Reductions:       4.720e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 4.0163e+01 100.0%  8.2816e+08 100.0%  1.600e+01 100.0%  1.856e+05      100.0%  4.530e+02  96.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided        4 1.0 5.4248e-05 1.1 0.00e+00 0.0 2.0e+00 4.0e+00 8.0e+00  0  0 12  0  2   0  0 12  0  2     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF       3 1.0 6.4760e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult              6 1.0 1.0370e-01 1.0 1.88e+08 1.0 1.6e+01 1.9e+05 2.0e+00  0 45100100  0   0 45100100  0  3611  112547      2 2.12e+02    0 0.00e+00 100
MatConvert           1 1.0 2.5040e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.2e+01  6  0  0  0 11   6  0  0  0 11     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin     3 1.0 7.6855e-02308.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  3     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd       3 1.0 1.2646e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.5e+01  3  0  0  0 10   3  0  0  0 10     0       0      0 0.00e+00    0 0.00e+00  0
MatCUSPARSCopyTo     2 1.0 3.7301e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      2 2.12e+02    0 0.00e+00  0
KSPSetUp             1 1.0 3.5023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+01  0  0  0  0  8   0  0  0  0  9     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve             1 1.0 6.2424e+00 1.0 3.75e+08 1.0 1.4e+01 1.8e+05 1.9e+02 16 90 88 85 40  16 90 88 85 42   120   54599     21 3.27e+02   16 9.65e+01 100
KSPGMRESOrthog       5 1.0 2.7125e-02 1.4 1.46e+08 1.0 0.0e+00 0.0e+00 4.5e+01  0 35  0  0 10   0 35  0  0 10 10677   95449     10 9.65e+01    5 1.33e-02 100
DMCreateMat          1 1.0 2.3494e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.5e+01  6  0  0  0 14   6  0  0  0 14     0       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph           2 1.0 2.4669e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp              1 1.0 6.8134e-04 1.0 0.00e+00 0.0 4.0e+00 5.7e+04 2.0e+00  0  0 25  8  0   0  0 25  8  0     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               6 1.0 4.9323e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             6 1.0 3.7476e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecMDot              5 1.0 1.7741e-02 1.0 7.28e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0 17  0  0  2   0 17  0  0  2  8162   95721      5 9.65e+01    5 1.33e-02 100
VecNorm              7 1.0 4.1496e-03 1.5 3.40e+07 1.0 0.0e+00 0.0e+00 1.4e+01  0  8  0  0  3   0  8  0  0  3 16285   80156      1 1.93e+01    7 5.60e-05 100
VecScale             6 1.0 4.4610e-04 1.0 1.46e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 64920   66623      6 4.80e-05    0 0.00e+00 100
VecCopy              1 1.0 2.4004e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet              27 1.0 6.9875e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              2 1.0 3.3899e-03 1.0 9.71e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  5696   87237      3 1.93e+01    0 0.00e+00 100
VecMAXPY             6 1.0 2.0297e-03 1.0 9.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 23  0  0  0   0 23  0  0  0 95122   95527      6 1.60e-04    0 0.00e+00 100
VecScatterBegin      6 1.0 5.7104e-02 1.0 0.00e+00 0.0 1.6e+01 1.9e+05 2.0e+00  0  0100100  0   0  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        6 1.0 6.1680e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecNormalize         6 1.0 4.4475e-03 1.5 4.37e+07 1.0 0.0e+00 0.0e+00 1.2e+01  0 10  0  0  3   0 10  0  0  3 19535   75736      7 1.93e+01    6 4.80e-05 100
VecCUDACopyTo        7 1.0 2.0760e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      7 1.35e+02    0 0.00e+00  0
VecCUDACopyFrom      5 1.0 1.4753e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    5 9.65e+01  0
PCSetUp              1 1.0 3.0190e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.6e+01 75  0  0  0 12  75  0  0  0 12     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              6 1.0 6.0335e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 15  0  0  0  1  15  0  0  0  1     0       0      0 0.00e+00    5 9.65e+01  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              1        19916     0.
     DMKSP interface     1              1          664     0.
              Matrix     5              5    300130304     0.
    Distributed Mesh     1              1         5560     0.
           Index Set     4              4      9942836     0.
   IS L to G Mapping     1              1      9825664     0.
   Star Forest Graph     4              4         4896     0.
     Discrete System     1              1          904     0.
           Weak Form     1              1          824     0.
              Vector    24             24    349857688     0.
      Preconditioner     1              1         1496     0.
              Viewer     2              1          848     0.
========================================================================================================================
Average time to get PetscTime(): 2.6077e-08
Average time for MPI_Barrier(): 9.49204e-07
Average time for zero size MPI_Send(): 5.80028e-06
#PETSc Option Table entries:
-da_grid_x 169
-da_grid_y 169
-da_grid_z 169
-dm_mat_type mpiaijcusparse
-dm_vec_type mpicuda
-ksp_gmres_restart 31
-ksp_monitor
-ksp_type gmres
-log_view
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_type boomeramg
-pc_type hypre
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --package-prefix-hash=/home/kchockalingam/petsc-hash-pkgs --with-make-test-np=2 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-blaslapack=1 --download-hypre=1 --with-cuda-dir=/apps/packages/cuda/10.1/ --with-mpi-dir=/apps/packages/gcc/7.3.0/openmpi/3.1.2 PETSC_ARCH=arch-ci-linux-cuda11-double
-----------------------------------------
Libraries compiled on 2021-10-05 14:38:14 on glados.dl.ac.uk
Machine characteristics: Linux-4.18.0-193.6.3.el8_2.x86_64-x86_64-with-centos-8.2.2004-Core
Using PETSc directory: /home/kchockalingam/tools/petsc-3.15.3
Using PETSc arch:
-----------------------------------------

Using C compiler: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O
Using Fortran compiler: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O
-----------------------------------------

Using include paths: -I/home/kchockalingam/tools/petsc-3.15.3/include -I/home/kchockalingam/tools/petsc-3.15.3/arch-ci-linux-cuda11-double/include -I/home/kchockalingam/petsc-hash-pkgs/d71384/include -I/apps/packages/gcc/7.3.0/openmpi/3.1.2/include -I/apps/packages/cuda/10.1/include
-----------------------------------------

Using C linker: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpicc
Using Fortran linker: /apps/packages/gcc/7.3.0/openmpi/3.1.2/bin/mpif90
Using libraries: -Wl,-rpath,/home/kchockalingam/tools/petsc-3.15.3/lib -L/home/kchockalingam/tools/petsc-3.15.3/lib -lpetsc -Wl,-rpath,/home/kchockalingam/petsc-hash-pkgs/d71384/lib -L/home/kchockalingam/petsc-hash-pkgs/d71384/lib -Wl,-rpath,/apps/packages/cuda/10.1/lib64 -L/apps/packages/cuda/10.1/lib64 -Wl,-rpath,/apps/packages/gcc/7.3.0/openmpi/3.1.2/lib -L/apps/packages/gcc/7.3.0/openmpi/3.1.2/lib -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib/gcc/x86_64-pc-linux-gnu/7.3.0 -L/apps/packages/compilers/gcc/7.3.0/lib/gcc/x86_64-pc-linux-gnu/7.3.0 -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib64 -L/apps/packages/compilers/gcc/7.3.0/lib64 -Wl,-rpath,/apps/packages/compilers/gcc/7.3.0/lib -L/apps/packages/compilers/gcc/7.3.0/lib -lHYPRE -llapack -lblas -lcufft -lcublas -lcudart -lcusparse -lcusolver -lcurand -lX11 -lstdc++ -ldl -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -lgfortran -lm -lutil -lrt -lz -lgfortran -lm -lgfortran -lgcc_s -lquadmath -lpthread -lquadmath -lstdc++ -ldl
-----------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
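For reference, the run that produced this log can be reconstructed from the option table above. A minimal sketch of the launch command, assuming mpirun as the launcher (the actual launcher or batch-script wrapper is not recorded in the log; only the executable, the 2-process count, and the options are):

mpirun -np 2 ./ex45 -da_grid_x 169 -da_grid_y 169 -da_grid_z 169 \
    -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda \
    -ksp_type gmres -ksp_gmres_restart 31 -ksp_monitor \
    -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 \
    -log_view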