  0 KSP Residual norm 7.451923898937e+05
  1 KSP Residual norm 4.190843782941e+04
  2 KSP Residual norm 1.798945700462e+04
  3 KSP Residual norm 1.223044424436e+04
  4 KSP Residual norm 4.174475088375e+03
  5 KSP Residual norm 2.268425004557e+03
  6 KSP Residual norm 1.421063929846e+03
  7 KSP Residual norm 8.696045866455e+02
  8 KSP Residual norm 6.215492727051e+02
  9 KSP Residual norm 3.584359100324e+02
 10 KSP Residual norm 2.222233124404e+02
 11 KSP Residual norm 1.453955901066e+02
 12 KSP Residual norm 1.147316865075e+02
 13 KSP Residual norm 6.529686801408e+01
 14 KSP Residual norm 3.243905036721e+01
 15 KSP Residual norm 2.576701637974e+01
 16 KSP Residual norm 2.016990220700e+01
 17 KSP Residual norm 1.413112299877e+01
 18 KSP Residual norm 9.367231136940e+00
 19 KSP Residual norm 4.844577180017e+00
 20 KSP Residual norm 3.906548874809e+00
 21 KSP Residual norm 2.975179464759e+00
 22 KSP Residual norm 2.127389637593e+00
 23 KSP Residual norm 1.031794005392e+00
 24 KSP Residual norm 8.259219463880e-01
 25 KSP Residual norm 6.766239785615e-01
 26 KSP Residual norm 4.583503891554e-01
 27 KSP Residual norm 2.462126830750e-01
 28 KSP Residual norm 1.925471110482e-01
 29 KSP Residual norm 9.494938545903e-02
 30 KSP Residual norm 6.330021551417e-02
KSP Object: 1 MPI processes
  type: dgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
    Adaptive strategy is used: TRUE
    Frequency of extracted eigenvalues = 1 using Ritz values
    Total number of extracted eigenvalues = 0
    Maximum number of eigenvalues set to be extracted = 9
    relaxation parameter for the adaptive strategy(smv) = 1.
    Number of matvecs : 31
  maximum iterations=10000, nonzero initial guess
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.2
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 4
      Number of paths for aggressive coarsening 4
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        Falgout
      Interpolation type  ext+i
      SpGEMM type         cusparse
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: seqaijcusparse
    rows=1468928, cols=1468928
    total: nonzeros=13176768, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
 solver time    2.97172329100000043
****************************************************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                                   ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

./test_miniapp on a  named r228n03 with 1 processor, by nvarini1 Thu Nov 24 16:25:34 2022
Using Petsc Release Version 3.17.5, unknown

                         Max       Max/Min     Avg       Total
Time (sec):           4.042e+00     1.000   4.042e+00
Objects:              4.300e+01     1.000   4.300e+01
Flops:                3.737e+09     1.000   3.737e+09  3.737e+09
Flops/sec:            9.245e+08     1.000   9.245e+08  9.245e+08
Memory (bytes):       2.960e+07     1.000   2.960e+07  2.960e+07
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 4.0423e+00 100.0%  3.7372e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               31 1.0 3.2221e-02 1.0 7.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 21  0  0  0   1 21  0  0  0 23942   66984      2 1.76e+02    0 0.00e+00 100
MatConvert             2 1.0 3.9585e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatAssemblyBegin       2 1.0 3.6720e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatAssemblyEnd         2 1.0 4.1741e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatGetRowIJ            1 1.0 3.0390e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatView                1 1.0 9.9941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
MatCUSPARSCopyTo       1 1.0 1.9756e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      1 1.64e+02    0 0.00e+00   0
VecMDot               30 1.0 1.6379e-02 1.0 1.37e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 37  0  0  0   0 37  0  0  0 83408  101899      0 0.00e+00    0 0.00e+00 100
VecNorm               32 1.0 3.4793e-03 1.0 9.40e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 27020   34285      0 0.00e+00    0 0.00e+00 100
VecScale              31 1.0 1.7788e-03 1.0 4.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 25600   40041      0 0.00e+00    0 0.00e+00 100
VecCopy                1 1.0 2.6919e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    1 1.18e+01   0
VecSet                33 1.0 1.7317e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecAXPY                2 1.0 4.2070e-04 1.0 5.88e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 13967   37766      1 1.18e+01    0 0.00e+00 100
VecMAXPY              31 1.0 2.3615e-02 1.0 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 39  0  0  0   1 39  0  0  0 61581   63417      0 0.00e+00    0 0.00e+00 100
VecAssemblyBegin       2 1.0 1.1410e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecAssemblyEnd         2 1.0 9.2100e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
VecNormalize          31 1.0 5.2003e-03 1.0 1.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 26270   36055      0 0.00e+00    0 0.00e+00 100
VecCUDACopyTo          3 1.0 5.7760e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      3 3.53e+01    0 0.00e+00   0
VecCUDACopyFrom        3 1.0 6.9723e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    3 3.53e+01   0
KSPSetUp               1 1.0 2.9976e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
KSPSolve               1 1.0 1.9746e-01 1.0 3.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5100  0  0  0   5100  0  0  0 18927   69104      4 1.99e+02    1 1.18e+01 100
KSPGMRESOrthog        30 1.0 3.9150e-02 1.0 2.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 73  0  0  0   1 73  0  0  0 69790   78168      0 0.00e+00    0 0.00e+00 100
PCSetUp                1 1.0 2.7708e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 69  0  0  0  0  69  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
PCApply               32 1.0 9.8630e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      1 1.18e+01    0 0.00e+00   0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2      5882160     0.
              Vector    37             37     35316728     0.
       Krylov Solver     1              1        19136     0.
      Preconditioner     1              1         1528     0.
              Viewer     2              1          856     0.
========================================================================================================================
Average time to get PetscTime(): 3.69e-08
#PETSc Option Table entries:
-ksp_initial_guess_nonzero yes
-ksp_monitor
-ksp_reuse_preconditioner yes
-ksp_rtol 1e-7
-ksp_type dgmres
-ksp_view
-log_view
-mat_type seqaijcusparse
-pc_hypre_boomeramg_agg_nl 4
-pc_hypre_boomeramg_agg_num_paths 4
-pc_hypre_boomeramg_coarsen_type Falgout
-pc_hypre_boomeramg_interp_type ext+i
-pc_hypre_boomeramg_no_CF false
-pc_hypre_boomeramg_strong_threshold 0.25
-pc_hypre_boomeramg_truncfactor 0.2
-pc_hypre_type boomeramg
-pc_type hypre
-vec_type cuda
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/m100_work/FUAC6_GBS2N/petsc-install-3.17 --with-cxx=mpixlC --with-cc=mpixlc --with-fc=mpixlf --with-cuda=1 --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --download-fblaslapack=1
-----------------------------------------
Libraries compiled on 2022-11-22 15:09:44 on login03
Machine characteristics: Linux-4.18.0-147.51.2.el8_1.ppc64le-ppc64le-with-redhat-8.1-Ootpa
Using PETSc directory: /m100_work/FUAC6_GBS2N/petsc-install-3.17
Using PETSc arch:
-----------------------------------------
Using C compiler: mpixlc -qPIC -g -O0
Using Fortran compiler: mpixlf -qPIC -g -O0
-----------------------------------------
Using include paths: -I/m100_work/FUAC6_GBS2N/petsc-install-3.17/include -I/cineca/prod/opt/compilers/cuda/11.0/none/include
-----------------------------------------
Using C linker: mpixlc
Using Fortran linker: mpixlf
Using libraries: -Wl,-rpath,/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -L/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -lpetsc -Wl,-rpath,/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -L/m100_work/FUAC6_GBS2N/petsc-install-3.17/lib -Wl,-rpath,/cineca/prod/opt/compilers/cuda/11.0/none/lib64 -L/cineca/prod/opt/compilers/cuda/11.0/none/lib64 -L/cineca/prod/opt/compilers/cuda/11.0/none/lib64/stubs -Wl,-rpath,/cineca/prod/opt/compilers/spectrum_mpi/10.3.1/binary/lib -L/cineca/prod/opt/compilers/spectrum_mpi/10.3.1/binary/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlsmp/5.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlsmp/5.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlmass/9.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlmass/9.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlf/16.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlf/16.1.1/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/lib -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0 -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc/powerpc64le-unknown-linux-gnu/8.4.0 -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib/gcc
-Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib64 -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib64 -Wl,-rpath,/m100/prod/opt/compilers/gnu/8.4.0/none/lib -L/m100/prod/opt/compilers/gnu/8.4.0/none/lib -Wl,-rpath,/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlC/16.1.1/lib -L/m100/prod/opt/compilers/xl/16.1.1_sp4.1/binary/xlC/16.1.1/lib -lHYPRE -lflapack -lfblas -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lX11 -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################
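For reference, here is a minimal sketch of a C driver that picks up the solver configuration recorded in the option table above purely at runtime, through MatSetFromOptions() and KSPSetFromOptions(). This is an illustration only, not the actual test_miniapp source (which is not shown in this log): the small 1D Laplacian stand-in matrix and the vector setup are assumptions made so the example is self-contained, and it assumes PETSc 3.17 or later for PetscCall().

/* Hypothetical stand-in driver; the real application assembles its own
   1468928 x 1468928 operator. Solver choices (-ksp_type dgmres, -pc_type hypre,
   -pc_hypre_type boomeramg, -mat_type seqaijcusparse, -vec_type cuda, tolerances,
   BoomerAMG parameters) all come from the command line, as in the option table. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscInt    i, n = 100, col[3];
  PetscScalar v[3];

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Stand-in operator: a 1D Laplacian, only so the sketch runs end to end. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));              /* honours -mat_type seqaijcusparse */
  PetscCall(MatSetUp(A));
  for (i = 0; i < n; i++) {
    v[0] = -1.0; v[1] = 2.0; v[2] = -1.0;
    col[0] = i - 1; col[1] = i; col[2] = i + 1;
    if (i == 0)          PetscCall(MatSetValues(A, 1, &i, 2, &col[1], &v[1], INSERT_VALUES));
    else if (i == n - 1) PetscCall(MatSetValues(A, 1, &i, 2, col, v, INSERT_VALUES));
    else                 PetscCall(MatSetValues(A, 1, &i, 3, col, v, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Vectors compatible with the matrix type (CUDA vectors for aijcusparse). */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));            /* DGMRES + hypre BoomerAMG from the flags */
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

Run with the same flags listed in the option table (for example -ksp_type dgmres -pc_type hypre -pc_hypre_type boomeramg -mat_type seqaijcusparse -vec_type cuda -ksp_monitor -ksp_view -log_view plus the BoomerAMG options), such a driver would exercise the same DGMRES + BoomerAMG + cuSPARSE path profiled in this log.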