------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

_ on a  named host1 with 1 process and CUDA architecture 80, by penazzi on Thu Jan 22 15:38:52 2026
Using PETSc Release Version 3.23.3, May 30, 2025

                         Max       Max/Min     Avg       Total
Time (sec):           2.294e+01     1.000   2.294e+01
Objects:              0.000e+00     0.000   0.000e+00
Flops:                4.171e+10     1.000   4.171e+10  4.171e+10
Flops/sec:            1.818e+09     1.000   1.818e+09  1.818e+09
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------   --- Messages ---   -- Message Lengths --   -- Reductions --
                        Avg     %Total     Avg     %Total     Count   %Total     Avg        %Total      Count   %Total
 0:      Main Stage: 2.2942e+01 100.0%  4.1708e+10 100.0%  0.000e+00   0.0%   0.000e+00       0.0%   0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %F - percent flop in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 1e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 1e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 1e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---   --- Stage ----   Total    GPU     - CpuToGpu -   - GpuToCpu -  GPU
                   Max Ratio   Max    Ratio   Max      Ratio  Mess    AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s  Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

cuBLAS Init            1 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
DCtxCreate             2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
DCtxSetUp              2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
DCtxSetDevice          2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
DCtxSync               4 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatMult              546 1.0      n/a     n/a  1.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 34  0  0  0   0 34  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
MatSolve             548 1.0      n/a     n/a  1.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 34  0  0  0   0 34  0  0  0    n/a     n/a     2 3.20e+01    0 0.00e+00 100
MatCholFctrNum         2 1.0      n/a     n/a  1.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
MatICCFactorSym        2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatAssemblyBegin       4 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatAssemblyEnd         4 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatGetRowIJ            2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatGetOrdering         2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
MatCUSPARSCopyTo       2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     2 3.52e+02    0 0.00e+00   0
MatSetPreallCOO        2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0    n/a     n/a     2 3.52e+02    0 0.00e+00   0
MatSetValuesCOO        2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
KSPSetUp               2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
KSPSolve               2 1.0 1.5246e+00   1.0  4.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 100 0  0  0   7 100 0  0  0  27262     n/a     4 6.40e+01    0 0.00e+00 100
PCSetUp                2 1.0      n/a     n/a  1.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
PCApply              548 1.0      n/a     n/a  1.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 34  0  0  0   0 34  0  0  0    n/a     n/a     2 3.20e+01    0 0.00e+00 100
VecTDot             1096 1.0      n/a     n/a  4.38e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0    n/a     n/a     2 3.20e+01    0 0.00e+00 100
VecNorm              548 1.0      n/a     n/a  2.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00  6  5  0  0  0   6  5  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
VecCopy                6 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
VecSet                 2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00   0
VecAXPY             1092 1.0      n/a     n/a  4.37e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
VecAYPX              544 1.0      n/a     n/a  2.18e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0    n/a     n/a     0 0.00e+00    0 0.00e+00 100
VecCUDACopyTo          4 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     4 6.40e+01    0 0.00e+00   0
VecCUDACopyFrom        2 1.0      n/a     n/a  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    n/a     n/a     0 0.00e+00    2 3.20e+01   0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
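As a sanity check on the 'Total Mflop/s' definition above (1e-6 * flop / max time), applied to KSPSolve, the only event in this run with a measured time:

    1e-6 * 4.16e+10 flop / 1.5246 s  ~=  2.73e+04 Mflop/s

which agrees with the reported 27262 once rounding of the displayed flop count is accounted for. The remaining Time and GPU Mflop/s entries read n/a; in recent PETSc releases, per-event GPU timing is normally collected only when the run also passes -log_view_gpu_time, which does not appear in the option table of this run.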
Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     7              0
  PetscDeviceContext     2              0
              Viewer     1              0
       Krylov Solver     2              0
              Matrix     4              0
      Preconditioner     2              0
              Vector    10              0
           Index Set     4              4
========================================================================================================================
Average time to get PetscTime(): 2.5006e-08
#PETSc Option Table entries:
-error_output_none # (source: command line)
-use_gpu_aware_mpi 0 # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/dev/p/b/petsca1b0d185983c8/b/petsc_installation --with-petsc-arch=arch-curr --with-scalar-type=real --with-fc=mpiifort --with-single-library=1 --with-shared-libraries=1 --with-mpi-compilers=1 --with-mpi=1 --with-metis=1 --with-metis-include="[/dev/p/metisf9b87aca90b59/p/include]" --with-metis-lib="-L"/dev/p/metisf9b87aca90b59/p/lib" -lmetis -lm" --with-mumps=1 --with-mumps-include="[/dev/p/mumpsf03c1787b5a9d/p/include]" --with-mumps-lib="-L"/dev/p/mumpsf03c1787b5a9d/p/lib" -lmumps" --with-valgrind=0 --with-clanguage=c++ --with-x=0 --with-ssl=0 --with-blaslapack-lib="-L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -ldl -L"/dev/p/iomp3a211aa45745e/p/lib" -liomp5" --with-scalapack=1 --with-scalapack-include="[/dev/p/mkl9d753c16edb92/p/include,/dev/p/mkl9d753c16edb92/p/include]" --with-scalapack-lib="-L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_scalapack_lp64 -L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_blacs_intelmpi_lp64" --with-petsc4py=0 --with-debugging=0 --with-openmp=0 --with-visibility=1 --with-fortran-bindings=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-cuda=1 --with-cuda-arch=70,80,90 --with-cc=mpicc -cc=/opt/ccache/compilers/gcc --with-cxx=mpicxx -cxx=/opt/ccache/compilers/g++ --with-shared-ld=ld --with-ar=ar
-----------------------------------------
Libraries compiled on 2025-10-30 19:31:13 on 0b4c776bc739
Machine characteristics: Linux-5.15.0-152-generic-x86_64-with-glibc2.28
Using PETSc directory: /dev/p/b/petsca1b0d185983c8/b/petsc_installation
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-psabi -fstack-protector -fvisibility=hidden -O3 -std=gnu++20 -fPIC
Using Fortran compiler: mpiifort -fPIC -O3
-----------------------------------------
Using include paths: -I/dev/p/b/petsca1b0d185983c8/b/petsc_installation/include -I/dev/p/mumpsf03c1787b5a9d/p/include -I/dev/p/metisf9b87aca90b59/p/include -I/opt/cuda/cuda-12.9/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpiifort
Using libraries: -Wl,-rpath,/dev/p/b/petsca1b0d185983c8/b/petsc_installation/lib -L/dev/p/b/petsca1b0d185983c8/b/petsc_installation/lib -lpetsc -Wl,-rpath,"/dev/p/mumpsf03c1787b5a9d/p/lib" -L"/dev/p/mumpsf03c1787b5a9d/p/lib" -Wl,-rpath,"/dev/p/mkl9d753c16edb92/p/lib/intel64" -L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -Wl,-rpath,"/dev/p/iomp3a211aa45745e/p/lib" -L"/dev/p/iomp3a211aa45745e/p/lib" -Wl,-rpath,"/dev/p/metisf9b87aca90b59/p/lib" -L"/dev/p/metisf9b87aca90b59/p/lib" -Wl,-rpath,/opt/cuda/cuda-12.9/lib64 -L/opt/cuda/cuda-12.9/lib64 -Wl,-rpath,/opt/intel/compiler/2023.0.0/linux/compiler/lib/intel64_lin -L/opt/intel/compiler/2023.0.0/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/gcc/lib/gcc/x86_64-redhat-linux/13.3.0 -L/opt/gcc/lib/gcc/x86_64-redhat-linux/13.3.0 -Wl,-rpath,/opt/gcc/lib64 -L/opt/gcc/lib64 -Wl,-rpath,/opt/gcc/x86_64-redhat-linux/lib -L/opt/gcc/x86_64-redhat-linux/lib -Wl,-rpath,/opt/gcc/lib -L/opt/gcc/lib -lmumps -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -ldl -liomp5 -lmetis -lm -lcudart -lnvtx3interop -lcufft -lcublas -lcusparse -lcusolver -lcurand -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -ldl -lstdc++ -lquadmath
-----------------------------------------
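For reference, the stage mechanism mentioned under 'Phase summary info' (PetscLogStagePush()/PetscLogStagePop()) attributes every event logged between the push and the pop to a named stage instead of "Main Stage". The sketch below is a minimal, self-contained illustration of that mechanism, not the application that produced the log above; the stage name, vector size, and values are arbitrary, and only -log_view and -vec_type are standard PETSc options.

  /* Minimal sketch: register a named logging stage so that -log_view reports it
     as a separate row in the "Summary of Stages" table.  Illustrative only. */
  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec           x, y;
    PetscLogStage stage;
    PetscInt      n = 1000000;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    PetscCall(VecCreate(PETSC_COMM_SELF, &x));
    PetscCall(VecSetSizes(x, PETSC_DECIDE, n));
    PetscCall(VecSetFromOptions(x));       /* honors -vec_type cuda for a GPU run */
    PetscCall(VecDuplicate(x, &y));
    PetscCall(VecSet(x, 1.0));
    PetscCall(VecSet(y, 2.0));

    PetscCall(PetscLogStageRegister("Solve stage", &stage));
    PetscCall(PetscLogStagePush(stage));   /* events below are charged to "Solve stage" */
    PetscCall(VecAXPY(y, 3.0, x));         /* 2N flops for a real vector, per the convention above */
    PetscCall(PetscLogStagePop());

    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&y));
    PetscCall(PetscFinalize());
    return 0;
  }

Run as, e.g., ./example -log_view -vec_type cuda to obtain a summary in the format shown above, with an additional "1: Solve stage" line in the stage table and the VecAXPY event reported under that stage.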