------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was compiled with GPU support and you've   #
#   created PETSc/GPU objects, but you intentionally     #
#   used -use_gpu_aware_mpi 0, requiring PETSc to copy   #
#   additional data between the GPU and CPU. To obtain   #
#   meaningful timing results on multi-rank runs, use    #
#   GPU-aware MPI instead.                               #
#                                                        #
##########################################################

_ on a named host1 with 4 processes and CUDA architecture 80, by penazzi on Thu Jan 22 15:38:13 2026
Using PETSc Release Version 3.23.3, May 30, 2025

                         Max       Max/Min     Avg       Total
Time (sec):           2.029e+01     1.000   2.029e+01
Objects:              0.000e+00     0.000   0.000e+00
Flops:                1.194e+10     1.000   1.194e+10  4.775e+10
Flops/sec:            5.883e+08     1.000   5.883e+08  2.353e+09
MPI Msg Count:        1.272e+03     1.000   1.272e+03  5.088e+03
MPI Msg Len (bytes):  4.045e+08     1.000   3.180e+05  1.618e+09
MPI Reductions:       4.280e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.0268e+01  99.9%  4.7752e+10 100.0%  5.080e+03  99.8%  3.185e+05      100.0%  1.000e+01   2.3%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 1e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 1e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 1e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 1e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total     GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max      Ratio   Max   Ratio  Mess  AvgLen  Reduct %T %F %M %L %R  %T %F %M %L %R  Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          6 1.0        n/a    n/a 0.00e+00 0.0 1.6e+01 4.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
cuBLAS Init            1 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
DCtxCreate             2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
DCtxSetUp              2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
DCtxSetDevice          2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
DCtxSync               6 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatMult              631 1.0        n/a    n/a 4.10e+09 1.0 5.0e+03 3.2e+05 0.0e+00  2  34 99 100  0   2  34 99 100  0     n/a     n/a    631 4.04e+02  631 4.04e+02 100
MatSolve             633 1.0        n/a    n/a 4.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  34  0   0  0   0  34  0   0  0     n/a     n/a      2 8.00e+00    0 0.00e+00 100
MatCholFctrNum         2 1.0        n/a    n/a 3.38e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00 100
MatICCFactorSym        2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1   0  0   0  0   1   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatConvert             4 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatAssemblyBegin       8 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatAssemblyEnd         8 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatGetRowIJ            2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatGetOrdering         2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
MatCUSPARSCopyTo       4 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      4 8.93e+01    0 0.00e+00   0
MatSetPreallCOO        2 1.0        n/a    n/a 0.00e+00 0.0 3.2e+01 8.0e+04 2.0e+00  6   0  1   0  0   6   0  1   0 20     n/a     n/a      4 8.93e+01    0 0.00e+00   0
MatSetValuesCOO        2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
KSPSetUp               2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
KSPSolve               2 1.0 3.2179e+00    1.0 1.19e+10 1.0 5.0e+03 3.2e+05 0.0e+00 16 100 99 100  0  16 100 99 100  0   14798     n/a    635 4.20e+02  631 4.04e+02 100
SFSetGraph             6 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFSetUp                6 1.0        n/a    n/a 0.00e+00 0.0 3.2e+01 8.0e+04 0.0e+00  0   0  1   0  0   0   0  1   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFReduceBegin          6 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFReduceEnd            6 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFFetchOpBegin         2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFFetchOpEnd           2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFPack               639 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
SFUnpack             641 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
VecTDot             1266 1.0        n/a    n/a 1.27e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  11  0   0  0   1  11  0   0  0     n/a     n/a      2 8.00e+00    0 0.00e+00 100
VecNorm              633 1.0        n/a    n/a 6.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 13   5  0   0  0  13   5  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00 100
VecCopy                6 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
VecSet                 4 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
VecAXPY             1262 1.0        n/a    n/a 1.26e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  11  0   0  0   0  11  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00 100
VecAYPX              629 1.0        n/a    n/a 6.29e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0   5  0   0  0   0   5  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      631 1.0        n/a    n/a 0.00e+00 0.0 5.0e+03 3.2e+05 0.0e+00  1   0 99 100  0   1   0 99 100  0     n/a     n/a      0 0.00e+00  631 4.04e+02   0
VecScatterEnd        631 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a    631 4.04e+02    0 0.00e+00   0
VecCUDACopyTo          4 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      4 1.60e+01    0 0.00e+00   0
VecCUDACopyFrom        2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    2 8.00e+00   0
PCSetUp                2 1.0        n/a    n/a 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0   0  0   0   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00   0
PCSetUpOnBlocks        2 1.0        n/a    n/a 3.38e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1   0  0   0  0   1   0  0   0  0     n/a     n/a      0 0.00e+00    0 0.00e+00 100
PCApply              633 1.0        n/a    n/a 4.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  34  0   0  0   0  34  0   0  0     n/a     n/a      2 8.00e+00    0 0.00e+00 100
PCApplyOnBlocks      633 1.0        n/a    n/a 4.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  34  0   0  0   0  34  0   0  0     n/a     n/a      2 8.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Object Type          Creations   Destructions. Reports information only for process 0.
--- Event Stage 0: Main Stage

              Container     7              0
     PetscDeviceContext     2              0
                 Viewer     1              0
          Krylov Solver     4              0
                 Matrix    12              4
      Star Forest Graph     6              2
                 Vector    20              4
              Index Set     8              8
         Preconditioner     4              0
========================================================================================================================
Average time to get PetscTime(): 2.5006e-08
Average time for MPI_Barrier(): 2.13003e-06
Average time for zero size MPI_Send(): 3.20002e-06
#PETSc Option Table entries:
-error_output_none # (source: command line)
-use_gpu_aware_mpi 0 # (source: command line)
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/dev/p/b/petsca1b0d185983c8/b/petsc_installation --with-petsc-arch=arch-curr --with-scalar-type=real --with-fc=mpiifort --with-single-library=1 --with-shared-libraries=1 --with-mpi-compilers=1 --with-mpi=1 --with-metis=1 --with-metis-include="[/dev/p/metisf9b87aca90b59/p/include]" --with-metis-lib="-L"/dev/p/metisf9b87aca90b59/p/lib" -lmetis -lm" --with-mumps=1 --with-mumps-include="[/dev/p/mumpsf03c1787b5a9d/p/include]" --with-mumps-lib="-L"/dev/p/mumpsf03c1787b5a9d/p/lib" -lmumps" --with-valgrind=0 --with-clanguage=c++ --with-x=0 --with-ssl=0 --with-blaslapack-lib="-L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -ldl -L"/dev/p/iomp3a211aa45745e/p/lib" -liomp5" --with-scalapack=1 --with-scalapack-include="[/dev/p/mkl9d753c16edb92/p/include,/dev/p/mkl9d753c16edb92/p/include]" --with-scalapack-lib="-L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_scalapack_lp64 -L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -lmkl_blacs_intelmpi_lp64" --with-petsc4py=0 --with-debugging=0 --with-openmp=0 --with-visibility=1 --with-fortran-bindings=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-cuda=1 --with-cuda-arch=70,80,90 --with-cc=mpicc -cc=/opt/ccache/compilers/gcc --with-cxx=mpicxx -cxx=/opt/ccache/compilers/g++ --with-shared-ld=ld --with-ar=ar
-----------------------------------------
Libraries compiled on 2025-10-30 19:31:13 on 0b4c776bc739
Machine characteristics: Linux-5.15.0-152-generic-x86_64-with-glibc2.28
Using PETSc directory: /dev/p/b/petsca1b0d185983c8/b/petsc_installation
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-psabi -fstack-protector -fvisibility=hidden -O3 -std=gnu++20 -fPIC
Using Fortran compiler: mpiifort -fPIC -O3
-----------------------------------------
Using include paths: -I/dev/p/b/petsca1b0d185983c8/b/petsc_installation/include -I/dev/p/mumpsf03c1787b5a9d/p/include -I/dev/p/metisf9b87aca90b59/p/include -I/opt/cuda/cuda-12.9/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpiifort
Using libraries: -Wl,-rpath,/dev/p/b/petsca1b0d185983c8/b/petsc_installation/lib -L/dev/p/b/petsca1b0d185983c8/b/petsc_installation/lib -lpetsc -Wl,-rpath,"/dev/p/mumpsf03c1787b5a9d/p/lib" -L"/dev/p/mumpsf03c1787b5a9d/p/lib" -Wl,-rpath,"/dev/p/mkl9d753c16edb92/p/lib/intel64" -L"/dev/p/mkl9d753c16edb92/p/lib/intel64" -Wl,-rpath,"/dev/p/iomp3a211aa45745e/p/lib" -L"/dev/p/iomp3a211aa45745e/p/lib" -Wl,-rpath,"/dev/p/metisf9b87aca90b59/p/lib" -L"/dev/p/metisf9b87aca90b59/p/lib" -Wl,-rpath,/opt/cuda/cuda-12.9/lib64 -L/opt/cuda/cuda-12.9/lib64 -Wl,-rpath,/opt/intel/compiler/2023.0.0/linux/compiler/lib/intel64_lin -L/opt/intel/compiler/2023.0.0/linux/compiler/lib/intel64_lin -Wl,-rpath,/opt/gcc/lib/gcc/x86_64-redhat-linux/13.3.0 -L/opt/gcc/lib/gcc/x86_64-redhat-linux/13.3.0 -Wl,-rpath,/opt/gcc/lib64 -L/opt/gcc/lib64 -Wl,-rpath,/opt/gcc/x86_64-redhat-linux/lib -L/opt/gcc/x86_64-redhat-linux/lib -Wl,-rpath,/opt/gcc/lib -L/opt/gcc/lib -lmumps -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm -ldl -liomp5 -lmetis -lm -lcudart -lnvtx3interop -lcufft -lcublas -lcusparse -lcusolver -lcurand -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lpthread -lgcc_s -lirc_s -ldl -lstdc++ -lquadmath
-----------------------------------------

##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was compiled with GPU support and you've   #
#   created PETSc/GPU objects, but you intentionally     #
#   used -use_gpu_aware_mpi 0, requiring PETSc to copy   #
#   additional data between the GPU and CPU. To obtain   #
#   meaningful timing results on multi-rank runs, use    #
#   GPU-aware MPI instead.                               #
#                                                        #
##########################################################
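
The warning above refers to the -use_gpu_aware_mpi 0 entry in the option table: with that setting, PETSc stages MPI communication through host memory, which is what the CpuToGpu/GpuToCpu columns in the event table reflect (the MatMult and VecScatterBegin/VecScatterEnd rows each record 631 device-host transfers totalling roughly 4.04e+02 MB per rank in each direction). If the underlying MPI library is built with CUDA support, rerunning the same case without -use_gpu_aware_mpi 0 lets PETSc hand device buffers to MPI directly and removes those copies; whether the MPI installation on this system actually supports that has to be verified separately, so treat this as a suggestion rather than a drop-in setting.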
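
The Total Mflop/s column in the event table follows the definition given in the phase summary legend, 1e-6 * (sum of flop over all processors) / (max time over all processors). A rough cross-check against the KSPSolve row, using the rounded values printed above (so the result only agrees to within a fraction of a percent):

    flop summed over 4 ranks   ~ 4 * 1.19e+10              ~ 4.76e+10 flop
    max KSPSolve time          = 3.2179e+00 s
    Total Mflop/s              ~ 1e-6 * 4.76e+10 / 3.2179  ~ 1.48e+04

which is in line with the 14798 Mflop/s reported for KSPSolve.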
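
The phase summary legend notes that stages are set with PetscLogStagePush() and PetscLogStagePop(); this run logged everything under the default Main Stage. A minimal sketch of how a user-defined stage could be added so that, say, the solve appears as its own section in a future -log_view report; the program structure and the stage label "Solve" are illustrative, not taken from this run:

    #include <petscsys.h>

    int main(int argc, char **argv)
    {
      PetscLogStage solve_stage;                               /* handle for a user-defined logging stage */

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(PetscLogStageRegister("Solve", &solve_stage)); /* "Solve" is an illustrative label */

      PetscCall(PetscLogStagePush(solve_stage));               /* events from here on are charged to the "Solve" stage */
      /* ... set up and call the solver here, e.g. KSPSolve() ... */
      PetscCall(PetscLogStagePop());                           /* return to the enclosing stage (Main Stage) */

      PetscCall(PetscFinalize());                              /* with -log_view, each registered stage gets its own section */
      return 0;
    }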