[petsc-users] Reducing cost of MatSetValues
John Fettig
john.fettig at gmail.com
Wed Jun 1 14:53:07 CDT 2011
What are the recommendations for reducing the cost of inserting values
into an AIJ matrix? In my application (transient finite element
solution of flow and heat, linear elements), MatSetValues accounts for
up to 20% of the overall runtime. Is this expected?
I have double-checked that the matrices are preallocated correctly,
and I have set MAT_NEW_NONZERO_ALLOCATION_ERR; the code runs without
error.
The matrices periodically change size/nonzero pattern, but between
those changes the values are zeroed out and
MatSetOption(mat, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE) is called.
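The setup looks roughly like the following (the matrix name, sizes, and
the nnz array are placeholders rather than the actual application code):

    /* sketch of the sequential AIJ setup described above;
       n and nnz[] are placeholders for the real problem sizes */
    Mat            A;
    PetscErrorCode ierr;
    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 0, nnz, &A);CHKERRQ(ierr);
    /* error out if an insertion falls outside the preallocated pattern */
    ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);
    /* keep the nonzero pattern across MatZeroEntries() between re-meshes */
    ierr = MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);CHKERRQ(ierr);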
MatSetValues is called once per element, passing the full local
element matrix (a sketch of the loop is below). I re-activated the
MAT_SetValues event and have included a -log_summary from a short run.
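For reference, the assembly loop is essentially the following (the
element count, connectivity lookup, and local matrix routine are
illustrative placeholders, not my actual code):

    /* per-element assembly: one MatSetValues call per element, ADD_VALUES mode */
    PetscInt       e, idx[NODES_PER_ELEM];             /* global node indices of element e */
    PetscScalar    Ke[NODES_PER_ELEM*NODES_PER_ELEM];  /* dense local element matrix */
    PetscErrorCode ierr;

    ierr = MatZeroEntries(A);CHKERRQ(ierr);  /* nonzero pattern kept from the previous step */
    for (e = 0; e < num_elements; e++) {
      get_element_nodes(e, idx);             /* placeholder: fill global indices */
      compute_element_matrix(e, Ke);         /* placeholder: build the local matrix */
      ierr = MatSetValues(A, NODES_PER_ELEM, idx, NODES_PER_ELEM, idx, Ke, ADD_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);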
Thanks,
John
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun  1 15:49:20 2011
Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
                         Max       Max/Min        Avg      Total
Time (sec):           2.242e+02      1.00000   2.242e+02
Objects:              2.718e+03      1.00000   2.718e+03
Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.062e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
KSPSetup              532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
KSPSolve              277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
PCSetUp               277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
PCApply              4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
MatMult             12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
MatMultAdd           3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
MatSolve              917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
MatSOR               7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
MatLUFactorSym         51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSetValues     89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
MatGetRowIJ            51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
MatZeroEntries        133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMax                 85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
VecDot               5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
VecNorm              2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
VecCopy               955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
VecAYPX              4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
VecWAXPY             6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
VecAssemblyBegin     1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
VecAssemblyEnd       1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult     3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 258 258 218416 0
Preconditioner 258 258 193856 0
Matrix 666 666 140921396 0
Vec 1383 1382 244199352 0
Index Set 153 153 81396 0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Sun May 22 22:30:16 2011
Configure options: --with-x=0 --download-f-blas-lapack=0
--with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
--with-mpi=1 --with-mpi-shared=1
--with-mpi-include=/usr/local/encap/hpmpi-8.01/include
--with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
--with-mpi=1 --download-mpich=no --with-debugging=0
--with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
--with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
--with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --download-scalapack=no --download-blacs=no
--with-blacs=1 --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
--with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
--with-scalapack=1
--with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
--with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
--download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
--download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
-----------------------------------------
Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP
PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
Using PETSc arch: intel-opt
-----------------------------------------
Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info
-----------------------------------------
Using include paths:
-I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
-I/home/jfe/local/centos/petsc-3.1-p8/include
-I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
-I/usr/local/encap/hpmpi-8.01/include
------------------------------------------
Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info
Using libraries:
-Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
-L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
-Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
-L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
-lsmumps -lzmumps -lmumps_common -lpord
-Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
-lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
-Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
-lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
-Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
-L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
-Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
-L/opt/intel/Compiler/11.1/072/lib/intel64
-Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
-lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
-lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
------------------------------------------