[petsc-users] Reducing cost of MatSetValues
Matthew Knepley
knepley at gmail.com
Wed Jun 1 15:06:01 CDT 2011
On Wed, Jun 1, 2011 at 2:53 PM, John Fettig <john.fettig at gmail.com> wrote:
> What are the recommendations for reducing the cost of inserting values
> into an AIJ matrix? In my application (transient finite element
> solution of flow and heat, linear elements), this is accounting for up
> to 20% of overall runtime. Is this expected?
>
It looks like each MatSetValues call is taking about 0.5 microseconds, but there
are roughly 90 million insertions against only about 13K MatMults, each of which
takes about 1 ms. That is an awful lot of insertion work for a matrix that can be
applied in about a millisecond. The one thing I can think of is for you to try
something matrix-free, along the lines of the sketch below; however, this will
typically degrade your solver performance.
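Roughly, that means wrapping your own mat-vec in a MatShell so that no AIJ matrix
has to be assembled at all. In the sketch below, MyContext and MyMatMult are
hypothetical stand-ins for whatever your element loop would provide:

  #include <petscmat.h>

  /* Hypothetical application context: whatever the element loop needs
     (mesh, coefficients, work vectors, ...). */
  typedef struct {
    void *meshdata;   /* placeholder */
  } MyContext;

  /* Hypothetical matrix-free mat-vec: compute y = A*x directly from the
     element data instead of from stored AIJ entries. */
  static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
  {
    MyContext     *ctx;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatShellGetContext(A, (void **)&ctx);CHKERRQ(ierr);
    /* ... loop over elements, accumulating their contributions into y ... */
    PetscFunctionReturn(0);
  }

  /* Create an N x N shell matrix whose MatMult() calls MyMatMult(). */
  PetscErrorCode CreateMatrixFreeOperator(MPI_Comm comm, PetscInt N, MyContext *ctx, Mat *A)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatCreateShell(comm, PETSC_DECIDE, PETSC_DECIDE, N, N, (void *)ctx, A);CHKERRQ(ierr);
    ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The shell matrix goes wherever the assembled one went, but preconditioners that
need explicit entries (ILU, LU, algebraic multigrid) no longer apply, which is
the solver trade-off mentioned above.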
Matt
> I have double checked that the matrices are preallocated correctly,
> and I have set MAT_NEW_NONZERO_ALLOCATION_ERR and it runs without
> error.
>
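For reference, that setup corresponds to something like the following sketch (the
matrix size n and the per-row counts nnz[] are placeholders, not taken from your
code):

  #include <petscmat.h>

  /* Minimal sketch: create a sequential AIJ matrix with exact preallocation
     and turn any extra allocation during MatSetValues() into a hard error. */
  PetscErrorCode CreatePreallocatedMatrix(PetscInt n, PetscInt nnz[], Mat *A)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatCreate(PETSC_COMM_SELF, A);CHKERRQ(ierr);
    ierr = MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetType(*A, MATSEQAIJ);CHKERRQ(ierr);
    ierr = MatSeqAIJSetPreallocation(*A, 0, nnz);CHKERRQ(ierr);  /* nnz[i] = nonzeros in row i */
    ierr = MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }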
> The matrices periodically change size/nonzero pattern, but until then
> the values are zeroed out and MatSetOption( mat,
> MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE); is called.
>
> The call to MatSetValues happens on a per-element basis on the local
> element matrix, so one call per element. I re-activated the
> MAT_SetValues event and have included a -log_summary from a short run.
>
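For concreteness, a minimal sketch of that zero-and-reinsert pattern, assuming
4-node linear elements; the element-to-global map elem[][] and the element
matrices Ke[][] are hypothetical placeholders:

  #include <petscmat.h>

  /* Minimal sketch: keep the nonzero pattern between solves, zero the values,
     and add one 4x4 element matrix per MatSetValues() call. */
  PetscErrorCode AssembleSystem(Mat A, PetscInt nelem, PetscInt elem[][4], PetscScalar Ke[][16])
  {
    PetscErrorCode ierr;
    PetscInt       e;

    PetscFunctionBegin;
    ierr = MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatZeroEntries(A);CHKERRQ(ierr);   /* zeros the values, keeps the pattern */
    for (e = 0; e < nelem; e++) {
      /* one call per element: add a dense 4x4 block into the global AIJ matrix */
      ierr = MatSetValues(A, 4, elem[e], 4, elem[e], Ke[e], ADD_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }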
> Thanks,
> John
>
>
> ************************************************************************************************************************
> ***               WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document          ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun  1 15:49:20 2011
> Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           2.242e+02      1.00000   2.242e+02
> Objects:              2.718e+03      1.00000   2.718e+03
> Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
> Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       2.062e+04      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase           %F - percent flops in this phase
>       %M - percent messages in this phase       %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPSetup             532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
> KSPSolve             277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
> PCSetUp              277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
> PCApply             4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
> MatMult            12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
> MatMultAdd          3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
> MatSolve             917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
> MatSOR              7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
> MatLUFactorSym        51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin    1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd      1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSetValues    89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
> MatGetRowIJ           51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering        51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
> MatZeroEntries       133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMax                85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
> VecDot              5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
> VecNorm             2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
> VecCopy              955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
> VecAYPX             4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
> VecWAXPY            6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
> VecAssemblyBegin    1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
> VecAssemblyEnd      1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult    3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Krylov Solver 258 258 218416 0
> Preconditioner 258 258 193856 0
> Matrix 666 666 140921396 0
> Vec 1383 1382 244199352 0
> Index Set 153 153 81396 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Sun May 22 22:30:16 2011
> Configure options: --with-x=0 --download-f-blas-lapack=0
> --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> --with-mpi=1 --with-mpi-shared=1
> --with-mpi-include=/usr/local/encap/hpmpi-8.01/include
>
> --with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
> --with-mpi=1 --download-mpich=no --with-debugging=0
> --with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
> --with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
> --with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --download-scalapack=no --download-blacs=no
> --with-blacs=1
> --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
> --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
> --with-scalapack=1
>
> --with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
> --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
> --download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
> -----------------------------------------
> Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
> Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP
> PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
> Using PETSc arch: intel-opt
> -----------------------------------------
> Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> -----------------------------------------
> Using include paths:
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/home/jfe/local/centos/petsc-3.1-p8/include
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/usr/local/encap/hpmpi-8.01/include
> ------------------------------------------
> Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> Using libraries:
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
> -lsmumps -lzmumps -lmumps_common -lpord
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
> -lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
>
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
> -lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> -Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
> -L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -L/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
>
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
>
> -L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
> -lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
> -lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
> ------------------------------------------
>
--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener