[petsc-users] Reducing cost of MatSetValues
Barry Smith
bsmith at mcs.anl.gov
Wed Jun 1 16:15:41 CDT 2011
If you run with -info and grep the output for "Stash has", do you see many stashed values? These are values computed on one process that must be shipped to another process. Likely you have done a good job of partitioning and there are not many of them relative to the number of values set locally. If the number of mallocs reported there is high, you can run with a large enough -matstash_initial_size so that no additional mallocs are needed.
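If the stash mallocs do turn out to matter, the stash size can also be set from code instead of the options database. A minimal sketch, assuming MatStashSetInitialSize() is available in your version; the size of 10000 entries is only a placeholder:

    #include <petscmat.h>

    /* Reserve stash space up front so off-process values set during assembly
       do not trigger additional mallocs.  The size (number of stashed entries)
       is problem dependent; 10000 here is only a placeholder. */
    PetscErrorCode PresetStashSize(Mat A)
    {
      PetscErrorCode ierr;

      ierr = MatStashSetInitialSize(A,10000,0);CHKERRQ(ierr);
      return 0;
    }
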
Also if you are hardwired to the MPIAIJ format you can call MatSetValues_MPIAIJ() directly instead of MatSetValues() (just add a prototype for this function to your source code). This could shave off a couple percent.
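A minimal sketch of that suggestion, assuming MatSetValues_MPIAIJ() has the same argument list as MatSetValues() (check src/mat/impls/aij/mpi/mpiaij.c in your PETSc tree to be sure):

    #include <petscmat.h>

    /* The MPIAIJ implementation is not declared in the public headers,
       so declare the prototype yourself, as suggested above. */
    extern PetscErrorCode MatSetValues_MPIAIJ(Mat,PetscInt,const PetscInt[],
                                              PetscInt,const PetscInt[],
                                              const PetscScalar[],InsertMode);

    /* Insert one local element matrix, bypassing the generic MatSetValues()
       dispatch.  Only safe when A really has type MPIAIJ. */
    PetscErrorCode SetElementValuesDirect(Mat A,PetscInt nen,const PetscInt dof[],
                                          const PetscScalar emat[])
    {
      PetscErrorCode ierr;

      ierr = MatSetValues_MPIAIJ(A,nen,dof,nen,dof,emat,ADD_VALUES);CHKERRQ(ierr);
      return 0;
    }

With roughly 89.5 million MatSetValues() calls in the log below, even a small per-call saving adds up.
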
I am hard pressed to see any way to shrink the code in MatSetValues_MPIAIJ() any more than it already is. Since your linear solves are pretty cheap, it is tough to make setting the values a small percentage of the total time.
Using -snes_lag_jacobian 2 or 3 (or even bigger) might help.
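If you would rather set the lag from code than from the options database, something like the following should work, assuming SNESSetLagJacobian() is available in your PETSc version:

    #include <petscsnes.h>

    /* Rebuild the Jacobian only every other Newton step; the lag of 2 mirrors
       -snes_lag_jacobian 2 and is just an example value. */
    PetscErrorCode LagTheJacobian(SNES snes)
    {
      PetscErrorCode ierr;

      ierr = SNESSetLagJacobian(snes,2);CHKERRQ(ierr);
      return 0;
    }
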
Barry
On Jun 1, 2011, at 2:53 PM, John Fettig wrote:
> What are the recommendations for reducing the cost of inserting values
> into an AIJ matrix? In my application (transient finite element
> solution of flow and heat, linear elements), this is accounting for up
> to 20% of overall runtime. Is this expected?
>
> I have double checked that the matrices are preallocated correctly,
> and I have set MAT_NEW_NONZERO_ALLOCATION_ERR and it runs without
> error.
>
> The matrices periodically change size/nonzero pattern; between those changes
> the values are zeroed out and MatSetOption(mat, MAT_KEEP_NONZERO_PATTERN,
> PETSC_TRUE) is called.
>
> MatSetValues is called once per element, passing the local element matrix.
> I re-activated the MAT_SetValues event and have included a -log_summary
> from a short run.
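For concreteness, a bare-bones sketch of the assembly pass described above, with hypothetical element connectivity (edof) and element matrix (emat) arrays:

    #include <petscmat.h>

    /* One assembly pass over a matrix whose nonzero pattern is being reused:
       keep the pattern, zero the old values, then add each local element
       matrix with a single MatSetValues() call per element. */
    PetscErrorCode AssembleSystem(Mat A,PetscInt nelem,PetscInt nen,
                                  const PetscInt edof[],const PetscScalar emat[])
    {
      PetscErrorCode ierr;
      PetscInt       e;

      ierr = MatSetOption(A,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE);CHKERRQ(ierr);
      ierr = MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);
      ierr = MatZeroEntries(A);CHKERRQ(ierr);
      for (e=0; e<nelem; e++) {
        ierr = MatSetValues(A,nen,&edof[e*nen],nen,&edof[e*nen],
                            &emat[e*nen*nen],ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      return 0;
    }
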
>
> Thanks,
> John
>
> ************************************************************************************************************************
> ***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document    ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun 1 15:49:20 2011
> Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>
> Max Max/Min Avg Total
> Time (sec): 2.242e+02 1.00000 2.242e+02
> Objects: 2.718e+03 1.00000 2.718e+03
> Flops: 2.239e+10 1.00000 2.239e+10 2.239e+10
> Flops/sec: 9.986e+07 1.00000 9.986e+07 9.986e+07
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 2.062e+04 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase      %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPSetup              532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1      0
> KSPSolve              277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55    660
> PCSetUp               277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8     91
> PCApply              4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0    775
> MatMult             12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0    888
> MatMultAdd           3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0    404
> MatSolve              917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      2
> MatSOR               7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0    797
> MatLUFactorSym         51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0      0
> MatLUFactorNum         51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatAssemblyBegin     1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatAssemblyEnd       1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatSetValues     89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0      0
> MatGetRowIJ            51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> MatGetOrdering         51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1      0
> MatZeroEntries        133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> VecMax                 85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0      0
> VecDot               5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31   2517
> VecNorm              2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16    310
> VecCopy               955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> VecSet               6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> VecAXPY              6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   1891
> VecAYPX              4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   1073
> VecWAXPY             6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0   1089
> VecAssemblyBegin     1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21      0
> VecAssemblyEnd       1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> VecPointwiseMult     3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0    529
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Krylov Solver 258 258 218416 0
> Preconditioner 258 258 193856 0
> Matrix 666 666 140921396 0
> Vec 1383 1382 244199352 0
> Index Set 153 153 81396 0
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Sun May 22 22:30:16 2011
> Configure options: --with-x=0 --download-f-blas-lapack=0
> --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> --with-mpi=1 --with-mpi-shared=1
> --with-mpi-include=/usr/local/encap/hpmpi-8.01/include
> --with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
> --with-mpi=1 --download-mpich=no --with-debugging=0
> --with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
> --with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
> --with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --download-scalapack=no --download-blacs=no
> --with-blacs=1 --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
> --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
> --with-scalapack=1
> --with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
> --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
> --download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
> -----------------------------------------
> Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
> Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP
> PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
> Using PETSc arch: intel-opt
> -----------------------------------------
> Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> -----------------------------------------
> Using include paths:
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/home/jfe/local/centos/petsc-3.1-p8/include
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/usr/local/encap/hpmpi-8.01/include
> ------------------------------------------
> Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> Using libraries:
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
> -lsmumps -lzmumps -lmumps_common -lpord
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
> -lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
> -lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> -Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
> -L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -L/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
> -lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
> -lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
> ------------------------------------------