[petsc-users] Reducing cost of MatSetValues

Matthew Knepley knepley at gmail.com
Wed Jun 1 15:06:01 CDT 2011


On Wed, Jun 1, 2011 at 2:53 PM, John Fettig <john.fettig at gmail.com> wrote:

> What are the recommendations for reducing the cost of inserting values
> into an AIJ matrix?  In my application (transient finite element
> solution of flow and heat, linear elements), this is accounting for up
> to 20% of overall runtime.  Is this expected?
>

It looks like the calls are taking about 0.5 microseconds apiece, but there
are roughly 90M insertions (46 s total) against only 13K MatMults, which
take about 1 ms apiece (15 s total).

This seems like an awful lot of insertions into a matrix that can be applied
in about a millisecond. The one thing I can think of is for you to try
something matrix-free. However, this will typically degrade your solver
performance.
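
A minimal sketch of the matrix-free route, using a MATSHELL whose
multiplication routine applies the operator without an assembled matrix
(MyMatMult, AppCtx, CreateShellOperator, and n are hypothetical names, not
code from this thread):

  #include "petscksp.h"

  typedef struct {
    PetscInt dummy;   /* whatever data the operator application needs */
  } AppCtx;

  /* Stand-in for the real element-by-element operator application. */
  static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
  {
    AppCtx        *ctx;
    PetscErrorCode ierr;

    ierr = MatShellGetContext(A, (void **)&ctx); CHKERRQ(ierr);
    /* A real implementation would apply the FE operator here; VecCopy()
       is only a placeholder so the sketch is complete. */
    ierr = VecCopy(x, y); CHKERRQ(ierr);
    return 0;
  }

  static PetscErrorCode CreateShellOperator(MPI_Comm comm, PetscInt n,
                                            AppCtx *user, Mat *A)
  {
    PetscErrorCode ierr;

    ierr = MatCreateShell(comm, n, n, PETSC_DETERMINE, PETSC_DETERMINE,
                          (void *)user, A); CHKERRQ(ierr);
    ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult); CHKERRQ(ierr);
    return 0;
  }

The shell matrix would then go to KSPSetOperators() in place of the
assembled AIJ matrix; since algebraic preconditioners still need assembled
entries, that is part of why solver performance usually suffers.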

   Matt


> I have double-checked that the matrices are preallocated correctly; I
> have also set MAT_NEW_NONZERO_ALLOCATION_ERR, and it runs without
> error.
>
> The matrices periodically change size/nonzero pattern, but until then
> the values are zeroed out and MatSetOption(mat, MAT_KEEP_NONZERO_PATTERN,
> PETSC_TRUE) is called.
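
For reference, a minimal sketch of the setup being described (serial SeqAIJ
assumed, as in the log below; n and nz_per_row are hypothetical
placeholders):

  Mat            mat;
  PetscErrorCode ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, &mat); CHKERRQ(ierr);
  ierr = MatSetSizes(mat, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetType(mat, MATSEQAIJ); CHKERRQ(ierr);
  /* Preallocate with an upper bound on nonzeros per row, or pass an exact
     per-row count in the third argument. */
  ierr = MatSeqAIJSetPreallocation(mat, nz_per_row, PETSC_NULL); CHKERRQ(ierr);
  /* Error out if an insertion ever falls outside the preallocation. */
  ierr = MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); CHKERRQ(ierr);
  /* Keep the nonzero structure when the matrix is reused; only the values
     are reset between assemblies. */
  ierr = MatSetOption(mat, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE); CHKERRQ(ierr);
  ierr = MatZeroEntries(mat); CHKERRQ(ierr);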
>
> The call to MatSetValues happens on a per-element basis with the local
> element matrix, i.e. one call per element.  I re-activated the
> MAT_SetValues event and have included a -log_summary from a short run.
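
Concretely, that assembly pattern would look something like the following
sketch (nen, num_elements, and connectivity are hypothetical placeholders;
the actual element matrix computation is elided):

  /* One MatSetValues() call per element, adding the dense nen x nen local
     matrix Ke into the preallocated global matrix. */
  static PetscErrorCode AssembleElements(Mat mat, PetscInt num_elements, PetscInt nen,
                                         const PetscInt *connectivity)
  {
    PetscErrorCode ierr;
    PetscScalar   *Ke;
    PetscInt       e;

    ierr = PetscMalloc(nen*nen*sizeof(PetscScalar), &Ke); CHKERRQ(ierr);
    for (e = 0; e < num_elements; e++) {
      const PetscInt *dof = connectivity + e*nen;  /* global dofs of element e */
      /* Placeholder: zero entries stand in for the real element matrix. */
      ierr = PetscMemzero(Ke, nen*nen*sizeof(PetscScalar)); CHKERRQ(ierr);
      ierr = MatSetValues(mat, nen, dof, nen, dof, Ke, ADD_VALUES); CHKERRQ(ierr);
    }
    ierr = PetscFree(Ke); CHKERRQ(ierr);
    ierr = MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    return 0;
  }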
>
> Thanks,
> John
>
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun 1 15:49:20 2011
> Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>
>                         Max       Max/Min        Avg      Total
> Time (sec):           2.242e+02      1.00000   2.242e+02
> Objects:              2.718e+03      1.00000   2.718e+03
> Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
> Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       2.062e+04      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N --> 2N flops
>                            and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>      %T - percent time in this phase         %F - percent flops in this phase
>      %M - percent messages in this phase     %L - percent message lengths in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPSetup             532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
> KSPSolve             277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
> PCSetUp              277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
> PCApply             4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
> MatMult            12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
> MatMultAdd          3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
> MatSolve             917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
> MatSOR              7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
> MatLUFactorSym        51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin    1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd      1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSetValues     89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
> MatGetRowIJ           51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering        51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
> MatZeroEntries       133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMax                85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
> VecDot              5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
> VecNorm             2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
> VecCopy              955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
> VecAYPX             4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
> VecWAXPY            6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
> VecAssemblyBegin    1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
> VecAssemblyEnd      1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult    3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>       Krylov Solver   258            258       218416     0
>      Preconditioner   258            258       193856     0
>              Matrix   666            666    140921396     0
>                 Vec  1383           1382    244199352     0
>           Index Set   153            153        81396     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Sun May 22 22:30:16 2011
> Configure options: --with-x=0 --download-f-blas-lapack=0
> --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> --with-mpi=1 --with-mpi-shared=1
> --with-mpi-include=/usr/local/encap/hpmpi-8.01/include
>
> --with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
> --with-mpi=1 --download-mpich=no --with-debugging=0
> --with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
> --with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
> --with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info" --download-scalapack=no --download-blacs=no
> --with-blacs=1
> --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
> --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
> --with-scalapack=1
>
> --with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
> --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
> --download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
> -----------------------------------------
> Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
> Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP
> PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
> Using PETSc arch: intel-opt
> -----------------------------------------
> Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> -----------------------------------------
> Using include paths:
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/home/jfe/local/centos/petsc-3.1-p8/include
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/usr/local/encap/hpmpi-8.01/include
> ------------------------------------------
> Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug
> inline_debug_info
> Using libraries:
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
> -lsmumps -lzmumps -lmumps_common -lpord
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
> -lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
>
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
> -lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> -Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
> -L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -L/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
>
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
>
> -L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
> -lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
> -lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
> ------------------------------------------
>



-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener