[petsc-users] Reducing cost of MatSetValues

John Fettig john.fettig at gmail.com
Wed Jun 1 14:53:07 CDT 2011


What are the recommendations for reducing the cost of inserting values
into an AIJ matrix?  In my application (transient finite element
solution of flow and heat, linear elements), MatSetValues accounts for
up to 20% of the overall runtime.  Is this expected?

I have double-checked that the matrices are preallocated correctly,
and I have set MAT_NEW_NONZERO_ALLOCATION_ERR; the code runs without
error, so no new nonzeros are being allocated during assembly.
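
For reference, the setup looks roughly like the sketch below.  This is
not my actual code; the function name, nrows, and nnz_per_row are
placeholders for values computed from the mesh, and it assumes the
serial case shown in the log:

#include "petscmat.h"

/* Simplified sketch of the matrix setup (serial case, as in the log
   below); nrows and nnz_per_row are placeholders computed from the
   mesh connectivity. */
PetscErrorCode CreateSystemMatrix(PetscInt nrows, const PetscInt nnz_per_row[], Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(PETSC_COMM_SELF, A); CHKERRQ(ierr);
  ierr = MatSetSizes(*A, nrows, nrows, nrows, nrows); CHKERRQ(ierr);
  ierr = MatSetType(*A, MATSEQAIJ); CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(*A, 0, nnz_per_row); CHKERRQ(ierr);
  /* Error out instead of mallocing if the preallocation is ever wrong */
  ierr = MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}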

The matrices periodically change size and nonzero pattern, but between
those changes the values are zeroed out and MatSetOption(mat,
MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE) is called.
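
The reset between assemblies is essentially the following simplified
sketch (the function name is a placeholder; MatZeroEntries() already
retains the nonzero structure, and MAT_KEEP_NONZERO_PATTERN is set as
described above):

#include "petscmat.h"

/* Simplified sketch of the reset between time steps when the nonzero
   pattern has not changed: keep the structure, zero the stored values. */
PetscErrorCode ResetSystemMatrix(Mat A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE); CHKERRQ(ierr);
  ierr = MatZeroEntries(A); CHKERRQ(ierr); /* zeros the values, retains the structure */
  PetscFunctionReturn(0);
}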

MatSetValues is called on a per-element basis with the dense local
element matrix, so one call per element.  I re-activated the
MAT_SetValues event and have included a -log_summary from a short run.
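
The assembly loop is roughly the sketch below; the connectivity
layout, the Ke storage, and the names are placeholders rather than my
real data structures:

#include "petscmat.h"

/* Sketch of the assembly loop: one MatSetValues() call per element,
   adding the dense local element matrix Ke into the global AIJ matrix.
   nel, ndof, connectivity, and Ke_all are placeholders. */
PetscErrorCode AssembleMatrix(Mat A, PetscInt nel, PetscInt ndof,
                              const PetscInt *connectivity,
                              const PetscScalar *Ke_all)
{
  PetscErrorCode ierr;
  PetscInt       e;

  PetscFunctionBegin;
  for (e = 0; e < nel; e++) {
    const PetscInt    *rows = &connectivity[e * ndof];    /* global dof indices of element e */
    const PetscScalar *Ke   = &Ke_all[e * ndof * ndof];   /* row-major element matrix */
    ierr = MatSetValues(A, ndof, rows, ndof, rows, Ke, ADD_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}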

Thanks,
John

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun 1 15:49:20 2011
Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011

                         Max       Max/Min        Avg      Total
Time (sec):           2.242e+02      1.00000   2.242e+02
Objects:              2.718e+03      1.00000   2.718e+03
Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.062e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

KSPSetup             532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
KSPSolve             277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
PCSetUp              277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
PCApply             4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
MatMult            12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
MatMultAdd          3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
MatSolve             917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
MatSOR              7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
MatLUFactorSym        51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum        51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin    1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd      1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatSetValues     89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
MatGetRowIJ           51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering        51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
MatZeroEntries       133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMax                85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
VecDot              5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
VecNorm             2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
VecCopy              955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
VecAYPX             4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
VecWAXPY            6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
VecAssemblyBegin    1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
VecAssemblyEnd      1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver   258            258       218416     0
      Preconditioner   258            258       193856     0
              Matrix   666            666    140921396     0
                 Vec  1383           1382    244199352     0
           Index Set   153            153        81396     0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Sun May 22 22:30:16 2011
Configure options: --with-x=0 --download-f-blas-lapack=0
--with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
--with-mpi=1 --with-mpi-shared=1
--with-mpi-include=/usr/local/encap/hpmpi-8.01/include
--with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
--with-mpi=1 --download-mpich=no --with-debugging=0
--with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
--with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
--with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug
inline_debug_info" --download-scalapack=no --download-blacs=no
--with-blacs=1 --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
--with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
--with-scalapack=1
--with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
--with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
--download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
--download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
-----------------------------------------
Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
Using PETSc arch: intel-opt
-----------------------------------------
Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
-----------------------------------------
Using include paths:
-I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
-I/home/jfe/local/centos/petsc-3.1-p8/include
-I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
-I/usr/local/encap/hpmpi-8.01/include
------------------------------------------
Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
Using libraries:
-Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
-L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
-Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
-L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
-lsmumps -lzmumps -lmumps_common -lpord
-Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
-lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
-Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
-lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
-Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
-L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
-Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
-L/opt/intel/Compiler/11.1/072/lib/intel64
-Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
-Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
-Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
-lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
-lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
------------------------------------------

