On Wed, Jun 1, 2011 at 2:53 PM, John Fettig <john.fettig@gmail.com> wrote:
> What are the recommendations for reducing the cost of inserting values
> into an AIJ matrix? In my application (transient finite element
> solution of flow and heat, linear elements), this is accounting for up
> to 20% of overall runtime. Is this expected?

It looks like the calls are taking about 0.5 microseconds each, but there are
89M insertions against only 13K MatMults, which take about 1 ms apiece.

This seems like an awful lot of insertions into a matrix that can be applied in
about a millisecond. The one thing I can think of is for you to try something
matrix-free; however, this will typically degrade your solver performance.

   Matt
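
For readers unfamiliar with the matrix-free option mentioned above, here is a
minimal sketch using PETSc's MatCreateShell and a user-supplied multiply
routine. MyCtx, MyMatMult, and CreateMatrixFree are placeholder names, and the
element loop is left as a comment because it depends entirely on the
application; this is not John's code.

#include <petscksp.h>

/* Hypothetical application context; a real one would hold mesh and
   element data. */
typedef struct {
  PetscInt n;   /* local problem size */
} MyCtx;

/* y = A*x computed element by element, without ever assembling A */
PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
{
  MyCtx          *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  /* loop over elements, apply each local element matrix to the
     relevant entries of x, and accumulate the result into y */
  PetscFunctionReturn(0);
}

/* Create the shell matrix and attach the multiply routine */
PetscErrorCode CreateMatrixFree(MPI_Comm comm, MyCtx *ctx, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateShell(comm, ctx->n, ctx->n, PETSC_DETERMINE, PETSC_DETERMINE, ctx, A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A shell matrix like this can be handed to the solver in place of the assembled
AIJ matrix, but most preconditioners need explicit matrix entries to work with,
which is where the solver degradation mentioned above usually comes from.
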
> I have double checked that the matrices are preallocated correctly,
> and I have set MAT_NEW_NONZERO_ALLOCATION_ERR and it runs without
> error.
>
> The matrices periodically change size/nonzero pattern, but until then
> the values are zeroed out and MatSetOption(mat,
> MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE) is called.
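
For concreteness, a minimal sketch (not the poster's code) of the setup
described above: CreatePreallocatedAIJ is a hypothetical helper, and nnz is
assumed to hold the exact number of nonzeros per row computed from the mesh
connectivity.

#include <petscmat.h>

/* Create a SeqAIJ matrix with exact per-row preallocation and set the
   two options mentioned above. */
PetscErrorCode CreatePreallocatedAIJ(PetscInt n, const PetscInt nnz[], Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(PETSC_COMM_SELF, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATSEQAIJ);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(*A, 0, nnz);CHKERRQ(ierr);

  /* abort with an error instead of silently allocating if an entry
     falls outside the preallocated nonzero pattern */
  ierr = MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);

  /* keep the nonzero structure when the values are zeroed between steps */
  ierr = MatSetOption(*A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With MAT_NEW_NONZERO_ALLOCATION_ERR set, any insertion that falls outside the
preallocated pattern raises an error rather than triggering a malloc, which is
what confirms the preallocation is correct.
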
> The call to MatSetValues happens on a per-element basis with the local
> element matrix, so one call per element. I re-activated the
> MAT_SetValues event and have included a -log_summary from a short run.
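
A minimal sketch of that per-element insertion pattern, with illustrative
names (nen, dofs, Ke, AddElementMatrix); again, this is not John's code, just
the standard way a dense element matrix is added in one MatSetValues call.

#include <petscmat.h>

PetscErrorCode AddElementMatrix(Mat A, PetscInt nen, const PetscInt dofs[], const PetscScalar Ke[])
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* Ke is the dense nen x nen local element matrix, stored row-major;
     dofs holds the global row/column indices of the element's unknowns */
  ierr = MatSetValues(A, nen, dofs, nen, dofs, Ke, ADD_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the whole nen x nen block goes in with one call, the roughly
0.5 microseconds per call quoted above is the cost per element matrix, not per
individual entry.
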
> Thanks,
> John
>
> ************************************************************************************************************************
> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document        ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun  1 15:49:20 2011
> Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           2.242e+02      1.00000   2.242e+02
> Objects:              2.718e+03      1.00000   2.718e+03
> Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
> Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       2.062e+04      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPSetup             532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
> KSPSolve             277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
> PCSetUp              277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
> PCApply             4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
> MatMult            12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
> MatMultAdd          3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
> MatSolve             917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
> MatSOR              7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
> MatLUFactorSym        51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin    1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd      1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSetValues    89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
> MatGetRowIJ           51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering        51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
> MatZeroEntries       133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMax                85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
> VecDot              5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
> VecNorm             2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
> VecCopy              955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
> VecAYPX             4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
> VecWAXPY            6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
> VecAssemblyBegin    1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
> VecAssemblyEnd      1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult    3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>        Krylov Solver   258            258       218416     0
>       Preconditioner   258            258       193856     0
>               Matrix   666            666    140921396     0
>                  Vec  1383           1382    244199352     0
>            Index Set   153            153        81396     0
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sun May 22 22:30:16 2011
> Configure options: --with-x=0 --download-f-blas-lapack=0
> --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> --with-mpi=1 --with-mpi-shared=1
> --with-mpi-include=/usr/local/encap/hpmpi-8.01/include
> --with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
> --with-mpi=1 --download-mpich=no --with-debugging=0
> --with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
> --with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
> --with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --download-scalapack=no --download-blacs=no --with-blacs=1
> --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
> --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
> --with-scalapack=1
> --with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
> --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
> --download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
> -----------------------------------------
> Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
> Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
> Using PETSc arch: intel-opt
> -----------------------------------------
> Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> -----------------------------------------
> Using include paths: -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/home/jfe/local/centos/petsc-3.1-p8/include
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/usr/local/encap/hpmpi-8.01/include
> ------------------------------------------
> Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using libraries: -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
> -lsmumps -lzmumps -lmumps_common -lpord
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
> -lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
> -lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> -Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
> -L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -L/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
> -lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
> -lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
> ------------------------------------------

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener