On Wed, Jun 1, 2011 at 2:53 PM, John Fettig <john.fettig@gmail.com> wrote:
> What are the recommendations for reducing the cost of inserting values
> into an AIJ matrix? In my application (transient finite element
> solution of flow and heat, linear elements), this is accounting for up
> to 20% of overall runtime. Is this expected?

It looks like the calls are taking about 0.5 microseconds each, but there are
89M insertions against only 13K MatMults, which take about 1 ms apiece.

This seems like an awful lot of insertions into a matrix that can be applied in
about a millisecond. The one thing I can think of is for you to try something
matrix-free; however, this will typically degrade your solver performance.

   Matt
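
For readers unfamiliar with the matrix-free option mentioned above, here is a
minimal sketch using PETSc's MatCreateShell and a user-supplied multiply
routine. MyCtx, MyMatMult, and CreateMatrixFree are placeholder names, and the
element loop is left as a comment because it depends entirely on the
application; this is not John's code.

#include <petscksp.h>

/* Hypothetical application context; a real one would hold mesh and
   element data. */
typedef struct {
  PetscInt n;   /* local problem size */
} MyCtx;

/* y = A*x computed element by element, without ever assembling A */
PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
{
  MyCtx          *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  /* loop over elements, apply each local element matrix to the
     relevant entries of x, and accumulate the result into y */
  PetscFunctionReturn(0);
}

/* Create the shell matrix and attach the multiply routine */
PetscErrorCode CreateMatrixFree(MPI_Comm comm, MyCtx *ctx, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateShell(comm, ctx->n, ctx->n, PETSC_DETERMINE, PETSC_DETERMINE, ctx, A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A shell matrix like this can be handed to the solver in place of the assembled
AIJ matrix, but most preconditioners need explicit matrix entries to work with,
which is where the solver degradation mentioned above usually comes from.
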
> I have double checked that the matrices are preallocated correctly,
> and I have set MAT_NEW_NONZERO_ALLOCATION_ERR and it runs without
> error.
>
> The matrices periodically change size/nonzero pattern, but until then
> the values are zeroed out and MatSetOption(mat,
> MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE) is called.
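
For concreteness, a minimal sketch (not the poster's code) of the setup
described above: CreatePreallocatedAIJ is a hypothetical helper, and nnz is
assumed to hold the exact number of nonzeros per row computed from the mesh
connectivity.

#include <petscmat.h>

/* Create a SeqAIJ matrix with exact per-row preallocation and set the
   two options mentioned above. */
PetscErrorCode CreatePreallocatedAIJ(PetscInt n, const PetscInt nnz[], Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(PETSC_COMM_SELF, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, n, n, n, n);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATSEQAIJ);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(*A, 0, nnz);CHKERRQ(ierr);

  /* abort with an error instead of silently allocating if an entry
     falls outside the preallocated nonzero pattern */
  ierr = MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);

  /* keep the nonzero structure when the values are zeroed between steps */
  ierr = MatSetOption(*A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With MAT_NEW_NONZERO_ALLOCATION_ERR set, any insertion that falls outside the
preallocated pattern raises an error rather than triggering a malloc, which is
what confirms the preallocation is correct.
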
> The call to MatSetValues happens on a per-element basis with the local
> element matrix, so one call per element. I re-activated the
> MAT_SetValues event and have included a -log_summary from a short run.
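
A minimal sketch of that per-element insertion pattern, with illustrative
names (nen, dofs, Ke, AddElementMatrix); again, this is not John's code, just
the standard way a dense element matrix is added in one MatSetValues call.

#include <petscmat.h>

PetscErrorCode AddElementMatrix(Mat A, PetscInt nen, const PetscInt dofs[], const PetscScalar Ke[])
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* Ke is the dense nen x nen local element matrix, stored row-major;
     dofs holds the global row/column indices of the element's unknowns */
  ierr = MatSetValues(A, nen, dofs, nen, dofs, Ke, ADD_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Because the whole nen x nen block goes in with one call, the roughly
0.5 microseconds per call quoted above is the cost per element matrix, not per
individual entry.
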
> Thanks,
> John
>
> ************************************************************************************************************************
> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document        ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./test on a intel-opt named lagrange with 1 processor, by jfe Wed Jun  1 15:49:20 2011
> Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           2.242e+02      1.00000   2.242e+02
> Objects:              2.718e+03      1.00000   2.718e+03
> Flops:                2.239e+10      1.00000   2.239e+10  2.239e+10
> Flops/sec:            9.986e+07      1.00000   9.986e+07  9.986e+07
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       2.062e+04      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.2423e+02 100.0%  2.2391e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.796e+04  87.1%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> KSPSetup             532 1.0 2.6963e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1   0  0  0  0  1     0
> KSPSolve             277 1.0 3.2930e+01 1.0 2.17e+10 1.0 0.0e+00 0.0e+00 9.9e+03 15 97  0  0 48  15 97  0  0 55   660
> PCSetUp              277 1.0 8.3035e+00 1.0 7.52e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  3  0  0  7   4  3  0  0  8    91
> PCApply             4907 1.0 9.1353e+00 1.0 7.08e+09 1.0 0.0e+00 0.0e+00 2.0e+00  4 32  0  0  0   4 32  0  0  0   775
> MatMult            12821 1.0 1.5110e+01 1.0 1.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00  7 60  0  0  0   7 60  0  0  0   888
> MatMultAdd          3668 1.0 6.9999e-01 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   404
> MatSolve             917 1.0 5.3644e-04 1.0 9.17e+02 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
> MatSOR              7336 1.0 5.7982e+00 1.0 4.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 21  0  0  0   3 21  0  0  0   797
> MatLUFactorSym        51 1.0 3.1686e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.1e+01  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum        51 1.0 1.2851e-04 1.0 5.10e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin    1068 1.0 1.4639e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd      1068 1.0 7.9752e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatSetValues    89511956 1.0 4.6221e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
> MatGetRowIJ           51 1.0 7.7248e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering        51 1.0 5.0688e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  1     0
> MatZeroEntries       133 1.0 1.7668e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMax                85 1.0 6.9754e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.5e+01  0  0  0  0  0   0  0  0  0  0     0
> VecDot              5479 1.0 3.5107e-01 1.0 8.84e+08 1.0 0.0e+00 0.0e+00 5.5e+03  0  4  0  0 27   0  4  0  0 31  2517
> VecNorm             2937 1.0 1.5535e+00 1.0 4.81e+08 1.0 0.0e+00 0.0e+00 2.9e+03  1  2  0  0 14   1  2  0  0 16   310
> VecCopy              955 1.0 8.6689e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              6718 1.0 9.4952e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY             6937 1.0 6.1062e-01 1.0 1.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1891
> VecAYPX             4483 1.0 1.4344e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1073
> VecWAXPY            6624 1.0 9.5558e-01 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1089
> VecAssemblyBegin    1286 1.0 5.1773e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.9e+03  0  0  0  0 19   0  0  0  0 21     0
> VecAssemblyEnd      1286 1.0 1.2426e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult    3990 1.0 6.7400e-01 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   529
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>        Krylov Solver   258            258       218416     0
>       Preconditioner   258            258       193856     0
>               Matrix   666            666    140921396     0
>                  Vec  1383           1382    244199352     0
>            Index Set   153            153        81396     0
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sun May 22 22:30:16 2011
> Configure options: --with-x=0 --download-f-blas-lapack=0
> --with-blas-lapack-dir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> --with-mpi=1 --with-mpi-shared=1
> --with-mpi-include=/usr/local/encap/hpmpi-8.01/include
> --with-mpi-lib="[/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libmpi.so,/usr/local/encap/hpmpi-8.01/lib/linux_amd64/libfmpi.so]"
> --with-mpi=1 --download-mpich=no --with-debugging=0
> --with-gnu-compilers=no --with-vendor-compilers=intel --with-cc=icc
> --with-cxx=icpc --with-fc=ifort --with-shared=1 --with-c++-support
> --with-clanguage=C --COPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --CXXOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --FOPTFLAGS="-fPIC -O3 -xSSE4.2 -g -debug inline_debug_info"
> --download-scalapack=no --download-blacs=no --with-blacs=1
> --with-blacs-lib=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_lp64.a
> --with-blacs-include=/opt/intel/Compiler/11.1/072/mkl/include
> --with-scalapack=1
> --with-scalapack-lib="[/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_thread.a,/opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a]"
> --with-scalapack-include=/opt/intel/Compiler/11.1/072/mkl/include
> --download-umfpack=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-ml=1 --with-hypre=1 --download-hypre=yes
> -----------------------------------------
> Libraries compiled on Thu May 26 16:49:30 EDT 2011 on lagrange
> Machine characteristics: Linux lagrange 2.6.39-0.el5.elrepo #1 SMP PREEMPT Sat May 21 04:48:38 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /home/jfe/local/centos/petsc-3.1-p8
> Using PETSc arch: intel-opt
> -----------------------------------------
> Using C compiler: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran compiler: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> -----------------------------------------
> Using include paths: -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/home/jfe/local/centos/petsc-3.1-p8/include
> -I/home/jfe/local/centos/petsc-3.1-p8/intel-opt/include
> -I/usr/local/encap/hpmpi-8.01/include
> ------------------------------------------
> Using C linker: icc -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using Fortran linker: ifort -fPIC -fPIC -O3 -xSSE4.2 -g -debug inline_debug_info
> Using libraries: -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lpetsc
> -Wl,-rpath,/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib
> -L/home/jfe/local/centos/petsc-3.1-p8/intel-opt/lib -lcmumps -ldmumps
> -lsmumps -lzmumps -lmumps_common -lpord
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t -lmkl_scalapack_lp64
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_lp64
> -lHYPRE -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -lstdc++ -lml
> -lstdc++ -lsuperlu_dist_2.4 -lparmetis -lmetis -lumfpack -lamd
> -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
> -Wl,-rpath,/usr/local/encap/hpmpi-8.01/lib/linux_amd64
> -L/usr/local/encap/hpmpi-8.01/lib/linux_amd64 -lmpi -lfmpi -ldl
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/lib/intel64
> -L/opt/intel/Compiler/11.1/072/lib/intel64
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -L/opt/intel/Compiler/11.1/072/ipp/em64t/lib
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t
> -Wl,-rpath,/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -L/opt/intel/Compiler/11.1/072/tbb/intel64/cc4.1.0_libc2.4_kernel2.6.16.21/lib
> -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2
> -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -limf -lsvml -lipgo -ldecimal
> -lgcc_s -lirc -lirc_s -lifport -lifcore -lm -lpthread -lm -lstdc++
> -lstdc++ -ldl -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lirc_s -ldl
> ------------------------------------------

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener