[petsc-users] MatPtAP
David Knezevic
david.knezevic at akselos.com
Tue Feb 23 21:35:18 CST 2016
I'm using MatPtAP, which works well for me, but in some examples I've
tested, the PtAP calculation dominates the overall solve time (e.g. see
the attached -log_summary output).
In my case, A is a stiffness matrix, and P is the identity matrix except
for a small number of columns (about 10), which are dense.
In this situation, I was wondering whether there is a more efficient way
to proceed than MatPtAP. For example, would it be noticeably faster to
compute P^T A P directly, using MatMults for the dense columns?
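Concretely, I'm imagining something like the untested sketch below
(hypothetical names, just to illustrate the idea): since the remaining
columns of P are identity columns, the only genuinely new entries of
P^T A P are those coupling the dense columns, and each dense column
costs one MatMult plus a handful of VecDots; the rest of the product
coincides with entries of A and of A*p_j (using the symmetry of A for
the dense rows).

    #include <petscmat.h>

    /* Sketch: compute the small k x k block B[i][j] = pcol[i]^T A pcol[j],
       where pcol[0..k-1] are the ~10 dense columns of P stored as Vecs. */
    static PetscErrorCode DenseBlockOfPtAP(Mat A, PetscInt k, Vec *pcol, PetscScalar *B)
    {
      Vec            w;
      PetscInt       i, j;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreateVecs(A, NULL, &w);CHKERRQ(ierr);
      for (j = 0; j < k; j++) {
        ierr = MatMult(A, pcol[j], w);CHKERRQ(ierr);             /* w = A * p_j */
        for (i = 0; i < k; i++) {
          ierr = VecDot(w, pcol[i], &B[i*k + j]);CHKERRQ(ierr);  /* B_ij = p_i . w */
        }
      }
      ierr = VecDestroy(&w);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }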
Thanks!
David
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Tue Feb 23 14:17:28 2016
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 1.214e+02 1.00000 1.214e+02
Objects: 4.150e+02 1.00000 4.150e+02
Flops: 1.294e+09 1.00000 1.294e+09 1.294e+09
Flops/sec: 1.066e+07 1.00000 1.066e+07 1.066e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.2142e+02 100.0% 1.2945e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecNorm 15 1.0 2.5439e-04 1.0 1.12e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4384
VecCopy 102 1.0 2.6283e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 315 1.0 3.1855e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 104 1.0 3.3436e-03 1.0 1.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 374
VecWAXPY 7 1.0 4.2009e-04 1.0 2.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 620
VecAssemblyBegin 368 1.0 1.2109e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 368 1.0 2.5558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 106 1.0 8.7023e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 21 1.0 5.3144e-04 1.0 1.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2940
VecReduceComm 7 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMultAdd 41 1.0 3.2815e-02 1.0 5.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 1824
MatMultTrAdd 9 1.0 6.2582e-03 1.0 8.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1390
MatSolve 19 1.0 7.3646e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatCholFctrSym 2 1.0 8.0401e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatCholFctrNum 8 1.0 1.3437e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0
MatAssemblyBegin 117 1.0 2.6464e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 117 1.0 1.7539e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetValues 32 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 7.9417e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 48 1.0 5.5495e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 7 1.0 2.9074e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 8 1.0 8.5691e+01 1.0 1.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 71 94 0 0 0 71 94 0 0 0 14
MatPtAPSymbolic 1 1.0 2.2405e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 18 0 0 0 0 0
MatPtAPNumeric 8 1.0 6.3285e+01 1.0 1.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 52 94 0 0 0 52 94 0 0 0 19
MatGetSymTrans 1 1.0 1.6050e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 19 1.0 8.3447e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 19 1.0 1.4982e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
PCSetUp 8 1.0 1.4245e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
PCApply 19 1.0 7.3649e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
SNESSolve 2 1.0 6.9693e+01 1.0 1.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 57 88 0 0 0 57 88 0 0 0 16
SNESFunctionEval 9 1.0 1.1762e+00 1.0 6.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 53
SNESJacobianEval 7 1.0 5.5601e+01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 46 83 0 0 0 46 83 0 0 0 19
SNESLineSearch 7 1.0 1.6607e-01 1.0 5.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 328
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 254 254 29415312 0
Vector Scatter 41 41 26896 0
Index Set 65 65 744616 0
IS L to G Mapping 16 16 775904 0
Matrix 22 22 149945168 0
Krylov Solver 2 2 2448 0
DMKSP interface 1 1 648 0
Preconditioner 2 2 2136 0
SNES 1 1 1332 0
SNESLineSearch 1 1 856 0
DMSNES 1 1 664 0
Distributed Mesh 2 2 8992 0
Star Forest Bipartite Graph 4 4 3200 0
Discrete System 2 2 1696 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-JSON_INIT /home/dknez/akselos-dev/data/instance/workers/fe_solver/810b60d7534448ce8ef67eb6a5e2267e/json_init.json
-JSON_INPUT /home/dknez/akselos-dev/data/instance/workers/fe_solver/810b60d7534448ce8ef67eb6a5e2267e/json_input.json
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-parmetis --download-blacs --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --download-superlu_dist --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml
-----------------------------------------
Libraries compiled on Thu Aug 13 16:37:37 2015 on david-Lenovo
Machine characteristics: Linux-3.13.0-61-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/dknez/software/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/include -I/home/dknez/software/petsc-3.6.1/include -I/home/dknez/software/petsc-3.6.1/include -I/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/include -I/home/dknez/software/libmesh_install/opt_real/petsc/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lsuperlu_dist_4.0 -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lml -lmpi_cxx -lstdc++ -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis -lmetis -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl
-----------------------------------------