[petsc-users] MatPtAP

David Knezevic david.knezevic at akselos.com
Tue Feb 23 21:35:18 CST 2016

I'm using MatPtAP, which works well for me, but in some examples I've
tested the PtAP calculation dominates the overall solve time (e.g. see
attached -log_summary output).

In my case, A is a stiffness matrix, and P is the identity matrix except
for a small number of dense columns (about 10).

In this situation, I was wondering if there is a more efficient way to
proceed than using MatPtAP? For example, would it be noticeably faster to
calculate P^T A P directly using MatMults for the dense columns, rather
than using MatPtAP?
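
To make the MatMult-based alternative concrete, here is a sketch of the
algebra only (not PETSc code, and no claim about how it would perform in
practice). Suppose the k dense columns of P have indices j_1, ..., j_k.
Write P = I + W E^T, where E = [e_j1 ... e_jk] and W holds the dense columns
of P minus the corresponding identity columns. Then

    P^T A P = A + E (W^T A) + (A W) E^T + E (W^T A W) E^T

so the product is A plus updates confined to k rows, k columns, and a k x k
block, and it needs only k products of A with a vector (A W). For symmetric
A, W^T A is just the transpose of A W. The names W, E, and ptap_low_rank are
introduced here for illustration. The small pure-Python check below, on
random dense data, compares this formula against the full triple product:

```python
import random

def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def ptap_low_rank(A, W, cols):
    """Form P^T A P for P = I + W E^T, where E selects the columns in cols.

    A is n x n; W is n x k (dense columns of P minus the matching identity
    columns). Only k products with A are needed to build A W.
    """
    n = len(A)
    AW = matmul(A, W)               # n x k: one product with A per dense column
    WtA = matmul(transpose(W), A)   # k x n: equals transpose(AW) if A is symmetric
    S = matmul(transpose(W), AW)    # k x k: small dense block W^T A W
    C = [row[:] for row in A]       # start from a copy of A
    for a, ja in enumerate(cols):
        for i in range(n):
            C[i][ja] += AW[i][a]    # (A W) E^T only touches the columns in cols
            C[ja][i] += WtA[a][i]   # E (W^T A) only touches the rows in cols
        for b, jb in enumerate(cols):
            C[ja][jb] += S[a][b]    # E (W^T A W) E^T touches the k x k block
    return C

random.seed(0)
n, cols = 8, [2, 5]                 # hypothetical size and dense-column indices

# Symmetric test matrix standing in for a stiffness matrix.
B = [[random.random() for _ in range(n)] for _ in range(n)]
A = [[B[i][j] + B[j][i] for j in range(n)] for i in range(n)]

# P is the identity except for the dense columns listed in cols.
P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for j in cols:
    for i in range(n):
        P[i][j] = random.random()

# W = (dense columns of P) - (matching identity columns).
W = [[P[i][j] - (1.0 if i == j else 0.0) for j in cols] for i in range(n)]

reference = matmul(transpose(P), matmul(A, P))
fast = ptap_low_rank(A, W, cols)
max_err = max(abs(reference[i][j] - fast[i][j])
              for i in range(n) for j in range(n))
print("max abs difference:", max_err)
```

In a sparse setting, the k columns of A W would each come from one
matrix-vector product, and the result would have the sparsity of A plus k
dense rows and columns. Whether that beats MatPtAP depends on the
implementation and on the cost of inserting those dense rows and columns.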

-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Tue Feb 23 14:17:28 2016
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.214e+02      1.00000   1.214e+02
Objects:              4.150e+02      1.00000   4.150e+02
Flops:                1.294e+09      1.00000   1.294e+09  1.294e+09
Flops/sec:            1.066e+07      1.00000   1.066e+07  1.066e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.2142e+02 100.0%  1.2945e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

VecNorm               15 1.0 2.5439e-04 1.0 1.12e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4384
VecCopy              102 1.0 2.6283e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               315 1.0 3.1855e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              104 1.0 3.3436e-03 1.0 1.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   374
VecWAXPY               7 1.0 4.2009e-04 1.0 2.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   620
VecAssemblyBegin     368 1.0 1.2109e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       368 1.0 2.5558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      106 1.0 8.7023e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith        21 1.0 5.3144e-04 1.0 1.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2940
VecReduceComm          7 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMultAdd            41 1.0 3.2815e-02 1.0 5.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  1824
MatMultTrAdd           9 1.0 6.2582e-03 1.0 8.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1390
MatSolve              19 1.0 7.3646e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatCholFctrSym         2 1.0 8.0401e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatCholFctrNum         8 1.0 1.3437e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  11  0  0  0  0     0
MatAssemblyBegin     117 1.0 2.6464e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       117 1.0 1.7539e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetValues          32 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 7.9417e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        48 1.0 5.5495e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                7 1.0 2.9074e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                8 1.0 8.5691e+01 1.0 1.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 71 94  0  0  0  71 94  0  0  0    14
MatPtAPSymbolic        1 1.0 2.2405e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  18  0  0  0  0     0
MatPtAPNumeric         8 1.0 6.3285e+01 1.0 1.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 52 94  0  0  0  52 94  0  0  0    19
MatGetSymTrans         1 1.0 1.6050e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              19 1.0 8.3447e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              19 1.0 1.4982e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
PCSetUp                8 1.0 1.4245e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
PCApply               19 1.0 7.3649e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
SNESSolve              2 1.0 6.9693e+01 1.0 1.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 57 88  0  0  0  57 88  0  0  0    16
SNESFunctionEval       9 1.0 1.1762e+00 1.0 6.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0    53
SNESJacobianEval       7 1.0 5.5601e+01 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 46 83  0  0  0  46 83  0  0  0    19
SNESLineSearch         7 1.0 1.6607e-01 1.0 5.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0   328

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   254            254     29415312     0
      Vector Scatter    41             41        26896     0
           Index Set    65             65       744616     0
   IS L to G Mapping    16             16       775904     0
              Matrix    22             22    149945168     0
       Krylov Solver     2              2         2448     0
     DMKSP interface     1              1          648     0
      Preconditioner     2              2         2136     0
                SNES     1              1         1332     0
      SNESLineSearch     1              1          856     0
              DMSNES     1              1          664     0
    Distributed Mesh     2              2         8992     0
Star Forest Bipartite Graph     4              4         3200     0
     Discrete System     2              2         1696     0
              Viewer     1              0            0     0
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-JSON_INIT /home/dknez/akselos-dev/data/instance/workers/fe_solver/810b60d7534448ce8ef67eb6a5e2267e/json_init.json
-JSON_INPUT /home/dknez/akselos-dev/data/instance/workers/fe_solver/810b60d7534448ce8ef67eb6a5e2267e/json_input.json
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-parmetis --download-blacs --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --download-superlu_dist --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml
Libraries compiled on Thu Aug 13 16:37:37 2015 on david-Lenovo 
Machine characteristics: Linux-3.13.0-61-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/dknez/software/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O   ${FOPTFLAGS} ${FFLAGS} 

Using include paths: -I/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/include -I/home/dknez/software/petsc-3.6.1/include -I/home/dknez/software/petsc-3.6.1/include -I/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/include -I/home/dknez/software/libmesh_install/opt_real/petsc/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lsuperlu_dist_4.0 -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lml -lmpi_cxx -lstdc++ -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis -lmetis -lm -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpi -lhwloc -lgcc_s -lpthread -ldl 
