[petsc-users] MatPtAP

Samuel Lanthaler s.lanthaler at gmail.com
Fri Jun 1 11:11:32 CDT 2018


On 06/01/2018 04:55 PM, Smith, Barry F. wrote:
>     How do you know it is exactly the MatPtAP() routine that is taking most of the time?

At least according to a simple timing (CALL CPU_TIME(...)), the call to 
MatPtAP appears to consume more time than the actual discretization of 
the PDE. Let me attach the output of -log_view.
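
(For what it's worth, the CPU_TIME numbers in the attached output could also be reproduced directly in -log_view by pushing a separate logging stage around the boundary-condition block. A minimal sketch only; the stage name and the include/use lines are illustrative and may already exist in the actual code:

    #include <petsc/finclude/petscsys.h>
          use petscsys
          PetscLogStage  :: stage_bc
          PetscErrorCode :: ierr

          ! everything logged between Push and Pop is attributed to this stage
          CALL PetscLogStageRegister('ApplyBC_PtAP', stage_bc, ierr)
          CALL PetscLogStagePush(stage_bc, ierr)
          ! ... MatPtAP / MatZeroRowsColumns calls to be timed ...
          CALL PetscLogStagePop(ierr)

That way the MatPtAP time shows up in its own stage of the -log_view summary instead of being lumped into the main stage.)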

>
>     Please send the output of a run with -log_view so we can see where the time is being spent.
>
>     Barry
>
>
>> On Jun 1, 2018, at 8:21 AM, Samuel Lanthaler <s.lanthaler at gmail.com> wrote:
>>
>> Hi,
>>
>> I was wondering what the most efficient way to use MatPtAP would be in the following situation: I am discretizing a PDE system. The discretization yields a matrix A with band structure (with k upper and lower bands, say). In order to implement the boundary conditions, I use a transformation matrix P which is essentially the identity matrix, except for the entries P_{ij} with i,j < k or n-i,n-j < k (the top-left and bottom-right k-by-k corners), so
>>
>> P = [ B, 0, 0, 0, ..., 0, 0 ]
>>     [ 0, 1, 0, 0, ..., 0, 0 ]
>>     [          ...          ]
>>     [ 0, 0, 0, ..., 1, 0, 0 ]
>>     [ 0, 0, 0, 0, ..., 0, C ]
>>
>> where B and C are (k-by-k) matrices.
>> Right now, I'm simply constructing A, P and calling
>>
>>     CALL MatPtAP(petsc_matA,petsc_matP,MAT_INITIAL_MATRIX,PETSC_DEFAULT_REAL,petsc_matPtAP,ierr)
>>
>> where I haven't done anything to petsc_matPtAP prior to this call. Is this the way to do it?
>>
>> I'm asking because, currently, setting up the matrices A and P takes very little time, whereas the MatPtAP operation takes quite long, which seems very odd... The matrices are of type MPIAIJ. In my problem, the total matrix dimension is around 10'000 and the matrix blocks (B,C) are of size ~100.
>>
>> Thanks in advance for any ideas.
>>
>> Cheers,
>> Samuel
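
To make the setup from the quoted message concrete, here is roughly what the construction of P and the subsequent product look like. This is only a sketch: the names (nglobal, k, istart, iend, one) are placeholders rather than the actual code, A is assumed to be already assembled, and the corner blocks B/C are indicated only by a comment; the PETSc calls themselves are the standard MPIAIJ assembly routines plus MatPtAP as above.

    #include <petsc/finclude/petscmat.h>
          use petscmat
          Mat            :: petsc_matA, petsc_matP, petsc_matPtAP
          PetscInt       :: nglobal, k, i, istart, iend
          PetscScalar    :: one
          PetscErrorCode :: ierr

          ! sizes as described above: total dimension ~10'000, corner blocks ~100
          nglobal = 10000
          k       = 100

          ! P is the identity except for the k-by-k corner blocks B and C
          CALL MatCreate(PETSC_COMM_WORLD, petsc_matP, ierr)
          CALL MatSetSizes(petsc_matP, PETSC_DECIDE, PETSC_DECIDE, nglobal, nglobal, ierr)
          CALL MatSetType(petsc_matP, MATMPIAIJ, ierr)
          ! at most k nonzeros per row (rows touching B or C), a single 1 elsewhere
          CALL MatMPIAIJSetPreallocation(petsc_matP, k, PETSC_NULL_INTEGER, k, PETSC_NULL_INTEGER, ierr)

          one = 1.0
          CALL MatGetOwnershipRange(petsc_matP, istart, iend, ierr)
          DO i = istart, iend-1
             IF (i >= k .AND. i < nglobal-k) THEN
                ! interior rows: identity
                CALL MatSetValue(petsc_matP, i, i, one, INSERT_VALUES, ierr)
             ELSE
                ! set the corresponding row of block B (top-left) or C (bottom-right) here
             END IF
          END DO
          CALL MatAssemblyBegin(petsc_matP, MAT_FINAL_ASSEMBLY, ierr)
          CALL MatAssemblyEnd(petsc_matP, MAT_FINAL_ASSEMBLY, ierr)

          ! form Pt*A*P; the fill argument is an estimate of nnz(PtAP)/(nnz(A)+nnz(P)),
          ! and PETSC_DEFAULT_REAL lets PETSc choose it
          CALL MatPtAP(petsc_matA, petsc_matP, MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, petsc_matPtAP, ierr)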

-------------- next part --------------
 *****  VENUS-LEVIS: rev Unversioned directory         *****
 ------------------------------
  DELTA-F COMPUTATION FOR 
    HYBRID KINETIC-MHD    
 ------------------------------
 ******************************
 Cleaning sub-directories and creating diagnostic tree
 -------------------
 ..... reading simulation data
 ..... reading equilibrium data
 ..... reading IO specification
 synchronize nparts,nprocs!
 nparts must be multiple of nprocs
 nparts,nprocs           1           4
 Resetting values to satisfy constraint:
 nparts,nprocs           4           4
 ..... initializing magnetic equilibrium
 Loading SFL (STRAIGHT-FIELD LINE) equilibrium
 - Plasma volume   197.391538791518     
  ---------------------------------------------- 
 | magnetic equilibrium:                         
 | --------------------------------------------- 
 | B0 =   0.984245824471487     
 | R0 =    10.1237829388870     
 | alfven velocity  =    4809804.25531587     
 | alfven frequency =    475099.504241706     
  ---------------------------------------------- 
 ----------------------------
 Profiles related to flow (on axis):
 R0:             10.1237829388870     
 mOmp:           2.00000000000000     
 T0 [keV]:       1.10111199122785     
 Omega0 [Hz]:   0.000000000000000E+000
 Mach0:         0.000000000000000E+000
 ----------------------------
 ..... initializing analytic distributions
 ..... bcasting analytic profiles data
 ..... calling test_analytic_profiles
 ..... reading equilibrium distribution data
 ..... bcasting equilibrium distribution data
 - TAE updaters
 ..... dumping analytic distribution; Ne,Np,Nmu          100         100
           5
 - TAE updaters
 ..... initializing delta-f computation
 ..... reading delta-f grid data
 ..... bcasting delta-f grid data
 -------------
 WARNING: Using df_coll_freq to set ftime%dE_prefix.
 ftime%dE_prefix (original):  (0.000000000000000E+000,0.000000000000000E+000)
 ftime%dE_prefix (set):       (4.75099504241706,237.549752120853)
 -------------
 ftime%grate =    4.75099504241706     
 ftime%freq =    237.549752120853     
 ftime%dE_prefix =  (4.75099504241706,237.549752120853)
 -----------------------------
 ftoro%n  =            1
 ftoro%dE_prefix =  (0.000000000000000E+000,-1.00000000000000)
 *****   STARTING SIMULATION   *****
 over            4  processors
 with            4  orbits
 ******************************
 initializing from MHD in
 ... setting type
 PDE.f90--> Chosen case: 
 *********************   MHDflow
 ... initializing elements
 initializing with bunching on rational surfaces:
 q =    1.00000000000000     
 q =    2.00000000000000     
 ------
 --------------------
 q = qvals found for:
 q =    1.00001265158867         at s =   0.272160000000131     
 q =    2.00000720104897         at s =   0.666679999999601     
 --------------------
 Nrad =          180
 mmin =           -3
 mmax =            5
 ntor =           -1
 test_mode =            7
 Initializing operator
 setting natural units in MHDflow            8
 setting natural units in MHDflow            8
 setting natural units in MHDflow            8
 setting natural units in MHDflow            8
 Discretizing operator
 Discretization in progress (computation of matrices)
 Assembly of matA,matB done.
 Accounting for use of mixed finite elements.
 Add boundary conditions
 Adding boundary conditions: Neumann/Dirichlet
 ... setting up BC
 ... Setting up R/T (transformation) matrices
 ... Applying BC to A/B
 ... A->RtAR, B->RtBR
 -----------------------------------
 time MatPtAP:               17.3900000000000     
 time MatZeroRowsColumns:   0.670999999999999     
 time MatAssembly:          4.000000000001336E-003
 -------------------------
 -----------------------------------
 time MatPtAP:               16.3570000000000     
 time MatZeroRowsColumns:   0.631999999999998     
 time MatAssembly:          3.000000000000114E-003
 -------------------------
 ... writing out matrices in boundary conditions...
 destroying local objects
 time set BC:    43.9070000000000     
 Discretization done.
 writing out matrices
 Writing matA/matB to: PDE/
 writing out operator options
 ***** WRITING AND UNLOADING MEMORY *****
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./testing.x on a  named spcpc340 with 4 processors, by lanthale Fri Jun  1 17:55:08 2018
Using Petsc Release Version 3.9.1, Apr, 29, 2018 

                         Max       Max/Min        Avg      Total 
Time (sec):           4.793e+01      1.00000   4.793e+01
Objects:              7.980e+02      1.00000   7.980e+02
Flop:                 1.549e+10      1.05057   1.512e+10  6.048e+10
Flop/sec:            3.232e+08      1.05057   3.154e+08  1.262e+09
MPI Messages:         1.176e+03      1.34554   1.020e+03  4.078e+03
MPI Message Lengths:  5.629e+08      2.85940   2.993e+05  1.221e+09
MPI Reductions:       8.430e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 4.7933e+01 100.0%  6.0476e+10 100.0%  4.078e+03 100.0%  2.993e+05      100.0%  8.330e+02  98.8% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided        234 1.0 5.9497e-03 1.1 0.00e+00 0.0 4.6e+02 4.0e+00 0.0e+00  0  0 11  0  0   0  0 11  0  0     0
BuildTwoSidedF        18 1.0 2.7772e+00 7.1 0.00e+00 0.0 8.0e+01 9.2e+05 0.0e+00  4  0  2  6  0   4  0  2  6  0     0
VecView                2 1.0 3.5000e-04 1.1 0.00e+00 0.0 6.0e+00 2.6e+04 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               243 1.0 5.0569e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin       3 1.0 1.3790e-03 1.6 0.00e+00 0.0 6.0e+01 1.6e+04 0.0e+00  0  0  1  0  0   0  0  1  0  0     0
VecAssemblyEnd         3 1.0 7.1049e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      235 1.0 2.3565e-03 1.5 0.00e+00 0.0 1.4e+03 5.1e+03 0.0e+00  0  0 35  1  0   0  0 35  1  0     0
VecScatterEnd        235 1.0 1.8451e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatConvert             1 1.0 1.3385e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale               6 1.0 4.4126e-02 1.1 1.14e+07 1.0 6.0e+00 2.3e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0  1026
MatAssemblyBegin      35 1.0 3.1616e+00 2.2 0.00e+00 0.0 2.0e+01 3.7e+06 0.0e+00  5  0  0  6  0   5  0  0  6  0     0
MatAssemblyEnd        35 1.0 3.3615e-01 1.0 0.00e+00 0.0 7.2e+01 1.1e+03 4.8e+01  1  0  2  0  6   1  0  2  0  6     0
MatZeroEntries         2 1.0 3.1345e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                8 1.0 1.8869e+00 1.0 0.00e+00 0.0 1.2e+02 9.0e+06 2.4e+01  4  0  3 88  3   4  0  3 88  3     0
MatMatMult             1 1.0 7.2594e+00 1.0 2.21e+09 1.0 3.6e+01 5.6e+05 1.2e+01 15 14  1  2  1  15 14  1  2  1  1195
MatMatMultSym          1 1.0 5.2231e+00 1.0 0.00e+00 0.0 3.0e+01 4.0e+05 1.2e+01 11  0  1  1  1  11  0  1  1  1     0
MatMatMultNum          1 1.0 2.0366e+00 1.0 2.21e+09 1.0 6.0e+00 1.3e+06 0.0e+00  4 14  0  1  0   4 14  0  1  0  4258
MatPtAP                2 1.0 3.3743e+01 1.0 1.33e+10 1.1 1.1e+02 1.0e+06 3.0e+01 70 86  3  9  4  70 86  3  9  4  1534
MatPtAPSymbolic        2 1.0 1.7436e+01 1.0 0.00e+00 0.0 7.2e+01 5.9e+05 1.4e+01 36  0  2  3  2  36  0  2  3  2     0
MatPtAPNumeric         2 1.0 1.6306e+01 1.0 1.33e+10 1.1 4.2e+01 1.7e+06 1.6e+01 34 86  1  6  2  34 86  1  6  2  3174
MatGetLocalMat         4 1.0 5.9245e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          4 1.0 1.8033e-02 1.4 0.00e+00 0.0 6.0e+01 7.4e+05 0.0e+00  0  0  1  4  0   0  0  1  4  0     0
SFSetGraph           234 1.0 5.5790e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp              234 1.0 7.6420e-03 1.1 0.00e+00 0.0 1.4e+03 4.0e+00 0.0e+00  0  0 34  0  0   0  0 34  0  0     0
SFReduceBegin        234 1.0 1.0035e-03 1.3 0.00e+00 0.0 9.2e+02 4.0e+00 0.0e+00  0  0 23  0  0   0  0 23  0  0     0
SFReduceEnd          234 1.0 4.1008e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector   487            485      8328944     0.
              Matrix    37             31    309181604     0.
           Index Set    12             12        22220     0.
         Vec Scatter     7              5         6160     0.
   Star Forest Graph   234            234       202176     0.
              Viewer    21             10         8400     0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 9.53674e-07
Average time for zero size MPI_Send(): 2.02656e-06
#PETSc Option Table entries:
-log_view
-no_signal_handler
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/local/petsc-3.9.1/linux_intel17.0 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-debugging=0 --with-blaslapack-dir=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 --with-scalapack=1 --with-scalapack-lib="-L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm" --with-scalapack-include=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64/../../include --with-blacs=1 --with-blacs-lib="-L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm" --with-blacs-include=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64/../../include --with-hypre=1 --download-hypre=1 --with-ml=1 --download-ml=1 --download-mumps=1 --download-parmetis=1 --download-metis=1 --download-superlu
-----------------------------------------
Libraries compiled on 2018-05-30 11:33:16 on medusa 
Machine characteristics: Linux-4.4.114-42-default-x86_64-with-SuSE-42.3-x86_64
Using PETSc directory: /usr/local/petsc-3.9.1/linux_intel17.0
Using PETSc arch: 
-----------------------------------------

Using C compiler: mpicc  -fPIC  -wd1572 -g -O3  
Using Fortran compiler: mpif90  -fPIC -g -O3    
-----------------------------------------

Using include paths: -I/usr/local/petsc-3.9.1//include -I/usr/local/petsc-3.9.1///include -I/usr/local/petsc-3.9.1//include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/petsc-3.9.1/linux_intel17.0/lib -lpetsc -Wl,-rpath,/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib/debug_mt -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib -Wl,-rpath,/usr/local/hdf5-1.8.18-intel17.0/lib64 -L/usr/local/hdf5-1.8.18-intel17.0/lib64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -lsuperlu -lHYPRE -lml -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------

 ***** TERMINATED NORMALLY *****
