[petsc-users] MatPtAP
Samuel Lanthaler
s.lanthaler at gmail.com
Fri Jun 1 11:11:32 CDT 2018
On 06/01/2018 04:55 PM, Smith, Barry F. wrote:
> How do you know it is exactly the MatPtAP() routine that is taking the large time?
At least according to a simple timing (CALL CPU_TIME(...)), the call to
MatPtAP appears to consume more time than the actual discretization of
the PDE. The output of -log_view is attached below.
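
(Note that CALL CPU_TIME(...) reports per-process CPU time; for a parallel run, a wall-clock measurement such as PETSc's PetscTime(), or the MatPtAP line in -log_view itself, is usually the more relevant number. Below is a minimal sketch, not the code used for the run above, of taking the same measurement with PetscTime(); the matrix names petsc_matA, petsc_matP, petsc_matPtAP follow the quoted message.)

! Minimal sketch (not the code from the run above): timing the product
! with PETSc's wall-clock timer PetscTime() instead of CPU_TIME.
#include <petsc/finclude/petscmat.h>
subroutine time_ptap(petsc_matA, petsc_matP, petsc_matPtAP)
  use petscmat
  implicit none
  Mat            :: petsc_matA, petsc_matP, petsc_matPtAP
  PetscLogDouble :: t0, t1
  PetscErrorCode :: ierr

  call PetscTime(t0, ierr)
  call MatPtAP(petsc_matA, petsc_matP, MAT_INITIAL_MATRIX, &
               PETSC_DEFAULT_REAL, petsc_matPtAP, ierr)
  call PetscTime(t1, ierr)
  ! each rank prints its own wall-clock time for the call
  print *, 'wall-clock time for MatPtAP [s]: ', t1 - t0
end subroutine time_ptap
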
>
> Please send the output of a run with -log_view so we can see where the time is being spent.
>
> Barry
>
>
>> On Jun 1, 2018, at 8:21 AM, Samuel Lanthaler <s.lanthaler at gmail.com> wrote:
>>
>> Hi,
>>
>> I was wondering what the most efficient way to use MatPtAP would be in the following situation: I am discretizing a PDE system. The discretization yields a matrix A with band structure (k upper and lower bands, say). To implement the boundary conditions, I use a transformation matrix P which is essentially the unit matrix, except for the entries P_{ij} with i,j < k or with n-i,n-j < k, so
>>
>> P = [ B, 0, 0, 0, ..., 0, 0 ]
>>     [ 0, 1, 0, 0, ..., 0, 0 ]
>>     [          .            ]
>>     [             .         ]
>>     [ 0, 0, 0, ...,   1, 0  ]
>>     [ 0, 0, 0, 0, ..., 0, C ]
>>
>> where B and C are (k-by-k) matrices.
>> Right now, I'm simply constructing A, P and calling
>>
>> CALL MatPtAP(petsc_matA,petsc_matP,MAT_INITIAL_MATRIX,PETSC_DEFAULT_REAL,petsc_matPtAP,ierr)
>>
>> where I haven't done anything to petsc_matPtAP prior to this call. Is this the way to do it?
>>
>> I'm asking because, currently, setting up the matrices A and P takes very little time, whereas the operation MatPtAP is taking quite long, which seems very odd... The matrices are of type MPIAIJ. In my problem, the total matrix dimension is around 10'000 and the matrix blocks (B,C) are of size ~100.
>>
>> Thanks in advance for any ideas.
>>
>> Cheers,
>> Samuel
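
For illustration, here is a minimal sketch of one way the transformation matrix P described in the quoted message might be assembled as an MPIAIJ matrix; the names n, k, bmat and cmat (the dense B and C corner blocks, assumed available on the processes that own the corresponding rows) are hypothetical.

! Minimal sketch (hypothetical names n, k, bmat, cmat) of assembling the
! transformation matrix P: identity in the interior, dense k-by-k blocks
! B and C replacing the first and last k rows.
#include <petsc/finclude/petscmat.h>
subroutine build_P(n, k, bmat, cmat, petsc_matP)
  use petscmat
  implicit none
  PetscInt, intent(in)    :: n, k
  PetscScalar, intent(in) :: bmat(k,k), cmat(k,k)
  Mat                     :: petsc_matP
  PetscInt                :: row, j, Istart, Iend, ione
  PetscInt                :: rows(1), cols(k)
  PetscScalar             :: vals(k), valone(1)
  PetscErrorCode          :: ierr

  ione      = 1
  valone(1) = 1.0

  call MatCreate(PETSC_COMM_WORLD, petsc_matP, ierr)
  call MatSetSizes(petsc_matP, PETSC_DECIDE, PETSC_DECIDE, n, n, ierr)
  call MatSetType(petsc_matP, MATMPIAIJ, ierr)
  ! generous preallocation: at most k nonzeros per row (corner blocks)
  call MatMPIAIJSetPreallocation(petsc_matP, k, PETSC_NULL_INTEGER, &
                                 k, PETSC_NULL_INTEGER, ierr)
  call MatGetOwnershipRange(petsc_matP, Istart, Iend, ierr)

  do row = Istart, Iend - 1
     rows(1) = row
     if (row < k) then                        ! top-left block B
        do j = 1, k
           cols(j) = j - 1
           vals(j) = bmat(row + 1, j)
        end do
        call MatSetValues(petsc_matP, ione, rows, k, cols, vals, INSERT_VALUES, ierr)
     else if (row >= n - k) then              ! bottom-right block C
        do j = 1, k
           cols(j) = n - k + j - 1
           vals(j) = cmat(row - (n - k) + 1, j)
        end do
        call MatSetValues(petsc_matP, ione, rows, k, cols, vals, INSERT_VALUES, ierr)
     else                                     ! identity elsewhere
        call MatSetValues(petsc_matP, ione, rows, ione, rows, valone, INSERT_VALUES, ierr)
     end if
  end do

  call MatAssemblyBegin(petsc_matP, MAT_FINAL_ASSEMBLY, ierr)
  call MatAssemblyEnd(petsc_matP, MAT_FINAL_ASSEMBLY, ierr)
end subroutine build_P

The assembled petsc_matP can then be passed to MatPtAP exactly as in the quoted call. If the same product is later recomputed with an unchanged nonzero structure, passing MAT_REUSE_MATRIX instead of MAT_INITIAL_MATRIX skips the symbolic phase, which in the log below accounts for roughly half of the total MatPtAP time.
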
-------------- next part --------------
***** VENUS-LEVIS: rev Unversioned directory *****
------------------------------
DELTA-F COMPUTATION FOR
HYBRID KINETIC-MHD
------------------------------
******************************
Cleaning sub-directories and creating diagnostic tree
-------------------
..... reading simulation data
..... reading equilibrium data
..... reading IO specification
synchronize nparts,nprocs!
nparts must be multiple of nprocs
nparts,nprocs 1 4
Resetting values to satisfy constraint:
nparts,nprocs 4 4
..... initializing magnetic equilibrium
Loading SFL (STRAIGHT-FIELD LINE) equilibrium
- Plasma volume 197.391538791518
----------------------------------------------
| magnetic equilibrium:
| ---------------------------------------------
| B0 = 0.984245824471487
| R0 = 10.1237829388870
| alfven velocity = 4809804.25531587
| alfven frequency = 475099.504241706
----------------------------------------------
----------------------------
Profiles related to flow (on axis):
R0: 10.1237829388870
mOmp: 2.00000000000000
T0 [keV]: 1.10111199122785
Omega0 [Hz]: 0.000000000000000E+000
Mach0: 0.000000000000000E+000
----------------------------
..... initializing analytic distributions
..... bcasting analytic profiles data
..... calling test_analytic_profiles
..... reading equilibrium distribution data
..... bcasting equilibrium distribution data
- TAE updaters
..... dumping analytic distribution; Ne,Np,Nmu 100 100 5
- TAE updaters
..... initializing delta-f computation
..... reading delta-f grid data
..... bcasting delta-f grid data
-------------
WARNING: Using df_coll_freq to set ftime%dE_prefix.
ftime%dE_prefix (original): (0.000000000000000E+000,0.000000000000000E+000)
ftime%dE_prefix (set): (4.75099504241706,237.549752120853)
-------------
ftime%grate = 4.75099504241706
ftime%freq = 237.549752120853
ftime%dE_prefix = (4.75099504241706,237.549752120853)
-----------------------------
ftoro%n = 1
ftoro%dE_prefix = (0.000000000000000E+000,-1.00000000000000)
***** STARTING SIMULATION *****
over 4 processors
with 4 orbits
******************************
initializing from MHD in
... setting type
PDE.f90--> Chosen case:
********************* MHDflow
... initializing elements
initializing with bunching on rational surfaces:
q = 1.00000000000000
q = 2.00000000000000
------
--------------------
q = qvals found for:
q = 1.00001265158867 at s = 0.272160000000131
q = 2.00000720104897 at s = 0.666679999999601
--------------------
Nrad = 180
mmin = -3
mmax = 5
ntor = -1
test_mode = 7
Initializing operator
setting natural units in MHDflow 8
setting natural units in MHDflow 8
setting natural units in MHDflow 8
setting natural units in MHDflow 8
Discretizing operator
Discretization in progress (computation of matrices)
Assembly of matA,matB done.
Accounting for use of mixed finite elements.
Add boundary conditions
Adding boundary conditions: Neumann/Dirichlet
... setting up BC
... Setting up R/T (transformation) matrices
... Applying BC to A/B
... A->RtAR, B->RtBR
-----------------------------------
time MatPtAP: 17.3900000000000
time MatZeroRowsColumns: 0.670999999999999
time MatAssembly: 4.000000000001336E-003
-------------------------
-----------------------------------
time MatPtAP: 16.3570000000000
time MatZeroRowsColumns: 0.631999999999998
time MatAssembly: 3.000000000000114E-003
-------------------------
... writing out matrices in boundary conditions...
destroying local objects
time set BC: 43.9070000000000
Discretization done.
writing out matrices
Writing matA/matB to: PDE/
writing out operator options
***** WRITING AND UNLOADING MEMORY *****
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./testing.x on a named spcpc340 with 4 processors, by lanthale Fri Jun 1 17:55:08 2018
Using Petsc Release Version 3.9.1, Apr, 29, 2018
                          Max         Max/Min      Avg        Total
Time (sec):            4.793e+01     1.00000    4.793e+01
Objects:               7.980e+02     1.00000    7.980e+02
Flop:                  1.549e+10     1.05057    1.512e+10   6.048e+10
Flop/sec:              3.232e+08     1.05057    3.154e+08   1.262e+09
MPI Messages:          1.176e+03     1.34554    1.020e+03   4.078e+03
MPI Message Lengths:   5.629e+08     2.85940    2.993e+05   1.221e+09
MPI Reductions:        8.430e+02     1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages:   ----- Time ------   ----- Flop -----    --- Messages ---   -- Message Lengths --   -- Reductions --
                       Avg      %Total     Avg      %Total    counts   %Total      Avg        %Total     counts   %Total
 0:      Main Stage: 4.7933e+01  100.0%  6.0476e+10  100.0%  4.078e+03  100.0%   2.993e+05     100.0%   8.330e+02   98.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 234 1.0 5.9497e-03 1.1 0.00e+00 0.0 4.6e+02 4.0e+00 0.0e+00 0 0 11 0 0 0 0 11 0 0 0
BuildTwoSidedF 18 1.0 2.7772e+00 7.1 0.00e+00 0.0 8.0e+01 9.2e+05 0.0e+00 4 0 2 6 0 4 0 2 6 0 0
VecView 2 1.0 3.5000e-04 1.1 0.00e+00 0.0 6.0e+00 2.6e+04 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 243 1.0 5.0569e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyBegin 3 1.0 1.3790e-03 1.6 0.00e+00 0.0 6.0e+01 1.6e+04 0.0e+00 0 0 1 0 0 0 0 1 0 0 0
VecAssemblyEnd 3 1.0 7.1049e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 235 1.0 2.3565e-03 1.5 0.00e+00 0.0 1.4e+03 5.1e+03 0.0e+00 0 0 35 1 0 0 0 35 1 0 0
VecScatterEnd 235 1.0 1.8451e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 1.3385e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 6 1.0 4.4126e-02 1.1 1.14e+07 1.0 6.0e+00 2.3e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 1026
MatAssemblyBegin 35 1.0 3.1616e+00 2.2 0.00e+00 0.0 2.0e+01 3.7e+06 0.0e+00 5 0 0 6 0 5 0 0 6 0 0
MatAssemblyEnd 35 1.0 3.3615e-01 1.0 0.00e+00 0.0 7.2e+01 1.1e+03 4.8e+01 1 0 2 0 6 1 0 2 0 6 0
MatZeroEntries 2 1.0 3.1345e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 8 1.0 1.8869e+00 1.0 0.00e+00 0.0 1.2e+02 9.0e+06 2.4e+01 4 0 3 88 3 4 0 3 88 3 0
MatMatMult 1 1.0 7.2594e+00 1.0 2.21e+09 1.0 3.6e+01 5.6e+05 1.2e+01 15 14 1 2 1 15 14 1 2 1 1195
MatMatMultSym 1 1.0 5.2231e+00 1.0 0.00e+00 0.0 3.0e+01 4.0e+05 1.2e+01 11 0 1 1 1 11 0 1 1 1 0
MatMatMultNum 1 1.0 2.0366e+00 1.0 2.21e+09 1.0 6.0e+00 1.3e+06 0.0e+00 4 14 0 1 0 4 14 0 1 0 4258
MatPtAP 2 1.0 3.3743e+01 1.0 1.33e+10 1.1 1.1e+02 1.0e+06 3.0e+01 70 86 3 9 4 70 86 3 9 4 1534
MatPtAPSymbolic 2 1.0 1.7436e+01 1.0 0.00e+00 0.0 7.2e+01 5.9e+05 1.4e+01 36 0 2 3 2 36 0 2 3 2 0
MatPtAPNumeric 2 1.0 1.6306e+01 1.0 1.33e+10 1.1 4.2e+01 1.7e+06 1.6e+01 34 86 1 6 2 34 86 1 6 2 3174
MatGetLocalMat 4 1.0 5.9245e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 4 1.0 1.8033e-02 1.4 0.00e+00 0.0 6.0e+01 7.4e+05 0.0e+00 0 0 1 4 0 0 0 1 4 0 0
SFSetGraph 234 1.0 5.5790e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 234 1.0 7.6420e-03 1.1 0.00e+00 0.0 1.4e+03 4.0e+00 0.0e+00 0 0 34 0 0 0 0 34 0 0 0
SFReduceBegin 234 1.0 1.0035e-03 1.3 0.00e+00 0.0 9.2e+02 4.0e+00 0.0e+00 0 0 23 0 0 0 0 23 0 0 0
SFReduceEnd 234 1.0 4.1008e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 487 485 8328944 0.
Matrix 37 31 309181604 0.
Index Set 12 12 22220 0.
Vec Scatter 7 5 6160 0.
Star Forest Graph 234 234 202176 0.
Viewer 21 10 8400 0.
========================================================================================================================
Average time to get PetscTime(): 0.
Average time for MPI_Barrier(): 9.53674e-07
Average time for zero size MPI_Send(): 2.02656e-06
#PETSc Option Table entries:
-log_view
-no_signal_handler
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/local/petsc-3.9.1/linux_intel17.0 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-debugging=0 --with-blaslapack-dir=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 --with-scalapack=1 --with-scalapack-lib="-L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm" --with-scalapack-include=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64/../../include --with-blacs=1 --with-blacs-lib="-L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm" --with-blacs-include=/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64/../../include --with-hypre=1 --download-hypre=1 --with-ml=1 --download-ml=1 --download-mumps=1 --download-parmetis=1 --download-metis=1 --download-superlu
-----------------------------------------
Libraries compiled on 2018-05-30 11:33:16 on medusa
Machine characteristics: Linux-4.4.114-42-default-x86_64-with-SuSE-42.3-x86_64
Using PETSc directory: /usr/local/petsc-3.9.1/linux_intel17.0
Using PETSc arch:
-----------------------------------------
Using C compiler: mpicc -fPIC -wd1572 -g -O3
Using Fortran compiler: mpif90 -fPIC -g -O3
-----------------------------------------
Using include paths: -I/usr/local/petsc-3.9.1//include -I/usr/local/petsc-3.9.1///include -I/usr/local/petsc-3.9.1//include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/petsc-3.9.1/linux_intel17.0/lib -lpetsc -Wl,-rpath,/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/petsc-3.9.1/linux_intel17.0/lib -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib/debug_mt -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/mpi/intel64/lib -Wl,-rpath,/usr/local/hdf5-1.8.18-intel17.0/lib64 -L/usr/local/hdf5-1.8.18-intel17.0/lib64 -Wl,-rpath,/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin -L/usr/local/intel/17.0/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/4.8 -L/usr/lib64/gcc/x86_64-suse-linux/4.8 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -lsuperlu -lHYPRE -lml -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lparmetis -lmetis -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lifport -lifcoremt_pic -limf -lsvml -lm -lipgo -lirc -lgcc_s -lirc_s -lstdc++ -ldl
-----------------------------------------
***** TERMINATED NORMALLY *****