[petsc-dev] OpenMP in PETSc when calling from Fortran?
Åsmund Ervik
Asmund.Ervik at sintef.no
Wed Mar 6 03:39:26 CST 2013
Hi again,
On 01. mars 2013 20:06, Jed Brown wrote:
>
> Matrix and vector operations are probably running in parallel, but probably
> not the operations that are taking time. Always send -log_summary if you
> have a performance question.
>
I don't think they are running in parallel. When I analyze my code in
Intel VTune Amplifier, the only routines running in parallel are my own
OpenMP ones. Indeed, if I comment out my own OpenMP pragmas and
recompile, the code never uses more than one thread.
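To illustrate the kind of check I mean, here is a minimal C sketch
(illustrative only, not my actual Fortran code) of an OpenMP region that
reports how many threads are active; built with -fopenmp and run with
OMP_NUM_THREADS > 1 it prints several lines, which is what I see for my
own loops but never inside PETSc routines:

    /* check_threads.c: minimal OpenMP thread check (illustrative sketch) */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
      /* Each thread in the parallel region reports its id and the team size. */
      #pragma omp parallel
      {
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
    }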
-log_summary is shown below; this run used -pc_type lu -ksp_type bcgs.
The fastest PC for my cases is usually BoomerAMG from HYPRE, but I used
LU here instead, in order to limit the test to PETSc only. The summary
agrees with VTune that MatLUFactorNumeric is the most time-consuming
routine; in general the PC seems to always be the most time-consuming part.
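For context, the solver setup is essentially the following (a minimal C
sketch of what my Fortran code does through the PETSc Fortran interface;
SolveOnce, A, b and x are placeholder names, the nonzero-pattern flag is
assumed, and KSPSetOperators is the four-argument form used by petsc-dev
at that date):

    #include <petscksp.h>

    /* Illustrative sketch: solve A x = b, letting the options database
       supply -ksp_type bcgs -pc_type lu via KSPSetFromOptions(). */
    PetscErrorCode SolveOnce(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr); /* pattern flag assumed */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* picks up -ksp_type bcgs -pc_type lu */
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }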
Any advice on how to get OpenMP working?
Regards,
Åsmund
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder Wed Mar 6 10:14:55 2013
Using Petsc Development HG revision: 58cc6199509f1642f637843f1ca468283bf5ced9  HG Date: Wed Jan 30 00:39:35 2013 -0600
                         Max       Max/Min        Avg      Total
Time (sec):           4.446e+02      1.00000   4.446e+02
Objects:              2.017e+03      1.00000   2.017e+03
Flops:                3.919e+11      1.00000   3.919e+11  3.919e+11
Flops/sec:            8.815e+08      1.00000   8.815e+08  8.815e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.818e+03      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.4460e+02 100.0%  3.9191e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.817e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %f - percent flops in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot              802 1.0 9.2811e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2117
VecDotNorm2         401 1.0 7.1333e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 4.0e+02  0  0  0  0 14   0  0  0  0 14  2755
VecNorm            1203 1.0 7.8265e-02 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3766
VecCopy             802 1.0 1.1754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             1211 1.0 9.9961e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             401 1.0 4.5847e-02 1.0 9.82e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2143
VecAXPBYCZ          802 1.0 1.3489e-01 1.0 3.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2913
VecWAXPY            802 1.0 1.2292e-01 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1599
VecAssemblyBegin    802 1.0 2.4509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd      802 1.0 6.7234e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult            1203 1.0 1.1513e+00 1.0 1.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1149
MatSolve           1604 1.0 1.4714e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  3  5  0  0  0   3  5  0  0  0  1405
MatLUFactorSym      401 1.0 4.0197e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  9  0  0  0 43   9  0  0  0 43     0
MatLUFactorNum      401 1.0 2.3728e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 0.0e+00 53 94  0  0  0  53 94  0  0  0  1553
MatAssemblyBegin    401 1.0 1.7977e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd      401 1.0 3.1975e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ         401 1.0 9.1545e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering      401 1.0 2.0361e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+02  5  0  0  0 28   5  0  0  0 28     0
KSPSetUp            401 1.0 4.1821e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve            401 1.0 3.1511e+02 1.0 3.92e+11 1.0 0.0e+00 0.0e+00 2.8e+03 71 100 0  0 100  71 100 0  0 100  1244
PCSetUp             401 1.0 2.9844e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 2.0e+03 67 94  0  0 71   67 94  0  0 71   1235
PCApply            1604 1.0 1.4717e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  3  5  0  0  0   3  5  0  0  0  1405
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
              Vector       409            409    401422048        0
              Matrix       402            402  31321054412        0
       Krylov Solver         1              1         1128        0
      Preconditioner         1              1         1152        0
           Index Set       1203           1203   393903904        0
              Viewer         1              0            0        0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-ksp_type bcgs
-log_summary
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Fri Mar 1 12:53:06 2013
Configure options: --with-pthreadclasses --with-openmp --with-debugging=0 --with-shared-libraries=1 --download-mpich --download-hypre --with-boost-dir=/usr COPTFLAGS=-O3 FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Fri Mar 1 12:53:06 2013 on vsl161
Machine characteristics: Linux-3.7.9-1-ARCH-x86_64-with-glibc2.2.5
Using PETSc directory: /opt/petsc/petsc-dev-install
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -fopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O3 -fopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include
-I/opt/petsc/petsc-dev-install/include
-I/opt/petsc/petsc-dev-install/include
-I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include -I/usr/include
-----------------------------------------
Using C linker: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc
Using Fortran linker:
/opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90
Using libraries:
-Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib
-L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib
-L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lHYPRE
-Wl,-rpath,/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2
-L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2
-Wl,-rpath,/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64
-L/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64
-Wl,-rpath,/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64
-L/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64
-Wl,-rpath,/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
-L/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64
-Wl,-rpath,/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64
-L/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64 -lmpichcxx -lstdc++
-llapack -lblas -lX11 -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran
-lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt
-lgcc_s -ldl
-----------------------------------------