[petsc-users] Speedup problem when using OpenMP?
Danyang Su
danyang.su at gmail.com
Mon Nov 4 18:26:18 CST 2013
Hi All,
I have tested the same example under Ubuntu 12.04 x64. The PETSc-dev
version is up to date (GIT Date: 2013-11-01 14:59:20 -0500) and the
installation went through without any error. The speedup of the MPI
version scales linearly, but the speedup of the OpenMP version does not
change at all. Judging from the CPU usage, the program still runs in a
single thread when OpenMP is used.
The commands used to run the test are as follows:

OpenMP:
./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log

MPI:
mpiexec -n 4 ./ex2f -m 1000 -n 1000 -log_summary log_ex2f_1000x1000_ubuntu1204_mpi_p4.log
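
A plain OS-level check (standard Linux tools only, nothing PETSc-specific; the executable and option values below are simply the ones from the OpenMP command above) can show whether the run really spawns worker threads:

# start the OpenMP run in the background; setting OMP_NUM_THREADS as well is only a precaution
OMP_NUM_THREADS=4 ./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 &

# count the lightweight processes (threads) of the running job;
# a value of 1 means the solve is effectively serial
ps -o nlwp= -p $!

# or watch per-thread CPU usage interactively while KSPSolve runs
top -H -p $!

Either check only says whether threads exist at the OS level; it does not say which PETSc kernels actually use them.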
This problem is quite puzzling to me. Can anybody confirm whether the KSP
solver is actually parallelized in the OpenMP version?
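
In case anyone wants to reproduce this on Linux, a rough build-and-run sketch is below. The configure options are copied from the attached log_summary output; the tutorials directory for ex2f assumes the usual petsc-dev source layout, so adjust it if yours differs:

# configure an OpenMP-enabled, MPI-free optimized build (options taken from the attached logs)
export PETSC_DIR=$PWD PETSC_ARCH=linux-gnu-omp-opt
./configure PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran \
  --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
make all

# build and run the tutorial example (path assumed from the standard source tree)
cd src/ksp/ksp/examples/tutorials
make ex2f
./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 \
  -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log

The MPI comparison in the attached logs uses a separate arch (linux-gnu-opt, configured with --download-mpich), as shown in its configure line.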
Thanks and regards,
Danyang
On 31/10/2013 4:54 PM, Danyang Su wrote:
> Hi All,
>
> I have a question on the speedup of PETSc when using OpenMP. I can get
> good speedup when using MPI, but no speedup when using OpenMP.
> The example is ex2f with m=100 and n=100. The number of available
> processors is 16 (32 threads) and the OS is Windows Server 2012. The
> log files for 4 and 8 processors are attached.
>
> The commands I used to run with 4 processors are as follows:
>
> Run using MPI:
> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p4.log
>
> Run using OpenMP:
> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
>
> The PETSc build used for this test is PETSc for Windows
> (http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I don't think
> the package itself is the problem, because the same issue shows up when
> I use PETSc-dev under Cygwin. I don't know whether this problem also
> exists on Linux; would anybody help to test?
>
> Thanks and regards,
>
> Danyang
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov 4 15:35:47 2013
With 4 threads per MPI_Comm
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
Max Max/Min Avg Total
Time (sec): 2.376e+02 1.00000 2.376e+02
Objects: 4.500e+01 1.00000 4.500e+01
Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
Flops/sec: 9.271e+08 1.00000 9.271e+08 9.271e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3759e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 2657 1.0 4.1715e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 573
MatSolve 2657 1.0 6.4028e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 373
MatLUFactorNum 1 1.0 1.1149e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 99
MatILUFactorSym 1 1.0 8.2365e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 9.1023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.0014e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2571 1.0 5.1144e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36 0 0 0 22 36 0 0 0 1555
VecNorm 2658 1.0 5.4516e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 975
VecScale 2657 1.0 3.8631e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 688
VecCopy 86 1.0 2.2233e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 88 1.0 1.1501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 172 1.0 4.4589e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 771
VecMAXPY 2657 1.0 6.9213e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1223
VecNormalize 2657 1.0 9.3968e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 848
KSPGMRESOrthog 2571 1.0 1.1630e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 49 72 0 0 0 49 72 0 0 0 1367
KSPSetUp 1 1.0 2.8520e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.3699e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 929
PCSetUp 1 1.0 2.0609e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 53
PCApply 2657 1.0 6.4088e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 373
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 151957404 0
Vector 37 37 296057424 0
Krylov Solver 1 1 18368 0
Preconditioner 1 1 984 0
Index Set 3 3 4002304 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 6.50883e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_omp_p4.log
-m 1000
-n 1000
-threadcomm_nthreads 4
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov 4 15:22:12 2013
Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
-----------------------------------------
Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc
Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-omp-opt
-----------------------------------------
Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
-----------------------------------------
Using C linker: gcc
Using Fortran linker: gfortran
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov 4 15:31:30 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
Max Max/Min Avg Total
Time (sec): 2.388e+02 1.00000 2.388e+02
Objects: 4.500e+01 1.00000 4.500e+01
Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
Flops/sec: 9.224e+08 1.00000 9.224e+08 9.224e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3881e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 2657 1.0 4.0429e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 17 11 0 0 0 17 11 0 0 0 591
MatSolve 2657 1.0 6.3888e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 374
MatLUFactorNum 1 1.0 1.2874e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 85
MatILUFactorSym 1 1.0 1.3501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 6.8491e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.3066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2571 1.0 5.2507e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36 0 0 0 22 36 0 0 0 1514
VecNorm 2658 1.0 5.4426e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 977
VecScale 2657 1.0 3.8871e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 684
VecCopy 86 1.0 1.9921e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 88 1.0 1.0965e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 172 1.0 4.0171e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 856
VecMAXPY 2657 1.0 7.0096e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1208
VecNormalize 2657 1.0 9.4060e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 4 0 0 0 4 4 0 0 0 847
KSPGMRESOrthog 2571 1.0 1.1847e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 50 72 0 0 0 50 72 0 0 0 1342
KSPSetUp 1 1.0 3.7805e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.3820e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100 0 0 0 100100 0 0 0 925
PCSetUp 1 1.0 2.7698e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40
PCApply 2657 1.0 6.3946e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 374
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 151957404 0
Vector 37 37 296057424 0
Krylov Solver 1 1 18368 0
Preconditioner 1 1 984 0
Index Set 3 3 4002304 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 8.51154e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_omp_p1.log
-m 1000
-n 1000
-threadcomm_nthreads 1
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov 4 15:22:12 2013
Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
-----------------------------------------
Libraries compiled on Mon Nov 4 15:22:12 2013 on dsu-pc
Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-omp-opt
-----------------------------------------
Using C compiler: gcc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: gfortran -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O -fopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
-----------------------------------------
Using C linker: gcc
Using Fortran linker: gfortran
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex2f on a linux-gnu-opt named dsu-pc with 4 processors, by root Mon Nov 4 16:10:24 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
Max Max/Min Avg Total
Time (sec): 5.364e+01 1.00045 5.362e+01
Objects: 5.600e+01 1.00000 5.600e+01
Flops: 2.837e+10 1.00010 2.837e+10 1.135e+11
Flops/sec: 5.291e+08 1.00054 5.290e+08 2.116e+09
MPI Messages: 2.744e+03 2.00000 2.058e+03 8.232e+03
MPI Message Lengths: 2.193e+07 2.00000 7.991e+03 6.578e+07
MPI Reductions: 2.720e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 5.3623e+01 100.0% 1.1347e+11 100.0% 8.232e+03 100.0% 7.991e+03 100.0% 2.719e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1370 1.0 8.7882e+00 1.0 3.08e+09 1.0 8.2e+03 8.0e+03 0.0e+00 16 11100100 0 16 11100100 0 1402
MatSolve 1370 1.0 9.0304e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 17 11 0 0 0 17 11 0 0 0 1362
MatLUFactorNum 1 1.0 3.3336e-02 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 329
MatILUFactorSym 1 1.0 7.1875e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 7.2212e-0241.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 5.4802e-02 1.0 0.00e+00 0.0 1.2e+01 2.0e+03 9.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.2875e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.8881e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 1325 1.0 1.4754e+01 1.0 1.02e+10 1.0 0.0e+00 0.0e+00 1.3e+03 27 36 0 0 49 27 36 0 0 49 2776
VecNorm 1371 1.0 1.9989e+00 1.1 6.86e+08 1.0 0.0e+00 0.0e+00 1.4e+03 4 2 0 0 50 4 2 0 0 50 1372
VecScale 1370 1.0 4.9844e-01 1.1 3.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 2749
VecCopy 45 1.0 4.4863e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1418 1.0 6.2273e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 90 1.0 1.0165e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1771
VecMAXPY 1370 1.0 1.5635e+01 1.0 1.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 2789
VecScatterBegin 1370 1.0 1.6159e-01 1.8 0.00e+00 0.0 8.2e+03 8.0e+03 0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 1370 1.0 9.6929e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecNormalize 1370 1.0 2.5033e+00 1.1 1.03e+09 1.0 0.0e+00 0.0e+00 1.4e+03 5 4 0 0 50 5 4 0 0 50 1642
KSPGMRESOrthog 1325 1.0 2.9419e+01 1.0 2.05e+10 1.0 0.0e+00 0.0e+00 1.3e+03 54 72 0 0 49 54 72 0 0 49 2784
KSPSetUp 2 1.0 2.1291e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 5.2989e+01 1.0 2.84e+10 1.0 8.2e+03 8.0e+03 2.7e+03 99100100100 99 99100100100 99 2141
PCSetUp 2 1.0 1.4600e-01 1.1 2.74e+06 1.0 0.0e+00 0.0e+00 5.0e+00 0 0 0 0 0 0 0 0 0 0 75
PCSetUpOnBlocks 1 1.0 1.1017e-01 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 100
PCApply 1370 1.0 9.7092e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 1267
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 56984588 0
Vector 41 41 74071440 0
Vector Scatter 1 1 1060 0
Index Set 5 5 1007832 0
Krylov Solver 2 2 19520 0
Preconditioner 2 2 1864 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 6.19888e-06
Average time for MPI_Barrier(): 0.000529623
Average time for zero size MPI_Send(): 0.000117242
#PETSc Option Table entries:
-log_summary log_1000x1000_mpi_p4.log
-m 1000
-n 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov 4 14:29:26 2013
Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
-----------------------------------------
Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc
Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-opt
-----------------------------------------
Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
-----------------------------------------
Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex2f on a linux-gnu-opt named dsu-pc with 1 processor, by root Mon Nov 4 16:14:37 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7 GIT Date: 2013-11-01 14:59:20 -0500
Max Max/Min Avg Total
Time (sec): 2.295e+02 1.00000 2.295e+02
Objects: 4.500e+01 1.00000 4.500e+01
Flops: 2.203e+11 1.00000 2.203e+11 2.203e+11
Flops/sec: 9.597e+08 1.00000 9.597e+08 9.597e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 5.236e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.2953e+02 100.0% 2.2028e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 5.235e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 2657 1.0 4.0388e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 592
MatSolve 2657 1.0 6.1962e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 386
MatLUFactorNum 1 1.0 1.2718e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 86
MatILUFactorSym 1 1.0 9.5901e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 1.2159e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 6.3241e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.2885e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2571 1.0 4.9771e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 2.6e+03 22 36 0 0 49 22 36 0 0 49 1598
VecNorm 2658 1.0 5.2489e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 2.7e+03 2 2 0 0 51 2 2 0 0 51 1013
VecScale 2657 1.0 3.5420e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 750
VecCopy 86 1.0 2.0908e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 88 1.0 1.1408e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 172 1.0 4.3620e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 789
VecMAXPY 2657 1.0 6.6513e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38 0 0 0 29 38 0 0 0 1273
VecNormalize 2657 1.0 8.8659e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 2.7e+03 4 4 0 0 51 4 4 0 0 51 899
KSPGMRESOrthog 2571 1.0 1.1234e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 2.6e+03 49 72 0 0 49 49 72 0 0 49 1416
KSPSetUp 1 1.0 2.9065e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.2896e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 5.2e+03100100 0 0100 100100 0 0100 962
PCSetUp 1 1.0 2.3610e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 47
PCApply 2657 1.0 6.2019e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11 0 0 0 27 11 0 0 0 385
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 151957404 0
Vector 37 37 296057424 0
Krylov Solver 1 1 18368 0
Preconditioner 1 1 984 0
Index Set 3 3 4002304 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 5.81741e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_mpi_p1.log
-m 1000
-n 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov 4 14:29:26 2013
Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
-----------------------------------------
Libraries compiled on Mon Nov 4 14:29:26 2013 on dsu-pc
Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-opt
-----------------------------------------
Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
-----------------------------------------
Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------