[petsc-users] Speedup problem when using OpenMP?

Danyang Su danyang.su at gmail.com
Mon Nov 4 18:26:18 CST 2013


Hi All,

I have tested the same example under Ubuntu 12.04 x64. The PETSc-dev 
version is up to date (GIT Date: 2013-11-01 14:59:20 -0500) and the 
installation went smoothly, without any errors. The speedup of the MPI 
version is nearly linear, but the speedup of the OpenMP version does not 
change at all. Judging from the CPU usage, the program still runs in a 
single thread when OpenMP is used.
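
As a quick sanity check outside PETSc (a minimal sketch of my own, not 
part of ex2f or the PETSc build), I compiled a tiny standalone OpenMP 
program with the same gcc and -fopenmp flag to confirm that the OpenMP 
runtime itself spawns the requested number of threads on this machine:

/* omp_check.c - minimal sanity check for the OpenMP runtime, independent of PETSc.
 * Compile: gcc -fopenmp omp_check.c -o omp_check
 * Run:     OMP_NUM_THREADS=4 ./omp_check
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
#pragma omp parallel
  {
    /* each thread reports its id; with 4 threads we expect ids 0..3 */
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
  }
  return 0;
}

It prints four distinct thread ids with OMP_NUM_THREADS=4, so the OpenMP 
runtime itself looks fine here.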

The commands to run the test are as follows:

OpenMP:
./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_ex2f_1000x1000_ubuntu1204_omp_p4.log

MPI:
mpiexec -n 4 ./ex2f -m 1000 -n 1000 -log_summary log_ex2f_1000x1000_ubuntu1204_mpi_p4.log

This problem is quite puzzling to me. Can anybody confirm whether the KSP 
solver is actually parallelized in the OpenMP build?
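
In case it helps with the diagnosis: according to the attached logs, 
almost all of the KSPSolve time goes into vector and matrix kernels 
(KSPGMRESOrthog/VecMDot, VecMAXPY, MatMult, MatSolve), which are 
memory-bandwidth bound. The rough sketch below (my own test code, not 
PETSc) times a plain OpenMP y = y + a*x loop of about the same size as 
the 1000x1000 problem, just to see whether this machine gets any OpenMP 
speedup at all on that kind of kernel; if 1 and 4 threads take about the 
same time here too, memory bandwidth rather than the solver would be the 
limit.

/* axpy_bench.c - rough sketch (my own, not PETSc code) of a bandwidth-bound
 * y = y + a*x loop, similar in spirit to the VecAXPY/VecMAXPY kernels that
 * dominate KSPSolve in the attached logs.
 * Compile: gcc -O2 -std=c99 -fopenmp axpy_bench.c -o axpy_bench
 * Run:     OMP_NUM_THREADS=1 ./axpy_bench ; OMP_NUM_THREADS=4 ./axpy_bench
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
  const int n = 1000000;   /* roughly the number of unknowns in the 1000x1000 run */
  const int reps = 200;
  double *x = malloc(n * sizeof(double));
  double *y = malloc(n * sizeof(double));
  for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

  double t0 = omp_get_wtime();
  for (int r = 0; r < reps; r++) {
    /* the OpenMP-parallel part: each thread updates a chunk of y */
#pragma omp parallel for
    for (int i = 0; i < n; i++) y[i] += 0.5 * x[i];
  }
  double t1 = omp_get_wtime();

  printf("max threads = %d, time = %.3f s (y[0] = %g)\n",
         omp_get_max_threads(), t1 - t0, y[0]);
  free(x); free(y);
  return 0;
}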

Thanks and regards,

Danyang

On 31/10/2013 4:54 PM, Danyang Su wrote:
> Hi All,
>
> I have a question about the speedup of PETSc when using OpenMP. I get 
> good speedup when using MPI, but no speedup when using OpenMP.
> The example is ex2f with m=100 and n=100. The number of available 
> processors is 16 (32 threads) and the OS is Windows Server 2012. The 
> log files for 4 and 8 processors are attached.
>
> The commands I used to run with 4 processors are as follows:
> Run using MPI
> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary 
> log_100x100_mpi_p4.log
>
> Run using OpenMP
> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 
> -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
>
> The PETSc build used for this test is PETSc for Windows 
> (http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I guess this 
> is not the cause, because the same problem exists when I use 
> PETSc-dev in Cygwin. I don't know whether this problem also exists on 
> Linux; would anybody help to test?
>
> Thanks and regards,
>
> Danyang

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov  4 15:35:47 2013
With 4 threads per MPI_Comm
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           2.376e+02      1.00000   2.376e+02
Objects:              4.500e+01      1.00000   4.500e+01
Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
Flops/sec:            9.271e+08      1.00000   9.271e+08  9.271e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.3759e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             2657 1.0 4.1715e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   573
MatSolve            2657 1.0 6.4028e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   373
MatLUFactorNum         1 1.0 1.1149e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    99
MatILUFactorSym        1 1.0 8.2365e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 7.8678e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 9.1023e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 1.0014e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2122e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             2571 1.0 5.1144e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36  0  0  0  22 36  0  0  0  1555
VecNorm             2658 1.0 5.4516e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   975
VecScale            2657 1.0 3.8631e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   688
VecCopy               86 1.0 2.2233e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                88 1.0 1.1501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              172 1.0 4.4589e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   771
VecMAXPY            2657 1.0 6.9213e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1223
VecNormalize        2657 1.0 9.3968e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  4  0  0  0   4  4  0  0  0   848
KSPGMRESOrthog      2571 1.0 1.1630e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 49 72  0  0  0  49 72  0  0  0  1367
KSPSetUp               1 1.0 2.8520e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.3699e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100  0  0  0 100100  0  0  0   929
PCSetUp                1 1.0 2.0609e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    53
PCApply             2657 1.0 6.4088e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   373
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2    151957404     0
              Vector    37             37    296057424     0
       Krylov Solver     1              1        18368     0
      Preconditioner     1              1          984     0
           Index Set     3              3      4002304     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 6.50883e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_omp_p4.log
-m 1000
-n 1000
-threadcomm_nthreads 4
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov  4 15:22:12 2013
Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
-----------------------------------------
Libraries compiled on Mon Nov  4 15:22:12 2013 on dsu-pc 
Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-omp-opt
-----------------------------------------

Using C compiler: gcc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: gfortran  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  -fopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
-----------------------------------------

Using C linker: gcc
Using Fortran linker: gfortran
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2f on a linux-gnu-omp-opt named dsu-pc with 1 processor, by root Mon Nov  4 15:31:30 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           2.388e+02      1.00000   2.388e+02
Objects:              4.500e+01      1.00000   4.500e+01
Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
Flops/sec:            9.224e+08      1.00000   9.224e+08  9.224e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.3881e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             2657 1.0 4.0429e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 17 11  0  0  0  17 11  0  0  0   591
MatSolve            2657 1.0 6.3888e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   374
MatLUFactorNum         1 1.0 1.2874e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    85
MatILUFactorSym        1 1.0 1.3501e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 8.1062e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.8491e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.3066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             2571 1.0 5.2507e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 0.0e+00 22 36  0  0  0  22 36  0  0  0  1514
VecNorm             2658 1.0 5.4426e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   977
VecScale            2657 1.0 3.8871e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   684
VecCopy               86 1.0 1.9921e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                88 1.0 1.0965e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              172 1.0 4.0171e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   856
VecMAXPY            2657 1.0 7.0096e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1208
VecNormalize        2657 1.0 9.4060e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  4  0  0  0   4  4  0  0  0   847
KSPGMRESOrthog      2571 1.0 1.1847e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 50 72  0  0  0  50 72  0  0  0  1342
KSPSetUp               1 1.0 3.7805e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.3820e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 0.0e+00100100  0  0  0 100100  0  0  0   925
PCSetUp                1 1.0 2.7698e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    40
PCApply             2657 1.0 6.3946e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   374
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2    151957404     0
              Vector    37             37    296057424     0
       Krylov Solver     1              1        18368     0
      Preconditioner     1              1          984     0
           Index Set     3              3      4002304     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 8.51154e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_omp_p1.log
-m 1000
-n 1000
-threadcomm_nthreads 1
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov  4 15:22:12 2013
Configure options: PETSC_ARCH=linux-gnu-omp-opt --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-mpi=0 --with-openmp --with-debugging=0
-----------------------------------------
Libraries compiled on Mon Nov  4 15:22:12 2013 on dsu-pc 
Machine characteristics: Linux-3.2.0-55-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-omp-opt
-----------------------------------------

Using C compiler: gcc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O -fopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: gfortran  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  -fopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-omp-opt/include -I/home/dsu/petsc/include/mpiuni
-----------------------------------------

Using C linker: gcc
Using Fortran linker: gfortran
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-omp-opt/lib -L/home/dsu/petsc/linux-gnu-omp-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lgfortran -lm -lgfortran -lm -lquadmath -lm -lstdc++ -ldl -lgcc_s -ldl 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2f on a linux-gnu-opt named dsu-pc with 4 processors, by root Mon Nov  4 16:10:24 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           5.364e+01      1.00045   5.362e+01
Objects:              5.600e+01      1.00000   5.600e+01
Flops:                2.837e+10      1.00010   2.837e+10  1.135e+11
Flops/sec:            5.291e+08      1.00054   5.290e+08  2.116e+09
MPI Messages:         2.744e+03      2.00000   2.058e+03  8.232e+03
MPI Message Lengths:  2.193e+07      2.00000   7.991e+03  6.578e+07
MPI Reductions:       2.720e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.3623e+01 100.0%  1.1347e+11 100.0%  8.232e+03 100.0%  7.991e+03      100.0%  2.719e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1370 1.0 8.7882e+00 1.0 3.08e+09 1.0 8.2e+03 8.0e+03 0.0e+00 16 11100100  0  16 11100100  0  1402
MatSolve            1370 1.0 9.0304e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 17 11  0  0  0  17 11  0  0  0  1362
MatLUFactorNum         1 1.0 3.3336e-02 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   329
MatILUFactorSym        1 1.0 7.1875e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 7.2212e-0241.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.4802e-02 1.0 0.00e+00 0.0 1.2e+01 2.0e+03 9.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 1.2875e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.8881e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             1325 1.0 1.4754e+01 1.0 1.02e+10 1.0 0.0e+00 0.0e+00 1.3e+03 27 36  0  0 49  27 36  0  0 49  2776
VecNorm             1371 1.0 1.9989e+00 1.1 6.86e+08 1.0 0.0e+00 0.0e+00 1.4e+03  4  2  0  0 50   4  2  0  0 50  1372
VecScale            1370 1.0 4.9844e-01 1.1 3.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2749
VecCopy               45 1.0 4.4863e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1418 1.0 6.2273e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY               90 1.0 1.0165e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1771
VecMAXPY            1370 1.0 1.5635e+01 1.0 1.09e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  2789
VecScatterBegin     1370 1.0 1.6159e-01 1.8 0.00e+00 0.0 8.2e+03 8.0e+03 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd       1370 1.0 9.6929e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecNormalize        1370 1.0 2.5033e+00 1.1 1.03e+09 1.0 0.0e+00 0.0e+00 1.4e+03  5  4  0  0 50   5  4  0  0 50  1642
KSPGMRESOrthog      1325 1.0 2.9419e+01 1.0 2.05e+10 1.0 0.0e+00 0.0e+00 1.3e+03 54 72  0  0 49  54 72  0  0 49  2784
KSPSetUp               2 1.0 2.1291e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 5.2989e+01 1.0 2.84e+10 1.0 8.2e+03 8.0e+03 2.7e+03 99100100100 99  99100100100 99  2141
PCSetUp                2 1.0 1.4600e-01 1.1 2.74e+06 1.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  0   0  0  0  0  0    75
PCSetUpOnBlocks        1 1.0 1.1017e-01 1.0 2.74e+06 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0   100
PCApply             1370 1.0 9.7092e+00 1.0 3.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0  1267
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     4              4     56984588     0
              Vector    41             41     74071440     0
      Vector Scatter     1              1         1060     0
           Index Set     5              5      1007832     0
       Krylov Solver     2              2        19520     0
      Preconditioner     2              2         1864     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 6.19888e-06
Average time for MPI_Barrier(): 0.000529623
Average time for zero size MPI_Send(): 0.000117242
#PETSc Option Table entries:
-log_summary log_1000x1000_mpi_p4.log
-m 1000
-n 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov  4 14:29:26 2013
Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
-----------------------------------------
Libraries compiled on Mon Nov  4 14:29:26 2013 on dsu-pc 
Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-opt
-----------------------------------------

Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90  -fPIC  -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
-----------------------------------------

Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2f on a linux-gnu-opt named dsu-pc with 1 processor, by root Mon Nov  4 16:14:37 2013
Using Petsc Development GIT revision: 1beacf92f04482972e84431be0032cb960d262c7  GIT Date: 2013-11-01 14:59:20 -0500

                         Max       Max/Min        Avg      Total 
Time (sec):           2.295e+02      1.00000   2.295e+02
Objects:              4.500e+01      1.00000   4.500e+01
Flops:                2.203e+11      1.00000   2.203e+11  2.203e+11
Flops/sec:            9.597e+08      1.00000   9.597e+08  9.597e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       5.236e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.2953e+02 100.0%  2.2028e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.235e+03 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             2657 1.0 4.0388e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   592
MatSolve            2657 1.0 6.1962e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   386
MatLUFactorNum         1 1.0 1.2718e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    86
MatILUFactorSym        1 1.0 9.5901e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 1.2159e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 6.3241e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2885e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             2571 1.0 4.9771e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 2.6e+03 22 36  0  0 49  22 36  0  0 49  1598
VecNorm             2658 1.0 5.2489e+00 1.0 5.32e+09 1.0 0.0e+00 0.0e+00 2.7e+03  2  2  0  0 51   2  2  0  0 51  1013
VecScale            2657 1.0 3.5420e+00 1.0 2.66e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   750
VecCopy               86 1.0 2.0908e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                88 1.0 1.1408e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              172 1.0 4.3620e-01 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   789
VecMAXPY            2657 1.0 6.6513e+01 1.0 8.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 38  0  0  0  29 38  0  0  0  1273
VecNormalize        2657 1.0 8.8659e+00 1.0 7.97e+09 1.0 0.0e+00 0.0e+00 2.7e+03  4  4  0  0 51   4  4  0  0 51   899
KSPGMRESOrthog      2571 1.0 1.1234e+02 1.0 1.59e+11 1.0 0.0e+00 0.0e+00 2.6e+03 49 72  0  0 49  49 72  0  0 49  1416
KSPSetUp               1 1.0 2.9065e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2896e+02 1.0 2.20e+11 1.0 0.0e+00 0.0e+00 5.2e+03100100  0  0100 100100  0  0100   962
PCSetUp                1 1.0 2.3610e-01 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0    47
PCApply             2657 1.0 6.2019e+01 1.0 2.39e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 11  0  0  0  27 11  0  0  0   385
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2    151957404     0
              Vector    37             37    296057424     0
       Krylov Solver     1              1        18368     0
      Preconditioner     1              1          984     0
           Index Set     3              3      4002304     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 5.81741e-06
#PETSc Option Table entries:
-log_summary log_1000x1000_mpi_p1.log
-m 1000
-n 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Nov  4 14:29:26 2013
Configure options: PETSC_ARCH=linux-gnu-opt --with-cc=gcc --with-fc=gfortran --with-debugging=0 --download-f-blas-lapack --download-mpich
-----------------------------------------
Libraries compiled on Mon Nov  4 14:29:26 2013 on dsu-pc 
Machine characteristics: Linux-3.2.0-41-generic-x86_64-with-Ubuntu-12.04-precise
Using PETSc directory: /home/dsu/petsc
Using PETSc arch: linux-gnu-opt
-----------------------------------------

Using C compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/dsu/petsc/linux-gnu-opt/bin/mpif90  -fPIC  -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/dsu/petsc/linux-gnu-opt/include -I/home/dsu/petsc/include -I/home/dsu/petsc/include -I/home/dsu/petsc/linux-gnu-opt/include
-----------------------------------------

Using C linker: /home/dsu/petsc/linux-gnu-opt/bin/mpicc
Using Fortran linker: /home/dsu/petsc/linux-gnu-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lpetsc -Wl,-rpath,/home/dsu/petsc/linux-gnu-opt/lib -L/home/dsu/petsc/linux-gnu-opt/lib -lflapack -lfblas -lpthread -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl 
-----------------------------------------


