[petsc-dev] Performance of Petsc + ViennaCL 1.5.1 (branch:petsc-dev/next)

Mani Chandra mc0710 at gmail.com
Sat Feb 22 17:05:26 CST 2014


Hi Everyone,

I tested the updated implementation of the ViennaCL bindings in
petsc-dev/next, and I get rather poor performance when using ViennaCL on
either the CPU or the GPU. I am using the TS module (type: theta) with a
simple 2D advection equation at a resolution of 256x256 with 8 variables.
I tested the following cases:

1) A single CPU with PETSc's existing AIJ Mat and Vec implementations.
2) ViennaCL Mat and Vec, using VecViennaCLGetArrayRead/Write in the
residual evaluation function (see the sketch below), on an Intel CPU with
Intel's OpenCL.
3) The same ViennaCL setup as in 2), on an NVIDIA GPU.
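
For reference, the access pattern used in cases 2 and 3 looks roughly like
the following. This is a minimal sketch, not the attached code: the OpenCL
kernel launch is elided, ComputeResidual is a hypothetical name, and the
signature assumes the implicit residual is set with TSSetIFunction(). The
accessors are the PETSc ViennaCL bindings named above; ViennaCLVector is
viennacl::vector<PetscScalar> from petscviennacl.h, so this file is
compiled as C++.

#include <petscts.h>
#include <petscviennacl.h>

PetscErrorCode ComputeResidual(TS ts, PetscReal t, Vec X, Vec Xdot,
                               Vec F, void *ctx)
{
  PetscErrorCode        ierr;
  const ViennaCLVector *x, *xdot;
  ViennaCLVector       *f;

  PetscFunctionBegin;
  /* Get the device-side ViennaCL vectors without copying to the host. */
  ierr = VecViennaCLGetArrayRead(X, &x);       CHKERRQ(ierr);
  ierr = VecViennaCLGetArrayRead(Xdot, &xdot); CHKERRQ(ierr);
  ierr = VecViennaCLGetArrayWrite(F, &f);      CHKERRQ(ierr);

  /* Enqueue the user's OpenCL kernel here, operating directly on the
     device buffers (e.g. via x->handle().opencl_handle() in ViennaCL 1.x). */

  /* Restore so PETSc knows which copies (host/device) are now valid. */
  ierr = VecViennaCLRestoreArrayWrite(F, &f);      CHKERRQ(ierr);
  ierr = VecViennaCLRestoreArrayRead(Xdot, &xdot); CHKERRQ(ierr);
  ierr = VecViennaCLRestoreArrayRead(X, &x);       CHKERRQ(ierr);
  PetscFunctionReturn(0);
}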

The first case is the fastest; the other two cases are 2-3 times slower.
Attached are the log summaries for each case and the code I used for the
tests. I am running with the following command:

time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps 10
-ts_type theta -log_summary
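
The same flags are used for all three runs (the option tables in the logs
below are identical), so the ViennaCL Vec and Mat types are presumably
selected in the source rather than on the command line. With a DMDA that
would look roughly like the following sketch (an assumption about the
attached code, not a quote from it):

#include <petscdmda.h>

/* Hypothetical helper: switch a DM to the ViennaCL (OpenCL) back end so
   that TS/SNES/KSP create ViennaCL vectors and matrices from it. */
PetscErrorCode UseViennaCLTypes(DM da)
{
  PetscErrorCode ierr;
  PetscFunctionBegin;
  ierr = DMSetVecType(da, VECVIENNACL);    CHKERRQ(ierr);
  ierr = DMSetMatType(da, MATAIJVIENNACL); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}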

Cheers,
Mani
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:51:12 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e  GIT Date: 2014-02-22 21:52:12 +0100

                         Max       Max/Min        Avg      Total
Time (sec):           9.010e+00      1.00000   9.010e+00
Objects:              5.130e+02      1.00000   5.130e+02
Flops:                5.812e+09      1.00000   5.812e+09  5.812e+09
Flops/sec:            6.451e+08      1.00000   6.451e+08  6.451e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       5.410e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.0100e+00 100.0%  5.8124e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.400e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                10 1.0 6.3357e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2  1655
VecMDot               10 1.0 5.3320e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2  1967
VecNorm               60 1.0 3.1898e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 6.0e+01  0  1  0  0 11   0  1  0  0 11  1972
VecScale              20 1.0 7.7820e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1347
VecCopy             2060 1.0 1.2747e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
VecSet                46 1.0 9.8627e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             2020 1.0 1.4122e+00 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 36  0  0  0  16 36  0  0  0  1500
VecAXPBYCZ          2030 1.0 1.8831e+00 1.0 3.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 55  0  0  0  21 55  0  0  0  1696
VecWAXPY              10 1.0 8.4498e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   620
VecMAXPY              20 1.0 1.3520e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1551
VecReduceArith        20 1.0 1.0745e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1952
VecReduceComm         10 1.0 1.3113e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          20 1.0 1.8276e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 2.0e+01  0  1  0  0  4   0  1  0  0  4  1721
MatMult               20 1.0 1.0664e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  1662
MatSolve              20 1.0 1.5286e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  1160
MatLUFactorNum        10 1.0 2.7001e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0    19
MatILUFactorSym        1 1.0 3.0684e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      27 1.0 1.0252e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        27 1.0 8.5994e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRow        2097152 1.0 1.1401e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ            1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.3179e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        10 1.0 2.2097e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorCreate       1 1.0 1.3528e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatFDColorSetUp        1 1.0 1.7801e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  2  0  0  0 38   2  0  0  0 38     0
MatFDColorApply       10 1.0 7.3000e+00 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 1.0e+01 81 90  0  0  2  81 90  0  0  2   720
MatFDColorFunc      2000 1.0 4.2932e+00 1.0 3.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 54  0  0  0  48 54  0  0  0   733
TSStep                10 1.0 8.8878e+00 1.0 5.81e+09 1.0 0.0e+00 0.0e+00 3.4e+02 99100  0  0 62  99100  0  0 62   654
TSFunctionEval      2020 1.0 2.4572e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 27  0  0  0  0  27  0  0  0  0     0
SNESSolve             10 1.0 8.2366e+00 1.0 5.79e+09 1.0 0.0e+00 0.0e+00 3.0e+02 91100  0  0 55  91100  0  0 56   702
SNESFunctionEval      20 1.0 4.5297e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   694
SNESJacobianEval      10 1.0 7.5115e+00 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 2.3e+02 83 90  0  0 42  83 90  0  0 42   699
SNESLineSearch        10 1.0 1.2340e-01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 3.0e+01  1  3  0  0  6   1  3  0  0  6  1313
KSPGMRESOrthog        10 1.0 1.1343e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2  1849
KSPSetUp              10 1.0 6.6357e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              10 1.0 5.7194e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 3.3e+01  6  6  0  0  6   6  6  0  0  6   602
PCSetUp               10 1.0 3.0309e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 3.0e+00  3  0  0  0  1   3  0  0  0  1    17
PCApply               20 1.0 1.5289e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  1160
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    77             77    100912144     0
      Vector Scatter    33             33        21516     0
              Matrix    18             18    258915948     0
  Matrix FD Coloring     1              1    106564692     0
    Distributed Mesh    18             18      4413456     0
     Bipartite Graph    36             36        29376     0
           Index Set   285            285      8609632     0
   IS L to G Mapping    34             34      2183464     0
             TSAdapt     2              2         2400     0
                  TS     1              1         1272     0
                DMTS     1              1          752     0
                SNES     1              1         1348     0
      SNESLineSearch     1              1          880     0
              DMSNES     1              1          680     0
       Krylov Solver     1              1        18376     0
     DMKSP interface     1              1          664     0
      Preconditioner     1              1          992     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:53:08 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e  GIT Date: 2014-02-22 21:52:12 +0100

                         Max       Max/Min        Avg      Total
Time (sec):           3.355e+01      1.00000   3.355e+01
Objects:              5.130e+02      1.00000   5.130e+02
Flops:                5.729e+09      1.00000   5.729e+09  5.729e+09
Flops/sec:            1.708e+08      1.00000   1.708e+08  1.708e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       5.410e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.3551e+01 100.0%  5.7290e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  5.400e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                10 1.0 1.2514e-02 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2   838
VecMDot               10 1.0 5.5301e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2  1896
VecNorm               60 1.0 4.4416e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 6.0e+01  0  1  0  0 11   0  1  0  0 11  1416
VecScale              20 1.0 7.7469e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1354
VecCopy             2060 1.0 1.2895e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecSet                46 1.0 3.0879e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             2020 1.0 1.6231e+01 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 37  0  0  0  48 37  0  0  0   130
VecAXPBYCZ          2030 1.0 1.9063e+00 1.0 3.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00  6 56  0  0  0   6 56  0  0  0  1675
VecWAXPY              10 1.0 8.4739e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   619
VecMAXPY              20 1.0 1.6943e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1238
VecReduceArith        20 1.0 1.0716e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1957
VecReduceComm         10 1.0 1.3113e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          20 1.0 1.8334e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 2.0e+01  0  1  0  0  4   0  1  0  0  4  1716
VecViennaCLCopyTo    4070 1.0 4.7962e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
VecViennaCLCopyFrom    2042 1.0 1.1187e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatMult               20 1.0 5.4041e-01 1.0 9.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   174
MatSolve              20 1.0 2.3844e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0   743
MatLUFactorNum        10 1.0 4.9576e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    11
MatILUFactorSym        1 1.0 6.1537e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      27 1.0 8.5831e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        27 1.0 7.4730e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatGetRow        2097152 1.0 1.1437e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 7.3321e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        10 1.0 2.2906e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorCreate       1 1.0 4.1401e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatFDColorSetUp        1 1.0 1.7998e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  1  0  0  0 38   1  0  0  0 38     0
MatFDColorApply       10 1.0 3.0321e+01 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 1.0e+01 90 92  0  0  2  90 92  0  0  2   173
MatFDColorFunc      2000 1.0 1.1093e+01 1.0 3.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00 33 55  0  0  0  33 55  0  0  0   284
MatViennaCLCopyTo      11 1.0 6.6829e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
TSStep                10 1.0 3.3248e+01 1.0 5.73e+09 1.0 0.0e+00 0.0e+00 3.4e+02 99100  0  0 62  99100  0  0 62   172
TSFunctionEval      2020 1.0 9.3544e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28  0  0  0  0  28  0  0  0  0     0
SNESSolve             10 1.0 3.2240e+01 1.0 5.70e+09 1.0 0.0e+00 0.0e+00 3.0e+02 96100  0  0 55  96100  0  0 56   177
SNESFunctionEval      20 1.0 1.7578e-01 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   179
SNESJacobianEval      10 1.0 3.0540e+01 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 2.3e+02 91 92  0  0 42  91 92  0  0 42   172
SNESLineSearch        10 1.0 1.8292e-01 1.0 1.20e+08 1.0 0.0e+00 0.0e+00 3.0e+01  1  2  0  0  6   1  2  0  0  6   658
KSPGMRESOrthog        10 1.0 1.4776e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  2   0  0  0  0  2  1419
KSPSetUp              10 1.0 1.1441e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              10 1.0 1.3843e+00 1.0 3.03e+08 1.0 0.0e+00 0.0e+00 3.3e+01  4  5  0  0  6   4  5  0  0  6   219
PCSetUp               10 1.0 5.6497e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 3.0e+00  2  0  0  0  1   2  0  0  0  1     9
PCApply               20 1.0 2.3849e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0   743
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    77             77    100912144     0
      Vector Scatter    33             33        21516     0
              Matrix    18             18    258915948     0
  Matrix FD Coloring     1              1    106564692     0
    Distributed Mesh    18             18      4413456     0
     Bipartite Graph    36             36        29376     0
           Index Set   285            285      8609632     0
   IS L to G Mapping    34             34      2183464     0
             TSAdapt     2              2         2400     0
                  TS     1              1         1272     0
                DMTS     1              1          752     0
                SNES     1              1         1348     0
      SNESLineSearch     1              1          880     0
              DMSNES     1              1          680     0
       Krylov Solver     1              1        18376     0
     DMKSP interface     1              1          664     0
      Preconditioner     1              1          992     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:55:35 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e  GIT Date: 2014-02-22 21:52:12 +0100

                         Max       Max/Min        Avg      Total
Time (sec):           2.308e+01      1.00000   2.308e+01
Objects:              5.130e+02      1.00000   5.130e+02
Flops:                1.142e+10      1.00000   1.142e+10  1.142e+10
Flops/sec:            4.948e+08      1.00000   4.948e+08  4.948e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       6.010e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.3077e+01 100.0%  1.1418e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  6.000e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                20 1.0 2.9288e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  3   0  0  0  0  3   716
VecMDot               20 1.0 1.0628e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  3   0  0  0  0  3  1973
VecNorm              100 1.0 7.8110e-02 1.0 1.05e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  1  0  0 17   0  1  0  0 17  1342
VecScale              40 1.0 1.5139e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1385
VecCopy             4100 1.0 2.2702e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     0
VecSet                66 1.0 1.3499e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             4030 1.0 7.6309e-01 1.0 4.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 37  0  0  0   3 37  0  0  0  5538
VecAXPBYCZ          4040 1.0 3.5373e+00 1.0 6.35e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 56  0  0  0  15 56  0  0  0  1796
VecWAXPY              20 1.0 1.6954e-02 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   618
VecMAXPY              40 1.0 2.8030e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1496
VecReduceArith        40 1.0 2.1408e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1959
VecReduceComm         20 1.0 2.7657e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          40 1.0 3.6133e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 4.0e+01  0  1  0  0  7   0  1  0  0  7  1741
VecViennaCLCopyTo    8120 1.0 7.0096e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 30  0  0  0  0  30  0  0  0  0     0
VecViennaCLCopyFrom    4072 1.0 3.3473e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15  0  0  0  0  15  0  0  0  0     0
MatMult               40 1.0 1.2443e+00 1.0 1.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5  2  0  0  0   5  2  0  0  0   151
MatSolve              40 1.0 3.2208e-01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  1101
MatLUFactorNum        20 1.0 5.5978e-01 1.0 3.35e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0    60
MatILUFactorSym        1 1.0 3.0341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      37 1.0 1.2875e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        37 1.0 1.4193e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
MatGetRow        2097152 1.0 1.1309e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.4181e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        20 1.0 4.4028e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorCreate       1 1.0 6.0701e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorSetUp        1 1.0 1.8087e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  1  0  0  0 34   1  0  0  0 34     0
MatFDColorApply       20 1.0 1.8863e+01 1.0 1.05e+10 1.0 0.0e+00 0.0e+00 1.0e+01 82 92  0  0  2  82 92  0  0  2   556
MatFDColorFunc      4000 1.0 1.1333e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 49 55  0  0  0  49 55  0  0  0   555
MatViennaCLCopyTo      21 1.0 1.2915e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     0
TSStep                10 1.0 2.2924e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 4.0e+02 99100  0  0 66  99100  0  0 66   498
TSFunctionEval      4030 1.0 7.8679e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34  0  0  0  0  34  0  0  0  0     0
SNESSolve             10 1.0 2.1611e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 3.6e+02 94100  0  0 60  94100  0  0 60   527
SNESFunctionEval      30 1.0 8.8007e-02 1.0 4.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   536
SNESJacobianEval      20 1.0 1.9106e+01 1.0 1.05e+10 1.0 0.0e+00 0.0e+00 2.3e+02 83 92  0  0 38  83 92  0  0 38   549
SNESLineSearch        20 1.0 3.1601e-01 1.0 2.41e+08 1.0 0.0e+00 0.0e+00 6.0e+01  1  2  0  0 10   1  2  0  0 10   762
KSPGMRESOrthog        20 1.0 2.3219e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  3   0  0  0  0  3  1806
KSPSetUp              20 1.0 2.3451e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              20 1.0 2.1427e+00 1.0 6.29e+08 1.0 0.0e+00 0.0e+00 6.3e+01  9  6  0  0 10   9  6  0  0 10   293
PCSetUp               20 1.0 5.9167e-01 1.0 3.35e+07 1.0 0.0e+00 0.0e+00 3.0e+00  3  0  0  0  0   3  0  0  0  0    57
PCApply               40 1.0 3.2215e-01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  1101
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    77             77    100912144     0
      Vector Scatter    33             33        21516     0
              Matrix    18             18    258915948     0
  Matrix FD Coloring     1              1    106564692     0
    Distributed Mesh    18             18      4413456     0
     Bipartite Graph    36             36        29376     0
           Index Set   285            285      8609632     0
   IS L to G Mapping    34             34      2183464     0
             TSAdapt     2              2         2400     0
                  TS     1              1         1272     0
                DMTS     1              1          752     0
                SNES     1              1         1348     0
      SNESLineSearch     1              1          880     0
              DMSNES     1              1          680     0
       Krylov Solver     1              1        18376     0
     DMKSP interface     1              1          664     0
      Preconditioner     1              1          992     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90  -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
[Attachment: petsc_opencl_fixed.tar.gz (application/x-gzip, 52248 bytes):
<http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140222/74e479c0/attachment.gz>]

