[petsc-dev] Performance of PETSc + ViennaCL 1.5.1 (branch: petsc-dev/next)
Mani Chandra
mc0710 at gmail.com
Sat Feb 22 17:05:26 CST 2014
Hi Everyone,
I tested the updated implementation of the ViennaCL bindings in
petsc-dev/next and I get rather poor performance when using ViennaCL on
either the CPU or the GPU. I am using the TS module (type: theta) with a
simple 2D advection equation at a resolution of 256x256 with 8 variables.
I tested the following cases:
1) A single CPU with PETSc's standard AIJ Mat and Vec implementations.
2) ViennaCL Mat and Vec, using VecViennaCLGetArrayRead/Write in the
residual evaluation function, on an Intel CPU with Intel's OpenCL (see the
sketch after this list).
3) ViennaCL Mat and Vec, using VecViennaCLGetArrayRead/Write in the
residual evaluation function, on an NVIDIA GPU.
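
For reference, the residual evaluation in 2) and 3) works directly on the
ViennaCL device handles. Below is a minimal sketch of such a TS callback,
assuming the VecViennaCLGetArrayRead/Write accessors and the ViennaCLVector
typedef from petscviennacl.h and C++ compilation; the function and variable
names are only illustrative, and the placeholder update stands in for the
actual 2D advection kernel:

#include <petscts.h>
#include <petscviennacl.h>

static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec X, Vec F, void *ctx)
{
  PetscErrorCode        ierr;
  const ViennaCLVector *x;  /* read-only device handle for the state    */
  ViennaCLVector       *f;  /* write-only device handle for the residual */

  PetscFunctionBeginUser;
  ierr = VecViennaCLGetArrayRead(X, &x);CHKERRQ(ierr);
  ierr = VecViennaCLGetArrayWrite(F, &f);CHKERRQ(ierr);
  /* Placeholder: F = -X, evaluated on the OpenCL device. The real residual
     would instead enqueue a custom OpenCL kernel on these handles to apply
     the advection stencil to all 8 variables. */
  *f = -1.0 * (*x);
  ierr = VecViennaCLRestoreArrayWrite(F, &f);CHKERRQ(ierr);
  ierr = VecViennaCLRestoreArrayRead(X, &x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The callback is then registered with TSSetRHSFunction() (or TSSetIFunction()
for the implicit residual) as usual.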
The first case is the fastest; the other two are 2-3 times slower.
Attached are the log summaries for each case and the code I used for
testing. I run with the following command:
time ./petsc_opencl -ts_monitor -snes_monitor -ts_dt 0.01 -ts_max_steps 10
-ts_type theta -log_summary
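
Since the command line does not select the ViennaCL types, the Vec/Mat
implementations for 2) and 3) are chosen in the code. A minimal sketch of
how that is typically done with a DMDA-based setup like this one (the type
strings are the standard PETSc ones; whether the test code does exactly
this, and the 'da'/'ierr' variables, are only illustrative):

/* Illustrative only: switch the DMDA-created Vec/Mat over to the ViennaCL
   implementations; for the plain AIJ case 1) these two calls are omitted. */
ierr = DMSetVecType(da, VECVIENNACL);CHKERRQ(ierr);
ierr = DMSetMatType(da, MATAIJVIENNACL);CHKERRQ(ierr);
/* Equivalent runtime options: -dm_vec_type viennacl -dm_mat_type aijviennacl */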
Cheers,
Mani
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:51:12 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e GIT Date: 2014-02-22 21:52:12 +0100
Max Max/Min Avg Total
Time (sec): 9.010e+00 1.00000 9.010e+00
Objects: 5.130e+02 1.00000 5.130e+02
Flops: 5.812e+09 1.00000 5.812e+09 5.812e+09
Flops/sec: 6.451e+08 1.00000 6.451e+08 6.451e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 5.410e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 9.0100e+00 100.0% 5.8124e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 5.400e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 10 1.0 6.3357e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 1655
VecMDot 10 1.0 5.3320e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 1967
VecNorm 60 1.0 3.1898e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 1 0 0 11 0 1 0 0 11 1972
VecScale 20 1.0 7.7820e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1347
VecCopy 2060 1.0 1.2747e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0
VecSet 46 1.0 9.8627e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 2020 1.0 1.4122e+00 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 36 0 0 0 16 36 0 0 0 1500
VecAXPBYCZ 2030 1.0 1.8831e+00 1.0 3.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 55 0 0 0 21 55 0 0 0 1696
VecWAXPY 10 1.0 8.4498e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 620
VecMAXPY 20 1.0 1.3520e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1551
VecReduceArith 20 1.0 1.0745e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1952
VecReduceComm 10 1.0 1.3113e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 20 1.0 1.8276e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 2.0e+01 0 1 0 0 4 0 1 0 0 4 1721
MatMult 20 1.0 1.0664e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1662
MatSolve 20 1.0 1.5286e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 1160
MatLUFactorNum 10 1.0 2.7001e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 19
MatILUFactorSym 1 1.0 3.0684e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 27 1.0 1.0252e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 27 1.0 8.5994e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetRow 2097152 1.0 1.1401e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.3179e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 10 1.0 2.2097e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorCreate 1 1.0 1.3528e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatFDColorSetUp 1 1.0 1.7801e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 2 0 0 0 38 2 0 0 0 38 0
MatFDColorApply 10 1.0 7.3000e+00 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 1.0e+01 81 90 0 0 2 81 90 0 0 2 720
MatFDColorFunc 2000 1.0 4.2932e+00 1.0 3.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 54 0 0 0 48 54 0 0 0 733
TSStep 10 1.0 8.8878e+00 1.0 5.81e+09 1.0 0.0e+00 0.0e+00 3.4e+02 99100 0 0 62 99100 0 0 62 654
TSFunctionEval 2020 1.0 2.4572e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 27 0 0 0 0 27 0 0 0 0 0
SNESSolve 10 1.0 8.2366e+00 1.0 5.79e+09 1.0 0.0e+00 0.0e+00 3.0e+02 91100 0 0 55 91100 0 0 56 702
SNESFunctionEval 20 1.0 4.5297e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 694
SNESJacobianEval 10 1.0 7.5115e+00 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 2.3e+02 83 90 0 0 42 83 90 0 0 42 699
SNESLineSearch 10 1.0 1.2340e-01 1.0 1.62e+08 1.0 0.0e+00 0.0e+00 3.0e+01 1 3 0 0 6 1 3 0 0 6 1313
KSPGMRESOrthog 10 1.0 1.1343e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 1849
KSPSetUp 10 1.0 6.6357e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 10 1.0 5.7194e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 3.3e+01 6 6 0 0 6 6 6 0 0 6 602
PCSetUp 10 1.0 3.0309e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 3.0e+00 3 0 0 0 1 3 0 0 0 1 17
PCApply 20 1.0 1.5289e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 1160
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 77 77 100912144 0
Vector Scatter 33 33 21516 0
Matrix 18 18 258915948 0
Matrix FD Coloring 1 1 106564692 0
Distributed Mesh 18 18 4413456 0
Bipartite Graph 36 36 29376 0
Index Set 285 285 8609632 0
IS L to G Mapping 34 34 2183464 0
TSAdapt 2 2 2400 0
TS 1 1 1272 0
DMTS 1 1 752 0
SNES 1 1 1348 0
SNESLineSearch 1 1 880 0
DMSNES 1 1 680 0
Krylov Solver 1 1 18376 0
DMKSP interface 1 1 664 0
Preconditioner 1 1 992 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:53:08 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e GIT Date: 2014-02-22 21:52:12 +0100
Max Max/Min Avg Total
Time (sec): 3.355e+01 1.00000 3.355e+01
Objects: 5.130e+02 1.00000 5.130e+02
Flops: 5.729e+09 1.00000 5.729e+09 5.729e+09
Flops/sec: 1.708e+08 1.00000 1.708e+08 1.708e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 5.410e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.3551e+01 100.0% 5.7290e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 5.400e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 10 1.0 1.2514e-02 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 838
VecMDot 10 1.0 5.5301e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 1896
VecNorm 60 1.0 4.4416e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 6.0e+01 0 1 0 0 11 0 1 0 0 11 1416
VecScale 20 1.0 7.7469e-03 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1354
VecCopy 2060 1.0 1.2895e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 0
VecSet 46 1.0 3.0879e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 2020 1.0 1.6231e+01 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 37 0 0 0 48 37 0 0 0 130
VecAXPBYCZ 2030 1.0 1.9063e+00 1.0 3.19e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 56 0 0 0 6 56 0 0 0 1675
VecWAXPY 10 1.0 8.4739e-03 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 619
VecMAXPY 20 1.0 1.6943e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1238
VecReduceArith 20 1.0 1.0716e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1957
VecReduceComm 10 1.0 1.3113e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 20 1.0 1.8334e-02 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 2.0e+01 0 1 0 0 4 0 1 0 0 4 1716
VecViennaCLCopyTo 4070 1.0 4.7962e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 0
VecViennaCLCopyFrom 2042 1.0 1.1187e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMult 20 1.0 5.4041e-01 1.0 9.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 174
MatSolve 20 1.0 2.3844e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 743
MatLUFactorNum 10 1.0 4.9576e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 11
MatILUFactorSym 1 1.0 6.1537e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 27 1.0 8.5831e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 27 1.0 7.4730e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 2097152 1.0 1.1437e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 7.3321e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 10 1.0 2.2906e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorCreate 1 1.0 4.1401e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 1 0 0 0 0 1 0
MatFDColorSetUp 1 1.0 1.7998e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 38 1 0 0 0 38 0
MatFDColorApply 10 1.0 3.0321e+01 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 1.0e+01 90 92 0 0 2 90 92 0 0 2 173
MatFDColorFunc 2000 1.0 1.1093e+01 1.0 3.15e+09 1.0 0.0e+00 0.0e+00 0.0e+00 33 55 0 0 0 33 55 0 0 0 284
MatViennaCLCopyTo 11 1.0 6.6829e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
TSStep 10 1.0 3.3248e+01 1.0 5.73e+09 1.0 0.0e+00 0.0e+00 3.4e+02 99100 0 0 62 99100 0 0 62 172
TSFunctionEval 2020 1.0 9.3544e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 28 0 0 0 0 28 0 0 0 0 0
SNESSolve 10 1.0 3.2240e+01 1.0 5.70e+09 1.0 0.0e+00 0.0e+00 3.0e+02 96100 0 0 55 96100 0 0 56 177
SNESFunctionEval 20 1.0 1.7578e-01 1.0 3.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 179
SNESJacobianEval 10 1.0 3.0540e+01 1.0 5.25e+09 1.0 0.0e+00 0.0e+00 2.3e+02 91 92 0 0 42 91 92 0 0 42 172
SNESLineSearch 10 1.0 1.8292e-01 1.0 1.20e+08 1.0 0.0e+00 0.0e+00 3.0e+01 1 2 0 0 6 1 2 0 0 6 658
KSPGMRESOrthog 10 1.0 1.4776e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 2 0 0 0 0 2 1419
KSPSetUp 10 1.0 1.1441e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 10 1.0 1.3843e+00 1.0 3.03e+08 1.0 0.0e+00 0.0e+00 3.3e+01 4 5 0 0 6 4 5 0 0 6 219
PCSetUp 10 1.0 5.6497e-01 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 3.0e+00 2 0 0 0 1 2 0 0 0 1 9
PCApply 20 1.0 2.3849e-01 1.0 1.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 743
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 77 77 100912144 0
Vector Scatter 33 33 21516 0
Matrix 18 18 258915948 0
Matrix FD Coloring 1 1 106564692 0
Distributed Mesh 18 18 4413456 0
Bipartite Graph 36 36 29376 0
Index Set 285 285 8609632 0
IS L to G Mapping 34 34 2183464 0
TSAdapt 2 2 2400 0
TS 1 1 1272 0
DMTS 1 1 752 0
SNES 1 1 1348 0
SNESLineSearch 1 1 880 0
DMSNES 1 1 680 0
Krylov Solver 1 1 18376 0
DMKSP interface 1 1 664 0
Preconditioner 1 1 992 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./petsc_opencl on a arch-linux2-c-opt named aristophanes with 1 processor, by manic Sat Feb 22 16:55:35 2014
Using Petsc Development GIT revision: v3.4.3-4603-g1457e6e GIT Date: 2014-02-22 21:52:12 +0100
Max Max/Min Avg Total
Time (sec): 2.308e+01 1.00000 2.308e+01
Objects: 5.130e+02 1.00000 5.130e+02
Flops: 1.142e+10 1.00000 1.142e+10 1.142e+10
Flops/sec: 4.948e+08 1.00000 4.948e+08 4.948e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 6.010e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3077e+01 100.0% 1.1418e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 6.000e+02 99.8%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 20 1.0 2.9288e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 3 0 0 0 0 3 716
VecMDot 20 1.0 1.0628e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 3 0 0 0 0 3 1973
VecNorm 100 1.0 7.8110e-02 1.0 1.05e+08 1.0 0.0e+00 0.0e+00 1.0e+02 0 1 0 0 17 0 1 0 0 17 1342
VecScale 40 1.0 1.5139e-02 1.0 2.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1385
VecCopy 4100 1.0 2.2702e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 0
VecSet 66 1.0 1.3499e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 4030 1.0 7.6309e-01 1.0 4.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 37 0 0 0 3 37 0 0 0 5538
VecAXPBYCZ 4040 1.0 3.5373e+00 1.0 6.35e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 56 0 0 0 15 56 0 0 0 1796
VecWAXPY 20 1.0 1.6954e-02 1.0 1.05e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 618
VecMAXPY 40 1.0 2.8030e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1496
VecReduceArith 40 1.0 2.1408e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1959
VecReduceComm 20 1.0 2.7657e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 40 1.0 3.6133e-02 1.0 6.29e+07 1.0 0.0e+00 0.0e+00 4.0e+01 0 1 0 0 7 0 1 0 0 7 1741
VecViennaCLCopyTo 8120 1.0 7.0096e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 30 0 0 0 0 30 0 0 0 0 0
VecViennaCLCopyFrom 4072 1.0 3.3473e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 0
MatMult 40 1.0 1.2443e+00 1.0 1.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 2 0 0 0 5 2 0 0 0 151
MatSolve 40 1.0 3.2208e-01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1101
MatLUFactorNum 20 1.0 5.5978e-01 1.0 3.35e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 60
MatILUFactorSym 1 1.0 3.0341e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 37 1.0 1.2875e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 37 1.0 1.4193e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0
MatGetRow 2097152 1.0 1.1309e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.4181e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 20 1.0 4.4028e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorCreate 1 1.0 6.0701e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.8087e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 34 1 0 0 0 34 0
MatFDColorApply 20 1.0 1.8863e+01 1.0 1.05e+10 1.0 0.0e+00 0.0e+00 1.0e+01 82 92 0 0 2 82 92 0 0 2 556
MatFDColorFunc 4000 1.0 1.1333e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00 49 55 0 0 0 49 55 0 0 0 555
MatViennaCLCopyTo 21 1.0 1.2915e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 6 0 0 0 0 6 0 0 0 0 0
TSStep 10 1.0 2.2924e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 4.0e+02 99100 0 0 66 99100 0 0 66 498
TSFunctionEval 4030 1.0 7.8679e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SNESSolve 10 1.0 2.1611e+01 1.0 1.14e+10 1.0 0.0e+00 0.0e+00 3.6e+02 94100 0 0 60 94100 0 0 60 527
SNESFunctionEval 30 1.0 8.8007e-02 1.0 4.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 536
SNESJacobianEval 20 1.0 1.9106e+01 1.0 1.05e+10 1.0 0.0e+00 0.0e+00 2.3e+02 83 92 0 0 38 83 92 0 0 38 549
SNESLineSearch 20 1.0 3.1601e-01 1.0 2.41e+08 1.0 0.0e+00 0.0e+00 6.0e+01 1 2 0 0 10 1 2 0 0 10 762
KSPGMRESOrthog 20 1.0 2.3219e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 3 0 0 0 0 3 1806
KSPSetUp 20 1.0 2.3451e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 20 1.0 2.1427e+00 1.0 6.29e+08 1.0 0.0e+00 0.0e+00 6.3e+01 9 6 0 0 10 9 6 0 0 10 293
PCSetUp 20 1.0 5.9167e-01 1.0 3.35e+07 1.0 0.0e+00 0.0e+00 3.0e+00 3 0 0 0 0 3 0 0 0 0 57
PCApply 40 1.0 3.2215e-01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1101
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 77 77 100912144 0
Vector Scatter 33 33 21516 0
Matrix 18 18 258915948 0
Matrix FD Coloring 1 1 106564692 0
Distributed Mesh 18 18 4413456 0
Bipartite Graph 36 36 29376 0
Index Set 285 285 8609632 0
IS L to G Mapping 34 34 2183464 0
TSAdapt 2 2 2400 0
TS 1 1 1272 0
DMTS 1 1 752 0
SNES 1 1 1348 0
SNESLineSearch 1 1 880 0
DMSNES 1 1 680 0
Krylov Solver 1 1 18376 0
DMKSP interface 1 1 664 0
Preconditioner 1 1 992 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-log_summary
-snes_monitor
-ts_dt 0.01
-ts_max_steps 10
-ts_monitor
-ts_type theta
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Feb 22 16:45:37 2014
Configure options: --prefix=/home/manic/petsc_viennacl --with-debugging=0 COPTFLAGS="-O3 -march=native" --with-viennacl=1 --download-viennacl=yes --with-clean=1 --with-opencl=1 -download-f-blas-lapack=yes
-----------------------------------------
Libraries compiled on Sat Feb 22 16:45:37 2014 on aristophanes
Machine characteristics: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Using PETSc directory: /home/manic/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -march=native ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/manic/petsc/arch-linux2-c-opt/include -I/home/manic/petsc/include -I/home/manic/petsc/include -I/home/manic/petsc/arch-linux2-c-opt/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/manic/petsc/arch-linux2-c-opt/lib -L/home/manic/petsc/arch-linux2-c-opt/lib -lflapack -lfblas -lX11 -lpthread -lOpenCL -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.6 -L/usr/lib/gcc/x86_64-linux-gnu/4.6 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -lgfortran -lm -lm -lquadmath -lm -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: petsc_opencl_fixed.tar.gz
Type: application/x-gzip
Size: 52248 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20140222/74e479c0/attachment.gz>