[petsc-dev] [petsc-maint #87339] Re: ex19 on GPU
Barry Smith
bsmith at mcs.anl.gov
Sat Sep 17 22:48:41 CDT 2011
Run the first one with -da_vec_type seqcusp and -da_mat_type seqaijcusp.
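For the 100 x 100 case below, that would be something like (only the two type options change from the command that was run):

  ./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_show_devices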
> VecScatterBegin 2097 1.0 1.0270e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
> VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
> VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
Why is it doing all these vector copies up and down? It is run on one process; it shouldn't be doing more than a handful total.
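(Note the counts: VecScatterBegin at 2097 and VecCUSPCopyFrom at 2135 track the 2092 MatMults in the log below almost one-for-one, which suggests every multiply's ghost-point scatter is pulling the vector back to the host.)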
Barry
On Sep 17, 2011, at 9:56 PM, Shiyuan wrote:
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_show_devices
> CUDA device 0: Tesla M2050
> CUDA device 1: Tesla M2050
> lid velocity = 0.0001, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:34:38 2011
> Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
>
> Max Max/Min Avg Total
> Time (sec): 1.928e+01 1.00000 1.928e+01
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 9.039e+09 1.00000 9.039e+09 9.039e+09
> Flops/sec: 4.687e+08 1.00000 4.687e+08 4.687e+08
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.3905e+00 22.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 6.0178e-02 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.4834e+01 76.9% 9.0389e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> --- Event Stage 1: SetUp
>
> MatAssemblyBegin 1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 2.0661e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
> MatFDColorCreate 1 1.0 1.8455e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 31 0 0 0 0 0
>
> --- Event Stage 2: Solve
>
> VecDot 2 1.0 1.6947e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 94
> VecMDot 2024 1.0 8.6724e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 28 0 0 0 58 28 0 0 0 293
> VecNorm 2096 1.0 1.5712e+00 1.0 3.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 4 0 0 0 11 4 0 0 0 213
> VecScale 2092 1.0 3.7956e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 3 1 0 0 0 220
> VecCopy 2072 1.0 3.8405e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
> VecSet 70 1.0 1.3284e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 108 1.0 4.7269e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 183
> VecWAXPY 68 1.0 1.2537e-02 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 217
> VecMAXPY 2092 1.0 6.4375e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 30 0 0 0 4 30 0 0 0 4203
> VecScatterBegin 2097 1.0 1.0270e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
> VecReduceArith 2 1.0 3.7239e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 43
> VecReduceComm 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
> VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
> SNESSolve 1 1.0 1.4807e+01 1.0 9.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77100 0 0 0 100100 0 0 0 610
> SNESLineSearch 2 1.0 1.2360e-02 1.0 5.81e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 470
> SNESFunctionEval 3 1.0 2.7061e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 931
> SNESJacobianEval 2 1.0 2.4291e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 158
> KSPGMRESOrthog 2024 1.0 9.2966e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 56 0 0 0 63 56 0 0 0 547
> KSPSetup 2 1.0 6.2943e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 2 1.0 1.4543e+01 1.0 8.99e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75 99 0 0 0 98 99 0 0 0 618
> PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 2024 1.0 3.8127e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
> MatMult 2092 1.0 2.8551e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 19 37 0 0 0 1163
> MatAssemblyBegin 2 1.0 1.8120e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 2 1.0 3.1030e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 2 1.0 1.8611e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatFDColorApply 2 1.0 2.4285e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 158
> MatFDColorFunc 42 1.0 1.2794e-02 1.0 3.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2758
> MatCUSPCopyTo 4 1.0 1.6344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>
> --- Event Stage 1: SetUp
>
> Distributed Mesh 1 0 0 0
> Vector 11 3 4424 0
> Vector Scatter 4 0 0 0
> Index Set 29 9 46600 0
> IS L to G Mapping 3 0 0 0
> SNES 1 0 0 0
> Krylov Solver 2 1 1064 0
> Preconditioner 2 1 752 0
> Matrix 3 0 0 0
> Matrix FD Coloring 1 0 0 0
>
> --- Event Stage 2: Solve
>
> Distributed Mesh 0 1 204840 0
> Vector 74 82 13242416 0
> Vector Scatter 0 4 2448 0
> Index Set 0 20 174720 0
> IS L to G Mapping 0 3 161668 0
> SNES 0 1 1288 0
> Krylov Solver 0 1 18864 0
> Preconditioner 0 1 952 0
> Matrix 0 3 10810468 0
> Matrix FD Coloring 0 1 6510068 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -cuda_show_devices
> -cusp_synchronize
> -da_grid_x 100
> -da_grid_y 100
> -da_mat_type mpiaijcusp
> -da_vec_type mpicusp
> -dmmg_nlevels 1
> -log_summary
> -mat_no_inode
> -pc_type none
> -preload off
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sat Sep 17 11:25:49 2011
> Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
> -----------------------------------------
> Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
> Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
> Using PETSc directory: /home/sgu/softwares/petsc-dev
> Using PETSc arch: gpu00CCT-cxx-nompi-release
> -----------------------------------------
>
> Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
> -----------------------------------------
>
> Using C linker: g++
> Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
> -----------------------------------------
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 200 -da_grid_y 200 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
> lid velocity = 2.5e-05, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:36:14 2011
> Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
>
> Max Max/Min Avg Total
> Time (sec): 5.042e+01 1.00000 5.042e+01
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 8.283e+10 1.00000 8.283e+10 8.283e+10
> Flops/sec: 1.643e+09 1.00000 1.643e+09 1.643e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.6509e+00 9.2% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 2.5148e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 4.5517e+01 90.3% 8.2826e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> --- Event Stage 1: SetUp
>
> MatAssemblyBegin 1 1.0 1.4067e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 8.0690e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
> MatFDColorCreate 1 1.0 7.4871e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 30 0 0 0 0 0
>
> --- Event Stage 2: Solve
>
> VecDot 2 1.0 1.6088e-03 1.0 6.40e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 398
> VecMDot 4637 1.0 2.1155e+01 1.0 2.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 28 0 0 0 46 28 0 0 0 1104
> VecNorm 4796 1.0 3.7077e+00 1.0 3.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 4 0 0 0 8 4 0 0 0 828
> VecScale 4792 1.0 9.7300e-01 1.0 7.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 788
> VecCopy 4685 1.0 9.9265e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecSet 157 1.0 3.0819e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 195 1.0 9.1851e-02 1.0 6.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 679
> VecWAXPY 155 1.0 3.3326e-02 1.0 2.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 744
> VecMAXPY 4792 1.0 2.6158e+00 1.0 2.48e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 30 0 0 0 6 30 0 0 0 9498
> VecScatterBegin 4797 1.0 4.9713e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
> VecReduceArith 2 1.0 5.0960e-03 1.0 6.40e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126
> VecReduceComm 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecCUSPCopyTo 4840 1.0 6.1929e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 4835 1.0 5.0045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
> SNESSolve 1 1.0 4.5474e+01 1.0 8.28e+10 1.0 0.0e+00 0.0e+00 0.0e+00 90100 0 0 0 100100 0 0 0 1821
> SNESLineSearch 2 1.0 2.3559e-02 1.0 2.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 989
> SNESFunctionEval 3 1.0 8.9130e-03 1.0 1.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1131
> SNESJacobianEval 2 1.0 9.7259e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 158
> KSPGMRESOrthog 4637 1.0 2.3658e+01 1.0 4.67e+10 1.0 0.0e+00 0.0e+00 0.0e+00 47 56 0 0 0 52 56 0 0 0 1975
> KSPSetup 2 1.0 6.1035e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 2 1.0 4.4465e+01 1.0 8.26e+10 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 98100 0 0 0 1859
> PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 4637 1.0 9.8032e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> MatMult 4792 1.0 1.4925e+01 1.0 3.05e+10 1.0 0.0e+00 0.0e+00 0.0e+00 30 37 0 0 0 33 37 0 0 0 2047
> MatAssemblyBegin 2 1.0 2.0027e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 2 1.0 1.2705e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 2 1.0 7.4351e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatFDColorApply 2 1.0 9.7253e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 158
> MatFDColorFunc 42 1.0 5.1462e-02 1.0 1.41e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2742
> MatCUSPCopyTo 4 1.0 4.9795e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>
> --- Event Stage 1: SetUp
>
> Distributed Mesh 1 0 0 0
> Vector 11 3 4424 0
> Vector Scatter 4 0 0 0
> Index Set 29 9 166600 0
> IS L to G Mapping 3 0 0 0
> SNES 1 0 0 0
> Krylov Solver 2 1 1064 0
> Preconditioner 2 1 752 0
> Matrix 3 0 0 0
> Matrix FD Coloring 1 0 0 0
>
> --- Event Stage 2: Solve
>
> Distributed Mesh 0 1 804840 0
> Vector 74 82 52602416 0
> Vector Scatter 0 4 2448 0
> Index Set 0 20 654720 0
> IS L to G Mapping 0 3 641668 0
> SNES 0 1 1288 0
> Krylov Solver 0 1 18864 0
> Preconditioner 0 1 952 0
> Matrix 0 3 43373668 0
> Matrix FD Coloring 0 1 26138868 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 0
> #PETSc Option Table entries:
> -cuda_set_device 0
> -cusp_synchronize
> -da_grid_x 200
> -da_grid_y 200
> -da_mat_type mpiaijcusp
> -da_vec_type mpicusp
> -dmmg_nlevels 1
> -log_summary
> -mat_no_inode
> -pc_type none
> -preload off
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sat Sep 17 11:25:49 2011
> Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
> -----------------------------------------
> Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
> Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
> Using PETSc directory: /home/sgu/softwares/petsc-dev
> Using PETSc arch: gpu00CCT-cxx-nompi-release
> -----------------------------------------
>
> Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
> -----------------------------------------
>
> Using C linker: g++
> Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
> -----------------------------------------
>
>
>
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
>
> lid velocity = 1.11111e-05, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:38:29 2011
> Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
>
> Max Max/Min Avg Total
> Time (sec): 1.095e+02 1.00000 1.095e+02
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 3.136e+11 1.00000 3.136e+11 3.136e+11
> Flops/sec: 2.865e+09 1.00000 2.865e+09 2.865e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.4090e+00 4.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 5.6010e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.0449e+02 95.5% 3.1360e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> --- Event Stage 1: SetUp
>
> MatAssemblyBegin 1 1.0 1.4067e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 1.7501e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
> MatFDColorCreate 1 1.0 1.6907e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 30 0 0 0 0 0
>
> --- Event Stage 2: Solve
>
> VecDot 2 1.0 1.6890e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 853
> VecMDot 7803 1.0 3.8534e+01 1.0 8.85e+10 1.0 0.0e+00 0.0e+00 0.0e+00 35 28 0 0 0 37 28 0 0 0 2297
> VecNorm 8068 1.0 6.5087e+00 1.0 1.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 4 0 0 0 6 4 0 0 0 1785
> VecScale 8064 1.0 1.8853e+00 1.0 2.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1540
> VecCopy 7851 1.0 1.9321e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecSet 263 1.0 5.4441e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 301 1.0 1.5158e-01 1.0 2.17e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1430
> VecWAXPY 261 1.0 6.9037e-02 1.0 9.40e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1361
> VecMAXPY 8064 1.0 7.6110e+00 1.0 9.41e+10 1.0 0.0e+00 0.0e+00 0.0e+00 7 30 0 0 0 7 30 0 0 0 12366
> VecScatterBegin 8069 1.0 1.2707e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
> VecReduceArith 2 1.0 6.5138e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 221
> VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecCUSPCopyTo 8112 1.0 1.0913e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 8107 1.0 1.2753e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
> SNESSolve 1 1.0 1.0444e+02 1.0 3.14e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 100100 0 0 0 3003
> SNESLineSearch 2 1.0 3.9190e-02 1.0 5.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1339
> SNESFunctionEval 3 1.0 1.7656e-02 1.0 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1285
> SNESJacobianEval 2 1.0 2.0955e+00 1.0 3.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
> KSPGMRESOrthog 7803 1.0 4.5761e+01 1.0 1.77e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 56 0 0 0 44 56 0 0 0 3868
> KSPSetup 2 1.0 4.4107e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 2 1.0 1.0228e+02 1.0 3.13e+11 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 98100 0 0 0 3062
> PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 7803 1.0 1.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> MatMult 8064 1.0 4.5629e+01 1.0 1.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 37 0 0 0 44 37 0 0 0 2538
> MatAssemblyBegin 2 1.0 2.0981e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 2 1.0 2.8598e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 2 1.0 1.9902e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatFDColorApply 2 1.0 2.0955e+00 1.0 3.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
> MatFDColorFunc 42 1.0 1.1288e-01 1.0 3.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2813
> MatCUSPCopyTo 4 1.0 8.1736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>
> --- Event Stage 1: SetUp
>
> Distributed Mesh 1 0 0 0
> Vector 11 3 4424 0
> Vector Scatter 4 0 0 0
> Index Set 29 9 366600 0
> IS L to G Mapping 3 0 0 0
> SNES 1 0 0 0
> Krylov Solver 2 1 1064 0
> Preconditioner 2 1 752 0
> Matrix 3 0 0 0
> Matrix FD Coloring 1 0 0 0
>
> --- Event Stage 2: Solve
>
> Distributed Mesh 0 1 1804840 0
> Vector 74 82 118202416 0
> Vector Scatter 0 4 2448 0
> Index Set 0 20 1454720 0
> IS L to G Mapping 0 3 1441668 0
> SNES 0 1 1288 0
> Krylov Solver 0 1 18864 0
> Preconditioner 0 1 952 0
> Matrix 0 3 97696868 0
> Matrix FD Coloring 0 1 58887668 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -cuda_set_device 0
> -cusp_synchronize
> -da_grid_x 300
> -da_grid_y 300
> -da_mat_type mpiaijcusp
> -da_vec_type mpicusp
> -dmmg_nlevels 1
> -log_summary
> -mat_no_inode
> -pc_type none
> -preload off
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sat Sep 17 11:25:49 2011
> Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
> -----------------------------------------
> Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
> Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
> Using PETSc directory: /home/sgu/softwares/petsc-dev
> Using PETSc arch: gpu00CCT-cxx-nompi-release
> -----------------------------------------
>
> Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
> -----------------------------------------
>
> Using C linker: g++
> Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
> -----------------------------------------
>
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 400 -da_grid_y 400 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
>
> lid velocity = 6.25e-06, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:42:05 2011
> Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
>
> Max Max/Min Avg Total
> Time (sec): 1.909e+02 1.00000 1.909e+02
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 7.167e+11 1.00000 7.167e+11 7.167e+11
> Flops/sec: 3.753e+09 1.00000 3.753e+09 3.753e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.4291e+00 2.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 1.0122e+00 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.8551e+02 97.2% 7.1669e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> PetscBarrier 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> --- Event Stage 1: SetUp
>
> MatAssemblyBegin 1 1.0 1.5974e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 1 1.0 3.1045e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
> MatFDColorCreate 1 1.0 3.1857e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 31 0 0 0 0 0
>
> --- Event Stage 2: Solve
>
> VecDot 2 1.0 1.8530e-03 1.0 2.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1382
> VecMDot 10031 1.0 5.4102e+01 1.0 2.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 28 0 0 0 29 28 0 0 0 3739
> VecNorm 10370 1.0 8.5987e+00 1.0 2.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 3087
> VecScale 10366 1.0 2.9179e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 2274
> VecCopy 10079 1.0 2.9971e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecSet 337 1.0 7.6832e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 375 1.0 2.3210e-01 1.0 4.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2068
> VecWAXPY 335 1.0 1.1250e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1906
> VecMAXPY 10366 1.0 1.5716e+01 1.0 2.15e+11 1.0 0.0e+00 0.0e+00 0.0e+00 8 30 0 0 0 8 30 0 0 0 13687
> VecScatterBegin 10371 1.0 2.5508e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
> VecReduceArith 2 1.0 8.3668e-03 1.0 2.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 306
> VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecCUSPCopyTo 10414 1.0 1.5341e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 10409 1.0 2.5585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
> SNESSolve 1 1.0 1.8546e+02 1.0 7.17e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 100100 0 0 0 3864
> SNESLineSearch 2 1.0 6.2440e-02 1.0 9.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1495
> SNESFunctionEval 3 1.0 3.0468e-02 1.0 4.03e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1323
> SNESJacobianEval 2 1.0 3.7313e+00 1.0 6.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
> KSPGMRESOrthog 10031 1.0 6.8969e+01 1.0 4.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 56 0 0 0 37 56 0 0 0 5865
> KSPSetup 2 1.0 4.6015e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 2 1.0 1.8163e+02 1.0 7.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 98100 0 0 0 3942
> PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> PCApply 10031 1.0 2.9429e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> MatMult 10366 1.0 9.7032e+01 1.0 2.65e+11 1.0 0.0e+00 0.0e+00 0.0e+00 51 37 0 0 0 52 37 0 0 0 2729
> MatAssemblyBegin 2 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 2 1.0 5.1729e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 2 1.0 3.1707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatFDColorApply 2 1.0 3.7312e+00 1.0 6.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
> MatFDColorFunc 42 1.0 2.3831e-01 1.0 5.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2369
> MatCUSPCopyTo 4 1.0 1.3754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>
> --- Event Stage 1: SetUp
>
> Distributed Mesh 1 0 0 0
> Vector 11 3 4424 0
> Vector Scatter 4 0 0 0
> Index Set 29 9 646600 0
> IS L to G Mapping 3 0 0 0
> SNES 1 0 0 0
> Krylov Solver 2 1 1064 0
> Preconditioner 2 1 752 0
> Matrix 3 0 0 0
> Matrix FD Coloring 1 0 0 0
>
> --- Event Stage 2: Solve
>
> Distributed Mesh 0 1 3204840 0
> Vector 74 82 210042416 0
> Vector Scatter 0 4 2448 0
> Index Set 0 20 2574720 0
> IS L to G Mapping 0 3 2561668 0
> SNES 0 1 1288 0
> Krylov Solver 0 1 18864 0
> Preconditioner 0 1 952 0
> Matrix 0 3 173780068 0
> Matrix FD Coloring 0 1 104756468 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -cuda_set_device 0
> -cusp_synchronize
> -da_grid_x 400
> -da_grid_y 400
> -da_mat_type mpiaijcusp
> -da_vec_type mpicusp
> -dmmg_nlevels 1
> -log_summary
> -mat_no_inode
> -pc_type none
> -preload off
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
> Configure run at: Sat Sep 17 11:25:49 2011
> Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
> -----------------------------------------
> Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
> Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
> Using PETSc directory: /home/sgu/softwares/petsc-dev
> Using PETSc arch: gpu00CCT-cxx-nompi-release
> -----------------------------------------
>
> Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
> -----------------------------------------
>
> Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
> -----------------------------------------
>
> Using C linker: g++
> Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
> -----------------------------------------
>
>
> On Sat, Sep 17, 2011 at 4:14 PM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:
>
>> On Sat, Sep 17, 2011 at 3:26 PM, Shiyuan <gshy2014 at gmail.com> wrote:
>>
>>> I configured petsc-dev with --with-cuda-arch=sm_20 and rebuilt, but it
>>> doesn't help. The performance is essentially the same. The machine has two
>>> Tesla M2050s, with CUDA driver 4.0 and CUSP 2.0, and I use -cuda_set_device
>>> to choose one. Any clues what's going wrong? configure.log is attached.
>>>
>>
>> Can you show me the output of -cuda_show_devices?
>>
> CUDA device 0: Tesla M2050
> CUDA device 1: Tesla M2050
>
>>
>>
>> In order to investigate further, please run with
>>
>> -da_vec_type mpicusp -da_mat_type mpiaijcusp
>>
>> and then a series of sizes
>>
>> -da_grid_x {100,200,300,400} -da_grid_y {100,200,300,400}
>>
>> and send the log summaries.
>>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_show_devices
> CUDA device 0: Tesla M2050
> CUDA device 1: Tesla M2050
>
> Time (sec): 1.928e+01 1.00000 1.928e+01
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 9.039e+09 1.00000 9.039e+09 9.039e+09
> Flops/sec: 4.687e+08 1.00000 4.687e+08 4.687e+08
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.3905e+00 22.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 6.0178e-02 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.4834e+01 76.9% 9.0389e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> VecMDot 2024 1.0 8.6724e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 28 0 0 0 58 28 0 0 0 293
> VecNorm 2096 1.0 1.5712e+00 1.0 3.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 4 0 0 0 11 4 0 0 0 213
> VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
> VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
> KSPSolve 2 1.0 1.4543e+01 1.0 8.99e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75 99 0 0 0 98 99 0 0 0 618
> MatMult 2092 1.0 2.8551e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 19 37 0 0 0 1163
> MatCUSPCopyTo 4 1.0 1.6344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
>
>
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 200 -da_grid_y 200 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
>
> Time (sec): 5.042e+01 1.00000 5.042e+01
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 8.283e+10 1.00000 8.283e+10 8.283e+10
> Flops/sec: 1.643e+09 1.00000 1.643e+09 1.643e+09
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.6509e+00 9.2% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 2.5148e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 4.5517e+01 90.3% 8.2826e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> VecMDot 4637 1.0 2.1155e+01 1.0 2.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 28 0 0 0 46 28 0 0 0 1104
> VecNorm 4796 1.0 3.7077e+00 1.0 3.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 4 0 0 0 8 4 0 0 0 828
> VecCUSPCopyTo 4840 1.0 6.1929e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 4835 1.0 5.0045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
> KSPSolve 2 1.0 4.4465e+01 1.0 8.26e+10 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 98100 0 0 0 1859
> MatMult 4792 1.0 1.4925e+01 1.0 3.05e+10 1.0 0.0e+00 0.0e+00 0.0e+00 30 37 0 0 0 33 37 0 0 0 2047
> MatCUSPCopyTo 4 1.0 4.9795e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
>
>
>
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 >> ex19p.txt
> Time (sec): 1.095e+02 1.00000 1.095e+02
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 3.136e+11 1.00000 3.136e+11 3.136e+11
> Flops/sec: 2.865e+09 1.00000 2.865e+09 2.865e+09
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.4090e+00 4.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 5.6010e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.0449e+02 95.5% 3.1360e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> VecMDot 7803 1.0 3.8534e+01 1.0 8.85e+10 1.0 0.0e+00 0.0e+00 0.0e+00 35 28 0 0 0 37 28 0 0 0 2297
> VecNorm 8068 1.0 6.5087e+00 1.0 1.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 4 0 0 0 6 4 0 0 0 1785
> VecCUSPCopyTo 8112 1.0 1.0913e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 8107 1.0 1.2753e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
> KSPSolve 2 1.0 1.0228e+02 1.0 3.13e+11 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 98100 0 0 0 3062
> MatMult 8064 1.0 4.5629e+01 1.0 1.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 37 0 0 0 44 37 0 0 0 2538
> MatCUSPCopyTo 4 1.0 8.1736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
>
>
>
> ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 400 -da_grid_y 400 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 >> ex19p.txt
>
> Time (sec): 1.909e+02 1.00000 1.909e+02
> Objects: 1.320e+02 1.00000 1.320e+02
> Flops: 7.167e+11 1.00000 7.167e+11 7.167e+11
> Flops/sec: 3.753e+09 1.00000 3.753e+09 3.753e+09
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.4291e+00 2.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 1: SetUp: 1.0122e+00 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
> 2: Solve: 1.8551e+02 97.2% 7.1669e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> VecMDot 10031 1.0 5.4102e+01 1.0 2.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 28 0 0 0 29 28 0 0 0 3739
> VecNorm 10370 1.0 8.5987e+00 1.0 2.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 3087
> VecCUSPCopyTo 10414 1.0 1.5341e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
> VecCUSPCopyFrom 10409 1.0 2.5585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
> KSPSolve 2 1.0 1.8163e+02 1.0 7.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 98100 0 0 0 3942
> MatMult 10366 1.0 9.7032e+01 1.0 2.65e+11 1.0 0.0e+00 0.0e+00 0.0e+00 51 37 0 0 0 52 37 0 0 0 2729
> MatCUSPCopyTo 4 1.0 1.3754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>
> The complete log_summaries are attached.
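>
> For reference, the four runs above follow one pattern; a shell loop along these lines reproduces them (apart from the first run, which used -cuda_show_devices instead of -cuda_set_device 0):
>
>   for n in 100 200 300 400; do
>     ./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 \
>       -da_grid_x $n -da_grid_y $n -log_summary -mat_no_inode -preload off \
>       -cusp_synchronize -cuda_set_device 0 >> ex19p.txt
>   done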