[petsc-dev] [GPU] Performance of ex19
Keita Teranishi
keita at cray.com
Tue Aug 31 13:45:59 CDT 2010
Barry,
Your performance data is identical to mine. Could you repost?
Thanks,
================================
Keita Teranishi
Scientific Library Group
Cray, Inc.
keita at cray.com
================================
From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at mcs.anl.gov] On Behalf Of Barry Smith
Sent: Tuesday, August 31, 2010 1:38 PM
To: For users of the development version of PETSc
Subject: Re: [petsc-dev] [GPU] Performance of ex19
Interesting. Some numbers are worse than on our older system (MAXPY), some are a bit better, but nothing is hugely better. Here is the older one:
VecDot 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2024 1.0 1.1760e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 29 0 0 0 32 29 0 0 0 2163
VecNorm 2096 1.0 3.1199e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 2 0 0 0 9 2 0 0 0 537
VecScale 2092 1.0 1.7600e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 5 1 0 0 0 475
VecCopy 2072 1.0 9.1996e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0
VecSet 70 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 108 1.0 1.5999e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540
VecWAXPY 68 1.0 7.9999e-03 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 340
VecMAXPY 2092 1.0 7.0399e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 31 0 0 0 19 31 0 0 0 3844
VecScatterBegin 5 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyTo 10 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyFrom 5 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 1 1.0 3.6199e+00 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 0.0e+00 56 100 0 0 0 100 100 0 0 0 2451
SNESLineSearch 2 1.0 7.9999e-03 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 687
SNESFunctionEval 3 1.0 3.9999e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 630
SNESJacobianEval 2 1.0 3.0399e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 8 0 0 0 0 127
KSPGMRESOrthog 2024 1.0 1.8280e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 57 0 0 0 50 57 0 0 0 2783
KSPSetup 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 3.3079e+00 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 0.0e+00 51 99 0 0 0 91 99 0 0 0 2668
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2024 1.0 8.7996e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
MatMult 2092 1.0 8.3197e-01 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13 37 0 0 0 23 37 0 0 0 3991
MatAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 7.9989e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 4.0002e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 3.0399e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 8 0 0 0 0 127
MatFDColorFunc 42 1.0 1.2000e-02 1.0 3.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2940
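For anyone checking the last column of the table above: on a single process, the Mflop/s figure should be roughly the flops column divided by the time column, scaled by 1e6. The following is a minimal sketch of that arithmetic, assuming the whitespace-separated field order seen in these lines (name, count, count ratio, time, time ratio, flops, ...). The helper name `mflops` is hypothetical and not part of PETSc.

```python
def mflops(line):
    """Recompute Mflop/s from one -log_summary event line as flops / seconds / 1e6."""
    fields = line.split()
    # Assumed layout: name, count, count ratio, max time, time ratio, max flops, ...
    seconds = float(fields[3])
    flops = float(fields[5])
    if seconds == 0.0:
        return 0.0  # events too short to time show 0 in the log
    return flops / seconds / 1e6


# Two lines copied verbatim from the table above.
vecmdot = "VecMDot 2024 1.0 1.1760e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 29 0 0 0 32 29 0 0 0 2163"
matmult = "MatMult 2092 1.0 8.3197e-01 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13 37 0 0 0 23 37 0 0 0 3991"

for line in (vecmdot, matmult):
    name = line.split()[0]
    reported = int(line.split()[-1])
    print(name, "recomputed:", round(mflops(line)), "reported:", reported)
```

The recomputed values agree with the reported ones only to within rounding, since the flops column is printed with three significant digits.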
On Aug 31, 2010, at 2:21 PM, Keita Teranishi wrote:
Barry,
Here it is. The flops rate is better, but the solver is not multilevel anymore :(.
Thanks,
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
MatAssemblyBegin 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 8.0001e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 9 0 0 0 0 0
MatFDColorCreate 1 1.0 3.5999e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 41 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 2024 1.0 1.1760e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 18 29 0 0 0 32 29 0 0 0 2163
VecNorm 2096 1.0 3.1199e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 2 0 0 0 9 2 0 0 0 537
VecScale 2092 1.0 1.7600e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 1 0 0 0 5 1 0 0 0 475
VecCopy 2072 1.0 9.1996e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 3 0 0 0 0 0
VecSet 70 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 108 1.0 1.5999e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 540
VecWAXPY 68 1.0 7.9999e-03 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 340
VecMAXPY 2092 1.0 7.0399e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 31 0 0 0 19 31 0 0 0 3844
VecScatterBegin 5 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyTo 10 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyFrom 5 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 1 1.0 3.6199e+00 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 0.0e+00 56 100 0 0 0 100 100 0 0 0 2451
SNESLineSearch 2 1.0 7.9999e-03 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 687
SNESFunctionEval 3 1.0 3.9999e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 630
SNESJacobianEval 2 1.0 3.0399e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 8 0 0 0 0 127
KSPGMRESOrthog 2024 1.0 1.8280e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 57 0 0 0 50 57 0 0 0 2783
KSPSetup 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 3.3079e+00 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 0.0e+00 51 99 0 0 0 91 99 0 0 0 2668
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2024 1.0 8.7996e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
MatMult 2092 1.0 8.3197e-01 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13 37 0 0 0 23 37 0 0 0 3991
MatAssemblyBegin 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 7.9989e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 4.0002e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 3.0399e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 8 0 0 0 0 127
MatFDColorFunc 42 1.0 1.2000e-02 1.0 3.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2940
------------------------------------------------------------------------------------------------------------------------
From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at mcs.anl.gov] On Behalf Of Barry Smith
Sent: Tuesday, August 31, 2010 10:53 AM
To: For users of the development version of PETSc
Subject: Re: [petsc-dev] [GPU] Performance of ex19
Please run with the options
./ex19 -da_vec_type seqcuda -da_mat_type seqaijcuda -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cuda_synchronize
On Aug 31, 2010, at 11:45 AM, Keita Teranishi wrote:
Hi PETSc Developer team,
I have just measured the performance of the ex19 program running on a Fermi GPU. I hope it helps you develop GPU-enabled PETSc further.
Thanks,
Keita
./ex19 -pc_type jacobi -dmmg_nlevels 5 -da_vec_type cuda -da_mat_type aijcuda -log_summary -cuda_synchronize
--- Event Stage 0: Main Stage
PetscBarrier 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
VecSet 8 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyFrom 8 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMultTranspose 4 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 58 0 0 0 0
MatAssemblyBegin 9 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 14 0 0 0 0 0
MatFDColorCreate 5 1.0 1.2000e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 43 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot 980 1.0 5.5599e-01 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 14 0 0 0 39 28 0 0 0 530
VecNorm 1025 1.0 1.2399e-01 1.0 1.95e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 9 2 0 0 0 158
VecScale 1013 1.0 9.9998e-02 1.0 9.73e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 1 0 0 0 97
VecCopy 208 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 45 1.0 7.9989e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecAXPY 233 1.0 3.9999e-03 1.0 1.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 419
VecWAXPY 33 1.0 3.9990e-03 1.0 3.17e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 79
VecMAXPY 1013 1.0 2.9199e-01 1.0 3.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 5 15 0 0 0 21 30 0 0 0 1074
VecPointwiseMult 988 1.0 9.5995e-02 1.0 9.42e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 1 0 0 0 98
VecScatterBegin 13 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyTo 24 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUDACopyFrom 21 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 1013 1.0 1.3600e-01 1.0 3.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 18 0 0 0 10 37 0 0 0 2815
MatMultTranspose 8 1.0 3.9999e-03 1.0 1.15e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 29
MatAssemblyBegin 10 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 10 1.0 8.0001e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatZeroEntries 10 1.0 4.0002e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 10 1.0 8.7998e-02 1.0 1.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 6 1 0 0 0 143
MatFDColorFunc 210 1.0 1.2000e-02 1.0 1.15e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 958
SNESSolve 1 1.0 1.4160e+00 1.0 1.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 50 0 0 0 100 100 0 0 0 737
SNESLineSearch 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESFunctionEval 3 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESJacobianEval 2 1.0 9.1998e-02 1.0 1.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 6 1 0 0 0 138
KSPGMRESOrthog 980 1.0 8.3199e-01 1.0 5.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 15 28 0 0 0 59 56 0 0 0 708
KSPSetup 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.3240e+00 1.0 1.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 23 49 0 0 0 93 99 0 0 0 778
PCSetUp 2 1.0 3.9999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 980 1.0 9.5995e-02 1.0 9.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 7 1 0 0 0 98
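The remark earlier in the thread that "the flops rate is better" in the single-level run can be checked by putting the Mflop/s columns of the two Solve stages side by side. The sketch below does only that, with values copied by hand from the logs in this thread. The dictionary and function names are illustrative. The two runs use different options (5 levels with Jacobi versus 1 level on a 100x100 grid with no preconditioner), so a higher rate does not imply a faster solve: SNESSolve takes about 1.42 s in the multilevel run and about 3.62 s in the single-level run.

```python
# Mflop/s values copied from the two "Event Stage 2: Solve" tables in this thread.
multilevel = {"VecMDot": 530, "VecMAXPY": 1074, "MatMult": 2815, "KSPSolve": 778}
single_level = {"VecMDot": 2163, "VecMAXPY": 3844, "MatMult": 3991, "KSPSolve": 2668}


def rate_ratios(before, after):
    """Return after/before Mflop/s ratios for events present in both runs."""
    return {name: after[name] / before[name] for name in before if name in after}


for name, ratio in rate_ratios(multilevel, single_level).items():
    print(f"{name:10s} {ratio:4.1f}x")
```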