[petsc-dev] [petsc-maint #87339] Re: ex19 on GPU
Daniel Lowell
redratio1 at gmail.com
Tue Sep 20 15:50:25 CDT 2011
Ran on behalf of Satish on Cookie with a single Tesla 2070 Fermi.
Configure Options: --configModules=PETSc.Configure
--optionsModule=PETSc.compilerOptions
--with-mpi-dir=/disks/soft/mpich2-1.3.1-gcc --download-f-blas-lapack=yes
--with-cuda-dir=/soft/cuda-4.0/cuda
--with-thrust-dir=/soft/cuda-4.0/cuda/include
--with-cusp-dir=/soft/cuda-4.0/cuda/include --with-debugging=0
--with-cudac=nvcc --with-precision=double --with-clanguage=c
--with-cuda-arch=sm_20 PETSC_ARCH=structgrid_cuda
./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none
-dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -mat_no_inode -preload off
-cusp_synchronize -cuda_set_device 0 -log_summary
ex19.cudaCookie2070Fermi.log
lid velocity = 0.0001, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
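
To line this run up against the runs quoted below, one option is to pull the Mflop/s figure (the last column of each -log_summary event row) out of each log and take ratios. What follows is only a sketch, not part of PETSc: the helper names parse_event_row and rate_ratios are made up for illustration, the rows are copied from the two quoted tables with the wrapped lines rejoined, and it assumes the event name is the first whitespace-separated field and the rate is the last.

```python
def parse_event_row(row):
    """Return (event_name, mflops) for one -log_summary event row.

    Assumption: the event name is the first whitespace-separated field
    and the Mflop/s rate is the last one, as in the rows quoted here.
    """
    fields = row.split()
    return fields[0], float(fields[-1])


def rate_ratios(rows_a, rows_b):
    """Ratio of Mflop/s (run A over run B) for events present in both runs."""
    a = dict(parse_event_row(r) for r in rows_a)
    b = dict(parse_event_row(r) for r in rows_b)
    # Skip events that report 0 Mflop/s in run B (copies, sets, ...).
    return {name: a[name] / b[name] for name in a if b.get(name, 0) > 0}


# Rows from the table in Barry's Sep 19 reply (taken here to be the breadboard run).
reply_sep19_rows = [
    "VecMDot 2024 1.0 2.1392e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 29 0 0 0 49 29 0 0 0 1189",
    "VecNorm 2096 1.0 1.1928e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 3 2 0 0 0 1406",
    "VecMAXPY 2092 1.0 2.9396e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 31 0 0 0 7 31 0 0 0 9205",
    "MatMult 2092 1.0 1.1330e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 19 37 0 0 0 26 37 0 0 0 2931",
]

# Rows from the table in Barry's Sep 18 message (Shiyuan's run).
shiyuan_rows = [
    "VecMDot 2024 1.0 8.6273e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 50 29 0 0 0 66 29 0 0 0 295",
    "VecNorm 2096 1.0 1.5544e+00 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00 9 2 0 0 0 12 2 0 0 0 108",
    "VecMAXPY 2092 1.0 6.4464e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 5 31 0 0 0 4198",
    "MatMult 2092 1.0 1.1950e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 9 37 0 0 0 2779",
]

ratios = rate_ratios(reply_sep19_rows, shiyuan_rows)
for name, ratio in sorted(ratios.items()):
    print(f"{name:10s} {ratio:6.2f}x")
```

With these four rows, MatMult comes out nearly equal in both runs while VecNorm and VecMDot differ by large factors, which matches the pattern Barry describes below.
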
On Mon, Sep 19, 2011 at 11:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Satish,
>
> Thanks
>
> Ok, those are all what we expect, so what the hey is wrong with Shiyuan's
> machine? Is there another machine you can try on?
>
>
> VecDot              2 1.0 1.2088e-04 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0  1324
> VecMDot          2024 1.0 2.1392e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00  35  29   0   0   0  49  29   0   0   0  1189
> VecNorm          2096 1.0 1.1928e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00   2   2   0   0   0   3   2   0   0   0  1406
> VecScale         2092 1.0 5.2948e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00   1   1   0   0   0   1   1   0   0   0  1580
> VecCopy          2072 1.0 6.9294e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0   2   0   0   0   0     0
> VecSet             70 1.0 1.7152e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> VecAXPY           108 1.0 1.3336e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   648
> VecWAXPY           68 1.0 2.0800e-03 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0  1308
> VecMAXPY         2092 1.0 2.9396e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00   5  31   0   0   0   7  31   0   0   0  9205
> VecScatterBegin     5 1.0 1.7390e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> VecReduceArith      2 1.0 9.2983e-04 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   172
> VecReduceComm       1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> VecCUSPCopyTo      49 1.0 9.9115e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> VecCUSPCopyFrom    44 1.0 1.4175e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> SNESSolve           1 1.0 4.3493e+00 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 3.4e+04  71 100   0   0 100 100 100   0   0 100  2040
> SNESLineSearch      2 1.0 8.4569e-03 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 4.0e+00   0   0   0   0   0   0   0   0   0   0   650
> SNESFunctionEval    3 1.0 6.6361e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 3.0e+00   0   0   0   0   0   0   0   0   0   0   380
> SNESJacobianEval    2 1.0 5.0695e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 4.3e+01   8   0   0   0   0  12   0   0   0   0    76
> KSPGMRESOrthog   2024 1.0 2.4282e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 3.1e+04  40  57   0   0  92  56  57   0   0  93  2095
> KSPSetup            2 1.0 2.9206e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01   0   0   0   0   0   0   0   0   0   0     0
> KSPSolve            2 1.0 3.8301e+00 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 3.4e+04  63  99   0   0 100  88  99   0   0 100  2304
> PCSetUp             2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> PCApply          2024 1.0 6.3726e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0   1   0   0   0   0     0
> MatMult          2092 1.0 1.1330e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  19  37   0   0   0  26  37   0   0   0  2931
>
> Barry
>
> On Sep 19, 2011, at 10:44 AM, Satish Balay wrote:
>
> > Attached is the output from the run on breadboard. It has 2 "nVidia
> > Corporation GT200 [Tesla C1060]" cards.
> >
> > Satish
> >
> > --------
> >
> > balay at bb30:~/petsc-dev/src/snes/examples/tutorials>./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 -log_summary ex19.cuda.log
> > lid velocity = 0.0001, prandtl # = 1, grashof # = 1
> > Number of SNES iterations = 2
> > balay at bb30:~/petsc-dev/src/snes/examples/tutorials>
> >
> > On Sun, 18 Sep 2011, Barry Smith wrote:
> >
> >>
> >>
> >> Ok, the copies up and down are not a problem.
> >>
> >> Except for VecMAXPY(), the vector operations are terrible (as if they
> >> are not using the GPU, but they must be?). The MatMult() must be on the
> >> GPU because its rate is pretty good: 2779.
> >>
> >> Does someone else have access to a similar system, and can they run the
> >> exact same test to see what numbers they get? Satish, could you on
> >> breadboard? Maybe on Magellan :-)
> >>
> >>
> >> Barry
> >>
> >>
> >>
> >> VecDot              2 1.0 1.7049e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0    94
> >> VecMDot          2024 1.0 8.6273e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00  50  29   0   0   0  66  29   0   0   0   295
> >> VecNorm          2096 1.0 1.5544e+00 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00   9   2   0   0   0  12   2   0   0   0   108
> >> VecScale         2092 1.0 3.7774e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00   2   1   0   0   0   3   1   0   0   0   222
> >> VecCopy          2072 1.0 3.8258e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0   3   0   0   0   0     0
> >> VecSet             70 1.0 1.3119e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> VecAXPY           108 1.0 4.7407e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   182
> >> VecWAXPY           68 1.0 1.2545e-02 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   217
> >> VecMAXPY         2092 1.0 6.4464e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00   4  31   0   0   0   5  31   0   0   0  4198
> >> VecScatterBegin     5 1.0 1.5609e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> VecReduceArith      2 1.0 3.8650e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0    41
> >> VecReduceComm       1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> VecCUSPCopyTo      49 1.0 3.0950e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> VecCUSPCopyFrom    44 1.0 2.0876e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> SNESSolve           1 1.0 1.3044e+01 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 0.0e+00  75 100   0   0   0 100 100   0   0   0   680
> >> SNESLineSearch      2 1.0 1.1921e-02 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   461
> >> SNESFunctionEval    3 1.0 2.7192e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   927
> >> SNESJacobianEval    2 1.0 2.0424e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0   2   0   0   0   0   188
> >> KSPGMRESOrthog   2024 1.0 9.2522e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00  53  57   0   0   0  71  57   0   0   0   550
> >> KSPSetup            2 1.0 5.1975e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> KSPSolve            2 1.0 1.2819e+01 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 0.0e+00  74  99   0   0   0  98  99   0   0   0   689
> >> PCSetUp             2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >> PCApply          2024 1.0 3.8054e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0   3   0   0   0   0     0
> >> MatMult          2092 1.0 1.1950e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00   7  37   0   0   0   9  37   0   0   0  2779
> >>
> >> On Sep 18, 2011, at 10:29 AM, Shiyuan wrote:
> >>
> >>>
> >>>
> >>> On Sat, Sep 17, 2011 at 10:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>>
> >>> Run the first one with -da_vec_type seqcusp and -da_mat_type seqaijcusp
> >>>
> >>>> VecScatterBegin  2097 1.0 1.0270e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   5   0   0   0   0   7   0   0   0   0     0
> >>>> VecCUSPCopyTo    2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0   2   0   0   0   0     0
> >>>> VecCUSPCopyFrom  2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   5   0   0   0   0   7   0   0   0   0     0
> >>>
> >>> Why is it doing all these vector copies up and down? It is run on one
> >>> process, so it shouldn't be doing more than a handful in total.
> >>>
> >>> Barry
> >>>
> >>> ./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 | tee ex19p2.txt
> >>>
> >>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
> >>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> >>>  0:      Main Stage: 4.2393e+00  24.4%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
> >>>  1:           SetUp: 4.9079e-02   0.3%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
> >>>  2:           Solve: 1.3071e+01  75.3%  8.8712e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
> >>>
> >>> ------------------------------------------------------------------------------------------------------------------------
> >>>
> >>> VecScatterBegin     5 1.0 1.5609e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >>> VecReduceArith      2 1.0 3.8650e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0    41
> >>> VecReduceComm       1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >>> VecCUSPCopyTo      49 1.0 3.0950e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >>> VecCUSPCopyFrom    44 1.0 2.0876e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0     0
> >>>
> >>> The complete log is attached. Thanks.
> >>> <ex19p2.txt>
> >>
> >>
> > <ex19.cuda.log>
>
>
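
As a sanity check on the last column of the quoted tables: for a single-process run, the Mflop/s figure that -log_summary reports should be roughly the flop count divided by the time, scaled by 1e-6. The sketch below recomputes it from the printed (rounded) flop and time columns of a few rows copied from the two tables above, so small differences from the reported rate are expected. The function name derived_mflops and the run labels are hypothetical, chosen only for this illustration.

```python
def derived_mflops(flops, seconds):
    """Flop rate in Mflop/s from a flop count and a wall time in seconds."""
    return 1e-6 * flops / seconds


# (label, flops, time in seconds, Mflop/s as reported in the quoted table)
rows = [
    ("VecMDot, Sep 19 table", 2.54e9, 2.1392e+00, 1189),
    ("VecMDot, Sep 18 table", 2.54e9, 8.6273e+00, 295),
    ("MatMult, Sep 19 table", 3.32e9, 1.1330e+00, 2931),
    ("MatMult, Sep 18 table", 3.32e9, 1.1950e+00, 2779),
]

for label, flops, seconds, reported in rows:
    rate = derived_mflops(flops, seconds)
    print(f"{label:24s} derived {rate:8.1f}  reported {reported:5d}")
```

The same flop counts appear in both tables for each event, so the difference in rate comes entirely from the time column: MatMult takes about the same time in both runs, while VecMDot takes roughly four times longer in the Sep 18 run.
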
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ex19.cudaCookie2070Fermi.log
Type: text/x-log
Size: 12212 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20110920/8ab7853d/attachment.bin>