[petsc-dev] [petsc-maint #87339] Re: ex19 on GPU
Barry Smith
bsmith at mcs.anl.gov
Mon Sep 19 11:10:16 CDT 2011
Satish,
Thanks
OK, those are all what we expect, so what the hey is wrong with Shiyuan's machine? Is there another machine you can try it on?
Event              Count       Time (sec)         Flops                                --- Global ---       --- Stage ---       Total
                   Max Ratio        Max Ratio      Max Ratio    Mess Avg len  Reduct  %T  %f  %M  %L  %R   %T  %f  %M  %L  %R Mflop/s
VecDot               2   1.0 1.2088e-04   1.0 1.60e+05   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0    1324
VecMDot           2024   1.0 2.1392e+00   1.0 2.54e+09   1.0 0.0e+00 0.0e+00 0.0e+00  35  29   0   0   0   49  29   0   0   0    1189
VecNorm           2096   1.0 1.1928e-01   1.0 1.68e+08   1.0 0.0e+00 0.0e+00 0.0e+00   2   2   0   0   0    3   2   0   0   0    1406
VecScale          2092   1.0 5.2948e-02   1.0 8.37e+07   1.0 0.0e+00 0.0e+00 0.0e+00   1   1   0   0   0    1   1   0   0   0    1580
VecCopy           2072   1.0 6.9294e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0    2   0   0   0   0       0
VecSet              70   1.0 1.7152e-03   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
VecAXPY            108   1.0 1.3336e-02   1.0 8.64e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     648
VecWAXPY            68   1.0 2.0800e-03   1.0 2.72e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0    1308
VecMAXPY          2092   1.0 2.9396e-01   1.0 2.71e+09   1.0 0.0e+00 0.0e+00 0.0e+00   5  31   0   0   0    7  31   0   0   0    9205
VecScatterBegin      5   1.0 1.7390e-03   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
VecReduceArith       2   1.0 9.2983e-04   1.0 1.60e+05   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     172
VecReduceComm        1   1.0 2.1458e-06   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
VecCUSPCopyTo       49   1.0 9.9115e-03   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
VecCUSPCopyFrom     44   1.0 1.4175e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
SNESSolve            1   1.0 4.3493e+00   1.0 8.87e+09   1.0 0.0e+00 0.0e+00 3.4e+04  71 100   0   0 100  100 100   0   0 100    2040
SNESLineSearch       2   1.0 8.4569e-03   1.0 5.49e+06   1.0 0.0e+00 0.0e+00 4.0e+00   0   0   0   0   0    0   0   0   0   0     650
SNESFunctionEval     3   1.0 6.6361e-03   1.0 2.52e+06   1.0 0.0e+00 0.0e+00 3.0e+00   0   0   0   0   0    0   0   0   0   0     380
SNESJacobianEval     2   1.0 5.0695e-01   1.0 3.85e+07   1.0 0.0e+00 0.0e+00 4.3e+01   8   0   0   0   0   12   0   0   0   0      76
KSPGMRESOrthog    2024   1.0 2.4282e+00   1.0 5.09e+09   1.0 0.0e+00 0.0e+00 3.1e+04  40  57   0   0  92   56  57   0   0  93    2095
KSPSetup             2   1.0 2.9206e-04   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 3.0e+01   0   0   0   0   0    0   0   0   0   0       0
KSPSolve             2   1.0 3.8301e+00   1.0 8.83e+09   1.0 0.0e+00 0.0e+00 3.4e+04  63  99   0   0 100   88  99   0   0 100    2304
PCSetUp              2   1.0 2.1458e-06   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
PCApply           2024   1.0 6.3726e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0    1   0   0   0   0       0
MatMult           2092   1.0 1.1330e+00   1.0 3.32e+09   1.0 0.0e+00 0.0e+00 0.0e+00  19  37   0   0   0   26  37   0   0   0    2931
Barry
On Sep 19, 2011, at 10:44 AM, Satish Balay wrote:
> Attached is the output from the run on breadboard. It has 2 "nVidia
> Corporation GT200 [Tesla C1060]" cards.
>
> Satish
>
> --------
>
> balay at bb30:~/petsc-dev/src/snes/examples/tutorials>./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 -log_summary ex19.cuda.log
> lid velocity = 0.0001, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> balay at bb30:~/petsc-dev/src/snes/examples/tutorials>
>
> On Sun, 18 Sep 2011, Barry Smith wrote:
>
>>
>>
>> OK, the copies up and down are not a problem.
>>
>> Except for VecMAXPY(), the vector operations are terrible (as if they were not using the GPU, but they must be?). MatMult() must be running on the GPU, because it is pretty good (2779 Mflop/s).
>>
>> Does someone else have access to a similar system, and can they run the exact same test to see what numbers they get? Satish, could you try it on breadboard? Maybe on Magellion :-)
>>
>>
>> Barry
>>
>>
>>
>> Event              Count       Time (sec)         Flops                                --- Global ---       --- Stage ---       Total
>>                    Max Ratio        Max Ratio      Max Ratio    Mess Avg len  Reduct  %T  %f  %M  %L  %R   %T  %f  %M  %L  %R Mflop/s
>> VecDot               2   1.0 1.7049e-03   1.0 1.60e+05   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0      94
>> VecMDot           2024   1.0 8.6273e+00   1.0 2.54e+09   1.0 0.0e+00 0.0e+00 0.0e+00  50  29   0   0   0   66  29   0   0   0     295
>> VecNorm           2096   1.0 1.5544e+00   1.0 1.68e+08   1.0 0.0e+00 0.0e+00 0.0e+00   9   2   0   0   0   12   2   0   0   0     108
>> VecScale          2092   1.0 3.7774e-01   1.0 8.37e+07   1.0 0.0e+00 0.0e+00 0.0e+00   2   1   0   0   0    3   1   0   0   0     222
>> VecCopy           2072   1.0 3.8258e-01   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    3   0   0   0   0       0
>> VecSet              70   1.0 1.3119e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> VecAXPY            108   1.0 4.7407e-02   1.0 8.64e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     182
>> VecWAXPY            68   1.0 1.2545e-02   1.0 2.72e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     217
>> VecMAXPY          2092   1.0 6.4464e-01   1.0 2.71e+09   1.0 0.0e+00 0.0e+00 0.0e+00   4  31   0   0   0    5  31   0   0   0    4198
>> VecScatterBegin      5   1.0 1.5609e-03   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> VecReduceArith       2   1.0 3.8650e-03   1.0 1.60e+05   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0      41
>> VecReduceComm        1   1.0 0.0000e+00   0.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> VecCUSPCopyTo       49   1.0 3.0950e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> VecCUSPCopyFrom     44   1.0 2.0876e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> SNESSolve            1   1.0 1.3044e+01   1.0 8.87e+09   1.0 0.0e+00 0.0e+00 0.0e+00  75 100   0   0   0  100 100   0   0   0     680
>> SNESLineSearch       2   1.0 1.1921e-02   1.0 5.49e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     461
>> SNESFunctionEval     3   1.0 2.7192e-03   1.0 2.52e+06   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0     927
>> SNESJacobianEval     2   1.0 2.0424e-01   1.0 3.85e+07   1.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0    2   0   0   0   0     188
>> KSPGMRESOrthog    2024   1.0 9.2522e+00   1.0 5.09e+09   1.0 0.0e+00 0.0e+00 0.0e+00  53  57   0   0   0   71  57   0   0   0     550
>> KSPSetup             2   1.0 5.1975e-05   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> KSPSolve             2   1.0 1.2819e+01   1.0 8.83e+09   1.0 0.0e+00 0.0e+00 0.0e+00  74  99   0   0   0   98  99   0   0   0     689
>> PCSetUp              2   1.0 9.5367e-07   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>> PCApply           2024   1.0 3.8054e-01   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   2   0   0   0   0    3   0   0   0   0       0
>> MatMult           2092   1.0 1.1950e+00   1.0 3.32e+09   1.0 0.0e+00 0.0e+00 0.0e+00   7  37   0   0   0    9  37   0   0   0    2779
>>
>> On Sep 18, 2011, at 10:29 AM, Shiyuan wrote:
>>
>>>
>>>
>>> On Sat, Sep 17, 2011 at 10:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>> Run the first one with -da_vec_type seqcusp and -da_mat_type seqaijcusp
>>>
>>>> VecScatterBegin   2097   1.0 1.0270e+00   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   5   0   0   0   0    7   0   0   0   0       0
>>>> VecCUSPCopyTo     2140   1.0 2.4991e-01   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   1   0   0   0   0    2   0   0   0   0       0
>>>> VecCUSPCopyFrom   2135   1.0 1.0437e+00   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   5   0   0   0   0    7   0   0   0   0       0
>>>
>>> Why is it doing all these vector copies up and down? It is running on one process, so it shouldn't be doing more than a handful in total.
>>>
>>> Barry
>>>
>>> ./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 | tee ex19p2.txt
>>>
>>> Summary of Stages:    ----- Time ------ ----- Flops -----   --- Messages --- -- Message Lengths --   -- Reductions --
>>>                              Avg %Total        Avg %Total      counts %Total            Avg %Total      counts %Total
>>>  0:      Main Stage:  4.2393e+00  24.4% 0.0000e+00   0.0%   0.000e+00   0.0%      0.000e+00   0.0%   0.000e+00   0.0%
>>>  1:           SetUp:  4.9079e-02   0.3% 0.0000e+00   0.0%   0.000e+00   0.0%      0.000e+00   0.0%   0.000e+00   0.0%
>>>  2:           Solve:  1.3071e+01  75.3% 8.8712e+09 100.0%   0.000e+00   0.0%      0.000e+00   0.0%   0.000e+00   0.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> VecScatterBegin      5   1.0 1.5609e-03   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>>> VecReduceArith       2   1.0 3.8650e-03   1.0 1.60e+05   1.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0      41
>>> VecReduceComm        1   1.0 0.0000e+00   0.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>>> VecCUSPCopyTo       49   1.0 3.0950e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>>> VecCUSPCopyFrom     44   1.0 2.0876e-02   1.0 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0    0   0   0   0   0       0
>>>
>>> The complete log is attached. Thanks.
>>> <ex19p2.txt>
>>
>>
> <ex19.cuda.log>