[petsc-dev] [petsc-maint #87339] Re: ex19 on GPU

Barry Smith bsmith at mcs.anl.gov
Mon Sep 19 11:10:16 CDT 2011



  Ok, those are all what we expect, so what the hey is wrong with Shiyuan's machine? Is there another machine you can try on?

VecDot                 2 1.0 1.2088e-04 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1324
VecMDot             2024 1.0 2.1392e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 29  0  0  0  49 29  0  0  0  1189
VecNorm             2096 1.0 1.1928e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   3  2  0  0  0  1406
VecScale            2092 1.0 5.2948e-02 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1580
VecCopy             2072 1.0 6.9294e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecSet                70 1.0 1.7152e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              108 1.0 1.3336e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   648
VecWAXPY              68 1.0 2.0800e-03 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1308
VecMAXPY            2092 1.0 2.9396e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 31  0  0  0   7 31  0  0  0  9205
VecScatterBegin        5 1.0 1.7390e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith         2 1.0 9.2983e-04 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   172
VecReduceComm          1 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCUSPCopyTo         49 1.0 9.9115e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCUSPCopyFrom       44 1.0 1.4175e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESSolve              1 1.0 4.3493e+00 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 3.4e+04 71100  0  0100 100100  0  0100  2040
SNESLineSearch         2 1.0 8.4569e-03 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0   650
SNESFunctionEval       3 1.0 6.6361e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0   380
SNESJacobianEval       2 1.0 5.0695e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 4.3e+01  8  0  0  0  0  12  0  0  0  0    76
KSPGMRESOrthog      2024 1.0 2.4282e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 3.1e+04 40 57  0  0 92  56 57  0  0 93  2095
KSPSetup               2 1.0 2.9206e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 3.8301e+00 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 3.4e+04 63 99  0  0100  88 99  0  0100  2304
PCSetUp                2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2024 1.0 6.3726e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMult             2092 1.0 1.1330e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 19 37  0  0  0  26 37  0  0  0  2931
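
To put the two runs side by side, here is a minimal Python sketch (not part of PETSc; the helper names parse_event and rates are made up for illustration). It assumes that in a -log_summary event row the first whitespace-separated field is the event name, the fourth is the max time in seconds, and the last is the Mflop/s rate. It divides the breadboard rate by the rate from Shiyuan's run for four rows copied from this thread.

```python
def parse_event(line):
    """Split one -log_summary event row into (name, seconds, mflops).

    Assumes whitespace-separated fields with the event name first, the
    max time fourth, and the Mflop/s rate last. The percentage columns
    in between can run together (e.g. "71100"), so they are not indexed.
    """
    tok = line.split()
    return tok[0], float(tok[3]), float(tok[-1])


def rates(rows):
    """Map event name -> Mflop/s for a block of event rows."""
    out = {}
    for line in rows.strip().splitlines():
        name, _seconds, mflops = parse_event(line)
        out[name] = mflops
    return out


# Rows copied from the two logs quoted in this thread (quote markers removed).
BREADBOARD = """
VecMDot             2024 1.0 2.1392e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 35 29  0  0  0  49 29  0  0  0  1189
VecNorm             2096 1.0 1.1928e-01 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   3  2  0  0  0  1406
VecMAXPY            2092 1.0 2.9396e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 31  0  0  0   7 31  0  0  0  9205
MatMult             2092 1.0 1.1330e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 19 37  0  0  0  26 37  0  0  0  2931
"""

SHIYUAN = """
VecMDot             2024 1.0 8.6273e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 50 29  0  0  0  66 29  0  0  0   295
VecNorm             2096 1.0 1.5544e+00 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  9  2  0  0  0  12  2  0  0  0   108
VecMAXPY            2092 1.0 6.4464e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   5 31  0  0  0  4198
MatMult             2092 1.0 1.1950e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  7 37  0  0  0   9 37  0  0  0  2779
"""

fast = rates(BREADBOARD)
slow = rates(SHIYUAN)

# Ratio > 1 means breadboard achieved a higher Mflop/s rate for that event.
ratios = {name: fast[name] / slow[name] for name in fast}

for name, r in ratios.items():
    print(f"{name:10s} breadboard / Shiyuan Mflop/s ratio: {r:5.2f}")
```

On these four rows the ratio stays close to 1 for MatMult and is several times larger for VecMDot and VecNorm, which matches the observation in this thread that the vector reductions, not MatMult, are what is slow on Shiyuan's machine.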


On Sep 19, 2011, at 10:44 AM, Satish Balay wrote:

> Attached is the output from the run on breadboard. It has 2 "nVidia
> Corporation GT200 [Tesla C1060]" cards.
> Satish
> --------
> balay at bb30:~/petsc-dev/src/snes/examples/tutorials>./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -mat_no_inode -preload off  -cusp_synchronize -cuda_set_device 0 -log_summary ex19.cuda.log
> lid velocity = 0.0001, prandtl # = 1, grashof # = 1
> Number of SNES iterations = 2
> balay at bb30:~/petsc-dev/src/snes/examples/tutorials>
> On Sun, 18 Sep 2011, Barry Smith wrote:
>>   Ok, the copy up and down are not a problem. 
>>   Except for VecMAXPY(), the vector operations are terrible (as if they are not using the GPU, but they must be?). The MatMult() must be on the GPU because its rate is pretty good: 2779 Mflop/s.
>>   Does someone else have access to a similar system, and can they run the exact same test to see what numbers they get? Satish, could you run it on breadboard? Maybe on Magellion :-)
>>   Barry
>> VecDot                 2 1.0 1.7049e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    94
>> VecMDot             2024 1.0 8.6273e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 50 29  0  0  0  66 29  0  0  0   295
>> VecNorm             2096 1.0 1.5544e+00 1.0 1.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  9  2  0  0  0  12  2  0  0  0   108
>> VecScale            2092 1.0 3.7774e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   3  1  0  0  0   222
>> VecCopy             2072 1.0 3.8258e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   3  0  0  0  0     0
>> VecSet                70 1.0 1.3119e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY              108 1.0 4.7407e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   182
>> VecWAXPY              68 1.0 1.2545e-02 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   217
>> VecMAXPY            2092 1.0 6.4464e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   5 31  0  0  0  4198
>> VecScatterBegin        5 1.0 1.5609e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecReduceArith         2 1.0 3.8650e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    41
>> VecReduceComm          1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecCUSPCopyTo         49 1.0 3.0950e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecCUSPCopyFrom       44 1.0 2.0876e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> SNESSolve              1 1.0 1.3044e+01 1.0 8.87e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75100  0  0  0 100100  0  0  0   680
>> SNESLineSearch         2 1.0 1.1921e-02 1.0 5.49e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   461
>> SNESFunctionEval       3 1.0 2.7192e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   927
>> SNESJacobianEval       2 1.0 2.0424e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0   188
>> KSPGMRESOrthog      2024 1.0 9.2522e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 53 57  0  0  0  71 57  0  0  0   550
>> KSPSetup               2 1.0 5.1975e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               2 1.0 1.2819e+01 1.0 8.83e+09 1.0 0.0e+00 0.0e+00 0.0e+00 74 99  0  0  0  98 99  0  0  0   689
>> PCSetUp                2 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> PCApply             2024 1.0 3.8054e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   3  0  0  0  0     0
>> MatMult             2092 1.0 1.1950e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  7 37  0  0  0   9 37  0  0  0  2779
>> On Sep 18, 2011, at 10:29 AM, Shiyuan wrote:
>>> On Sat, Sep 17, 2011 at 10:48 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> Run the first one  with -da_vec_type seqcusp and -da_mat_type seqaijcusp
>>>> VecScatterBegin     2097 1.0 1.0270e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   7  0  0  0  0     0
>>>> VecCUSPCopyTo       2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>>>> VecCUSPCopyFrom     2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   7  0  0  0  0     0
>>>  Why is it doing all these vector copy ups and downs? It is run on one process, so it shouldn't be doing more than a handful in total.
>>>  Barry
>>> ./ex19 -da_vec_type seqcusp -da_mat_type seqaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off  -cusp_synchronize -cuda_set_device 0 | tee ex19p2.txt
>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>> 0:      Main Stage: 4.2393e+00  24.4%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>> 1:           SetUp: 4.9079e-02   0.3%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>> 2:           Solve: 1.3071e+01  75.3%  8.8712e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>> ------------------------------------------------------------------------------------------------------------------------
>>> VecScatterBegin        5 1.0 1.5609e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecReduceArith         2 1.0 3.8650e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    41
>>> VecReduceComm          1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecCUSPCopyTo         49 1.0 3.0950e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecCUSPCopyFrom       44 1.0 2.0876e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> The complete log is attached.  Thanks. 
>>> <ex19p2.txt>
> <ex19.cuda.log>
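
The drop in host/device vector copies between the two sets of VecCUSPCopyTo/VecCUSPCopyFrom rows quoted above can be tallied with a small Python sketch (not part of PETSc; parse_count and summarize are hypothetical helpers). It assumes the second whitespace-separated field of an event row is the call count and the fourth is the max time in seconds.

```python
def parse_count(line):
    """Return (name, call count, seconds) from one -log_summary event row."""
    tok = line.split()
    return tok[0], int(tok[1]), float(tok[3])


def summarize(rows):
    """Sum call counts and times over a block of event rows."""
    total_calls, total_time = 0, 0.0
    for line in rows.strip().splitlines():
        _name, calls, seconds = parse_count(line)
        total_calls += calls
        total_time += seconds
    return total_calls, total_time


# Copy rows from the earlier run quoted in this thread (quote markers removed).
EARLIER = """
VecCUSPCopyTo       2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecCUSPCopyFrom     2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   7  0  0  0  0     0
"""

# Copy rows from the rerun with both -da_vec_type seqcusp and -da_mat_type seqaijcusp.
RERUN = """
VecCUSPCopyTo         49 1.0 3.0950e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecCUSPCopyFrom       44 1.0 2.0876e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
"""

calls_earlier, time_earlier = summarize(EARLIER)
calls_rerun, time_rerun = summarize(RERUN)

print(f"earlier run: {calls_earlier} copies, {time_earlier:.4f} s")
print(f"rerun:       {calls_rerun} copies, {time_rerun:.4f} s")
```

The call counts are the more telling figure here: a count in the thousands means the vectors are bouncing between host and device on nearly every iteration, while a count in the tens is the "handful" expected for a single-process run.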
