[petsc-users] hypre / hip usage

Jed Brown jed at jedbrown.org
Fri Jan 21 09:44:41 CST 2022


Mark Adams <mfadams at lbl.gov> writes:

>> > Is there a way to tell from log_view data that hypre is running on the GPU?
>>
>> Is it clear from data transfer within PCApply?
>>
>>
> Well, this does not look right. '-mat_type hypre' fails. I guess we have to
> get that working, or could/should it work with -mat_type aijkokkos?
>
> --- Event Stage 2: KSP Solve only
>
> MatMult              230 1.0 1.0922e-01 2.0 1.50e+07 2.1 2.3e+06 2.7e+02 0.0e+00  1 58 81 64  0   3 91100100  0 62942       0      0 0.00e+00  920 4.26e+00  0
> KSPSolve              10 1.0 3.0406e+00 1.0 1.64e+07 2.0 2.3e+06 2.7e+02 7.0e+02 51 64 81 64 74 100100100100100  2488    4253    230 8.99e-01 1620 4.27e+00  9

Only 9% of the KSPSolve flops on the GPU (the last column) isn't good. For comparison (debug build on my laptop):

$ ompi-cuda-g/tests/snes/tutorials/ex5 -da_refine 7 -dm_mat_type aijcusparse -dm_vec_type cuda -pc_type hypre -log_view
[...]
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 4.7631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult               28 1.0 1.0347e-02 1.0 3.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1 30  0  0  0   1 30  0  0  0  3601    8498      4 2.72e+01    0 0.00e+00 100
MatConvert             4 1.0 4.0087e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin       9 1.0 2.2978e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd         9 1.0 2.0741e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatGetRowIJ            4 1.0 8.6590e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 5.0983e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatCUSPARSCopyTo       4 1.0 5.4931e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      4 2.72e+01    0 0.00e+00  0
KSPSetUp               4 1.0 1.9243e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               4 1.0 1.0843e-01 1.0 1.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00  9 83  0  0  0   9 83  0  0  0   945    7695      4 2.72e+01    0 0.00e+00 100
KSPGMRESOrthog        24 1.0 5.3822e-03 1.0 4.98e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 40  0  0  0   0 40  0  0  0  9255   16695      0 0.00e+00    0 0.00e+00 100

So all the recorded flops are on the GPU.

SNESSolve              1 1.0 9.3487e-01 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 76100  0  0  0  76100  0  0  0   132    6212     11 3.55e+01   10 1.19e+01 93
SNESSetUp              1 1.0 2.5863e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       5 1.0 4.3279e-02 1.0 8.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  7  0  0  0   4  7  0  0  0   190       0      1 1.19e+00    7 8.30e+00  0
SNESJacobianEval       4 1.0 4.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 33  0  0  0  0  33  0  0  0  0     0       0      0 0.00e+00    3 3.56e+00  0
SNESLineSearch         4 1.0 3.4112e-02 1.0 1.90e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 15  0  0  0   3 15  0  0  0   558    2556      5 5.93e+00    5 5.93e+00 65
DMCreateMat            1 1.0 2.5826e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             2 1.0 7.5938e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp                1 1.0 7.5895e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFPack                17 1.0 3.9599e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack              17 1.0 3.9463e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecDot                 4 1.0 4.1088e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2886    3698      0 0.00e+00    0 0.00e+00 100
VecMDot               24 1.0 2.9923e-03 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 20  0  0  0   0 20  0  0  0  8325   17385      0 0.00e+00    0 0.00e+00 100
VecNorm               37 1.0 8.1704e-03 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  9  0  0  0   1  9  0  0  0  1342    1638      5 5.93e+00    0 0.00e+00 100
VecScale              32 1.0 6.4316e-04 1.0 4.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  6453   11772      0 0.00e+00    0 0.00e+00 100
VecCopy               12 1.0 4.6137e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                37 1.0 9.6073e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY                4 1.0 1.5693e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  7556   11591      0 0.00e+00    0 0.00e+00 100
VecWAXPY               4 1.0 3.6415e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  3256    4372      0 0.00e+00    0 0.00e+00 100
VecMAXPY              28 1.0 2.4212e-03 1.0 3.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 13223   15414      0 0.00e+00    0 0.00e+00 100
VecScatterBegin        9 1.0 1.0676e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      1 1.19e+00    7 8.30e+00  0
VecScatterEnd          9 1.0 9.2955e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecReduceArith         8 1.0 2.2297e-03 1.0 2.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1064    1246      1 1.19e+00    0 0.00e+00 100
VecReduceComm          4 1.0 1.3690e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecNormalize          28 1.0 6.0496e-03 1.0 1.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0  2058    2357      0 0.00e+00    0 0.00e+00 100
VecCUDACopyTo          6 1.0 1.1751e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      6 7.11e+00    0 0.00e+00  0
VecCUDACopyFrom        4 1.0 7.8236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    4 4.74e+00  0
PCSetUp                4 1.0 3.6268e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 29  0  0  0  0  29  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply               28 1.0 8.1240e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

and PCApply doesn't transfer anything to or from the device. Perhaps we should use operator complexity to make a nonzero placeholder for hypre flops. I don't think BoomerAMG has an API for operator complexity, but it looks like there is code to print it, so maybe we can obtain it (or ask them to add an API).
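
If it helps to check these columns mechanically rather than by eye, here is a rough sketch in Python. It is not a PETSc tool: the function name gpu_columns and the dictionary keys are made up for illustration, and it assumes the seven trailing columns are laid out as in the header above (Total Mflop/s, GPU Mflop/s, CpuToGpu count and size, GpuToCpu count and size, GPU %F).

```python
# Sketch only: pull the GPU-related columns out of one -log_view event line.
# The column layout is assumed from the header printed above; gpu_columns and
# its dictionary keys are hypothetical names, not part of PETSc.

def gpu_columns(line):
    """Return (event_name, fields) for the seven trailing columns of a line."""
    tokens = line.split()
    # Read from the right: the %T %F %M %L %R columns can run together
    # (e.g. "76100"), so counting fields from the left is unreliable.
    tail = tokens[-7:]
    fields = {
        "total_mflops": float(tail[0]),
        "gpu_mflops": float(tail[1]),
        "cpu_to_gpu_count": int(tail[2]),
        "cpu_to_gpu_size": float(tail[3]),
        "gpu_to_cpu_count": int(tail[4]),
        "gpu_to_cpu_size": float(tail[5]),
        "gpu_flop_percent": int(tail[6]),
    }
    return tokens[0], fields


# Two lines copied from the log above.
ksp_line = ("KSPSolve               4 1.0 1.0843e-01 1.0 1.03e+08 1.0 0.0e+00 "
            "0.0e+00 0.0e+00  9 83  0  0  0   9 83  0  0  0   945    7695      "
            "4 2.72e+01    0 0.00e+00 100")
pc_line = ("PCApply               28 1.0 8.1240e-02 1.0 0.00e+00 0.0 0.0e+00 "
           "0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0       0      "
           "0 0.00e+00    0 0.00e+00  0")

for line in (ksp_line, pc_line):
    name, cols = gpu_columns(line)
    print(name, cols["gpu_flop_percent"],
          cols["cpu_to_gpu_count"], cols["gpu_to_cpu_count"])
```

For KSPSolve this reports 100% of flops on the GPU with 4 host-to-device copies; for PCApply all three values are zero, which is the "no transfers, no recorded flops" situation described above. An event line that a mail client has wrapped across several lines needs to be rejoined before it is passed in.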

