[petsc-users] hypre / hip usage
Jed Brown
jed at jedbrown.org
Fri Jan 21 09:44:41 CST 2022
Mark Adams <mfadams at lbl.gov> writes:
>> > Is there a way to tell from log_view data that hypre is running on the
>> GPU?
>>
>> Is it clear from data transfer within PCApply?
>>
>>
> Well, this does not look right. '-mat_type hypre' fails. I guess we have to
> get that working, or could/should it work with -mat_type aijkokkos?
>
> --- Event Stage 2: KSP Solve only
>
> MatMult 230 1.0 1.0922e-01 2.0 1.50e+07 2.1 2.3e+06 2.7e+02 0.0e+00 1 58 81 64 0 3 91100100 0 62942 0 0 0.00e+00 920 4.26e+00 0
> KSPSolve 10 1.0 3.0406e+00 1.0 1.64e+07 2.0 2.3e+06 2.7e+02 7.0e+02 51 64 81 64 74 100100100100100 2488 4253 230 8.99e-01 1620 4.27e+00 9
Only 9% of the KSPSolve flops on the GPU (the last column, GPU %F) isn't good. For comparison, here is a debug build on my laptop:
$ ompi-cuda-g/tests/snes/tutorials/ex5 -da_refine 7 -dm_mat_type aijcusparse -dm_vec_type cuda -pc_type hypre -log_view
[...]
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 1 1.0 4.7631e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 28 1.0 1.0347e-02 1.0 3.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 30 0 0 0 1 30 0 0 0 3601 8498 4 2.72e+01 0 0.00e+00 100
MatConvert 4 1.0 4.0087e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 9 1.0 2.2978e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 9 1.0 2.0741e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatGetRowIJ 4 1.0 8.6590e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatView 1 1.0 5.0983e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatCUSPARSCopyTo 4 1.0 5.4931e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 4 2.72e+01 0 0.00e+00 0
KSPSetUp 4 1.0 1.9243e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 4 1.0 1.0843e-01 1.0 1.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00 9 83 0 0 0 9 83 0 0 0 945 7695 4 2.72e+01 0 0.00e+00 100
KSPGMRESOrthog 24 1.0 5.3822e-03 1.0 4.98e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 40 0 0 0 0 40 0 0 0 9255 16695 0 0.00e+00 0 0.00e+00 100
So all the recorded KSPSolve flops are on the GPU (GPU %F is 100).
SNESSolve 1 1.0 9.3487e-01 1.0 1.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 76100 0 0 0 76100 0 0 0 132 6212 11 3.55e+01 10 1.19e+01 93
SNESSetUp 1 1.0 2.5863e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 21 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SNESFunctionEval 5 1.0 4.3279e-02 1.0 8.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 7 0 0 0 4 7 0 0 0 190 0 1 1.19e+00 7 8.30e+00 0
SNESJacobianEval 4 1.0 4.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 33 0 0 0 0 33 0 0 0 0 0 0 0 0.00e+00 3 3.56e+00 0
SNESLineSearch 4 1.0 3.4112e-02 1.0 1.90e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 15 0 0 0 3 15 0 0 0 558 2556 5 5.93e+00 5 5.93e+00 65
DMCreateMat 1 1.0 2.5826e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 21 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 2 1.0 7.5938e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 1 1.0 7.5895e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 17 1.0 3.9599e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFUnpack 17 1.0 3.9463e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecDot 4 1.0 4.1088e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 2886 3698 0 0.00e+00 0 0.00e+00 100
VecMDot 24 1.0 2.9923e-03 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 20 0 0 0 0 20 0 0 0 8325 17385 0 0.00e+00 0 0.00e+00 100
VecNorm 37 1.0 8.1704e-03 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 9 0 0 0 1 9 0 0 0 1342 1638 5 5.93e+00 0 0.00e+00 100
VecScale 32 1.0 6.4316e-04 1.0 4.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 6453 11772 0 0.00e+00 0 0.00e+00 100
VecCopy 12 1.0 4.6137e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 37 1.0 9.6073e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 4 1.0 1.5693e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 7556 11591 0 0.00e+00 0 0.00e+00 100
VecWAXPY 4 1.0 3.6415e-04 1.0 1.19e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 3256 4372 0 0.00e+00 0 0.00e+00 100
VecMAXPY 28 1.0 2.4212e-03 1.0 3.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 13223 15414 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 9 1.0 1.0676e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 1 1.19e+00 7 8.30e+00 0
VecScatterEnd 9 1.0 9.2955e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecReduceArith 8 1.0 2.2297e-03 1.0 2.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1064 1246 1 1.19e+00 0 0.00e+00 100
VecReduceComm 4 1.0 1.3690e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecNormalize 28 1.0 6.0496e-03 1.0 1.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 2058 2357 0 0.00e+00 0 0.00e+00 100
VecCUDACopyTo 6 1.0 1.1751e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 6 7.11e+00 0 0.00e+00 0
VecCUDACopyFrom 4 1.0 7.8236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 4 4.74e+00 0
PCSetUp 4 1.0 3.6268e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 29 0 0 0 0 29 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
PCApply 28 1.0 8.1240e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
And PCApply doesn't transfer anything to the device: its CpuToGpu and GpuToCpu counts are both 0. It also records no flops (0.00e+00), so perhaps we should use operator complexity to make a nonzero placeholder for Hypre flops. I don't think BoomerAMG has an API for operator complexity, but it looks like there is code to print it, so maybe we can obtain it that way (or ask the hypre developers to add an API).
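
To check these columns without eyeballing the table, something like the following Python sketch may help. It is only an illustration, not part of PETSc: the function name gpu_columns and the dictionary keys are made up here, and it assumes the column order shown in the header above (Total Mflop/s, GPU Mflop/s, CpuToGpu Count and Size, GpuToCpu Count and Size, GPU %F). It reads fields from the right-hand end of the line because the percentage columns in the middle can run together (e.g. "76100" or "100100100100100").

```python
# Sketch: pull the GPU-related trailing columns out of one -log_view event line.
# Assumes the last seven whitespace-separated fields are, in order:
#   Total Mflop/s, GPU Mflop/s, CpuToGpu Count, CpuToGpu Size,
#   GpuToCpu Count, GpuToCpu Size, GPU %F
# (the layout of the header printed in the log above).

def gpu_columns(line):
    """Return the event name and GPU columns of a -log_view event line."""
    fields = line.split()
    tail = fields[-7:]  # count from the right; middle columns may be fused
    return {
        "event": fields[0],
        "total_mflops": float(tail[0]),
        "gpu_mflops": float(tail[1]),
        "cpu_to_gpu_count": int(tail[2]),
        "cpu_to_gpu_size": float(tail[3]),
        "gpu_to_cpu_count": int(tail[4]),
        "gpu_to_cpu_size": float(tail[5]),
        "gpu_flop_percent": int(tail[6]),
    }


# Lines copied from the logs in this thread (quote markers removed).
KSP_LAPTOP = ("KSPSolve 4 1.0 1.0843e-01 1.0 1.03e+08 1.0 0.0e+00 0.0e+00 0.0e+00 "
              "9 83 0 0 0 9 83 0 0 0 945 7695 4 2.72e+01 0 0.00e+00 100")
PCAPPLY_LAPTOP = ("PCApply 28 1.0 8.1240e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 "
                  "7 0 0 0 0 7 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0")
KSP_HIP = ("KSPSolve 10 1.0 3.0406e+00 1.0 1.64e+07 2.0 2.3e+06 2.7e+02 7.0e+02 "
           "51 64 81 64 74 100100100100100 2488 4253 230 8.99e-01 1620 4.27e+00 9")

if __name__ == "__main__":
    for line in (KSP_LAPTOP, PCAPPLY_LAPTOP, KSP_HIP):
        cols = gpu_columns(line)
        print(cols["event"], "GPU %F =", cols["gpu_flop_percent"],
              "CpuToGpu =", cols["cpu_to_gpu_count"],
              "GpuToCpu =", cols["gpu_to_cpu_count"])
```

A low GPU %F together with large GpuToCpu counts on KSPSolve (as in the hip run) suggests work is bouncing back to the host, while zero transfers on PCApply only tell us no copies were logged there, since Hypre's own flops are not recorded.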