[petsc-dev] GPU performance of MatSOR()
Barry Smith
bsmith at petsc.dev
Wed Jul 27 21:05:08 CDT 2022
There are multicolor versions of SOR that theoretically offer good parallelism on GPUs, but at the cost of multiple phases and slower convergence rates. Unless someone already has one coded for CUDA or Kokkos, it would take a good amount of code to produce one that offers (but does not necessarily guarantee) reasonable performance on GPUs.
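To make the "multiple phases" point concrete, here is a minimal sketch of the simplest multicolor scheme, a two-color (red-black) Gauss-Seidel sweep. It is not PETSc code: the model problem (-u'' = f on a uniform 1D grid with Dirichlet boundaries) and the helper names red_black_sweep and residual_norm are assumptions chosen only for illustration. Within one color every update reads only points of the other color, so each phase could in principle run as one data-parallel kernel.

```python
def red_black_sweep(u, f, h):
    """One two-color Gauss-Seidel sweep for -u'' = f on a uniform 1D grid.

    u[0] and u[-1] hold fixed Dirichlet boundary values. Hypothetical
    helper for illustration only, not part of PETSc.
    """
    n = len(u)
    for color in (1, 0):  # phase 1: odd interior points, phase 2: even ones
        idx = [i for i in range(1, n - 1) if i % 2 == color]
        # Every update in this phase reads only neighbours of the other
        # color, so all of them are independent of each other.
        new = [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]) for i in idx]
        for i, v in zip(idx, new):
            u[i] = v
    return u


def residual_norm(u, f, h):
    """Max-norm of the residual of the 3-point discretization."""
    return max(abs(f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / (h * h))
               for i in range(1, len(u) - 1))


n = 17                      # grid points, including the two boundaries
h = 1.0 / (n - 1)
f = [1.0] * n               # constant right-hand side
u = [0.0] * n               # zero initial guess and zero boundary values
r0 = residual_norm(u, f, h)
for _ in range(500):
    red_black_sweep(u, f, h)
r1 = residual_norm(u, f, h)
print(f"residual reduced from {r0:.3e} to {r1:.3e}")
```

A general sparse matrix needs a graph coloring with more than two colors, and each extra color adds another phase (and another kernel launch on a GPU), which is where much of the implementation effort would go.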
> On Jul 27, 2022, at 7:57 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Unfortunately, MatSOR is a really bad operation for GPUs. We can make it use sparse triangular primitives from cuSPARSE, but those run on the GPU about 20x slower than MatMult with the same sparse matrix. So unless MatSOR reduces the iteration count by 20x compared to your next-best preconditioning option, you'll be better off finding a different preconditioner. This might be some elements of multigrid or polynomial smoothing with point-block Jacobi. If you can explain a bit about your application, we may be able to offer some advice.
>
> Han Tran <hantran at cs.utah.edu> writes:
>
>> Hello,
>>
>> Running my example using VECMPICUDA for VecSetType() and MATMPIAIJCUSP for MatSetType(), I get the profiling results shown below. MatSOR() shows 0 in the GPU %F column and only has a GpuToCpu count and size. Is it correct that PETSc currently does not have MatSOR implemented on the GPU? I would appreciate an explanation of how MatSOR() currently uses the GPU. In this example, MatSOR takes considerable time compared to the other functions.
>>
>> Thank you.
>>
>> -Han
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
>> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSided 220001 1.0 3.9580e+02139.9 0.00e+00 0.0 2.0e+00 4.0e+00 2.2e+05 4 0 0 0 20 4 0 0 0 20 0 0 0 0.00e+00 0 0.00e+00 0
>> BuildTwoSidedF 220000 1.0 3.9614e+02126.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+05 4 0 0 0 20 4 0 0 0 20 0 0 0 0.00e+00 0 0.00e+00 0
>> VecMDot 386001 1.0 6.3426e+01 1.5 1.05e+11 1.0 0.0e+00 0.0e+00 3.9e+05 1 11 0 0 35 1 11 0 0 35 3311 26012 386001 1.71e+05 0 0.00e+00 100
>> VecNorm 496001 1.0 5.0877e+01 1.2 5.49e+10 1.0 0.0e+00 0.0e+00 5.0e+05 1 6 0 0 45 1 6 0 0 45 2159 3707 110000 4.87e+04 0 0.00e+00 100
>> VecScale 496001 1.0 7.9951e+00 1.0 2.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 6869 13321 0 0.00e+00 0 0.00e+00 100
>> VecCopy 110000 1.0 1.9323e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> VecSet 330017 1.0 5.4319e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> VecAXPY 110000 1.0 1.5820e+00 1.0 1.22e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15399 35566 0 0.00e+00 0 0.00e+00 100
>> VecMAXPY 496001 1.0 1.1505e+01 1.0 1.48e+11 1.0 0.0e+00 0.0e+00 0.0e+00 0 16 0 0 0 0 16 0 0 0 25665 39638 0 0.00e+00 0 0.00e+00 100
>> VecAssemblyBegin 110000 1.0 1.2021e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05 0 0 0 0 10 0 0 0 0 10 0 0 0 0.00e+00 0 0.00e+00 0
>> VecAssemblyEnd 110000 1.0 1.5988e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> VecScatterBegin 496001 1.0 1.3002e+01 1.0 0.00e+00 0.0 9.9e+05 1.3e+04 1.0e+00 0 0100100 0 0 0100100 0 0 0 110000 4.87e+04 0 0.00e+00 0
>> VecScatterEnd 496001 1.0 1.8988e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> VecNormalize 496001 1.0 5.8797e+01 1.1 8.24e+10 1.0 0.0e+00 0.0e+00 5.0e+05 1 9 0 0 45 1 9 0 0 45 2802 4881 110000 4.87e+04 0 0.00e+00 100
>> VecCUDACopyTo 716001 1.0 3.4483e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 716001 3.17e+05 0 0.00e+00 0
>> VecCUDACopyFrom 1211994 1.0 5.1752e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 1211994 5.37e+05 0
>> MatMult 386001 1.0 4.8436e+01 1.0 1.90e+11 1.0 7.7e+05 1.3e+04 0.0e+00 1 21 78 78 0 1 21 78 78 0 7862 16962 0 0.00e+00 0 0.00e+00 100
>> MatMultAdd 110000 1.0 6.2666e+01 1.1 6.03e+10 1.0 2.2e+05 1.3e+04 1.0e+00 1 7 22 22 0 1 7 22 22 0 1926 16893 440000 3.39e+05 0 0.00e+00 100
>> MatSOR 496001 1.0 5.1821e+02 1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31 0 0 0 10 31 0 0 0 1090 0 0 0.00e+00 991994 4.39e+05 0
>> MatAssemblyBegin 110000 1.0 3.9732e+02109.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05 4 0 0 0 10 4 0 0 0 10 0 0 0 0.00e+00 0 0.00e+00 0
>> MatAssemblyEnd 110000 1.0 5.3015e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> MatZeroEntries 110000 1.0 1.3179e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> MatCUSPARSCopyTo 220000 1.0 3.2805e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 220000 2.41e+05 0 0.00e+00 0
>> KSPSetUp 110000 1.0 3.5344e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> KSPSolve 110000 1.0 6.8304e+02 1.0 8.20e+11 1.0 7.7e+05 1.3e+04 8.8e+05 13 89 78 78 80 13 89 78 78 80 2401 14311 496001 2.20e+05 991994 4.39e+05 66
>> KSPGMRESOrthog 386001 1.0 7.2820e+01 1.4 2.10e+11 1.0 0.0e+00 0.0e+00 3.9e+05 1 23 0 0 35 1 23 0 0 35 5765 30176 386001 1.71e+05 0 0.00e+00 100
>> PCSetUp 110000 1.0 1.8825e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> PCApply 496001 1.0 5.1857e+02 1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31 0 0 0 10 31 0 0 0 1090 0 0 0.00e+00 991994 4.39e+05 0
>> SFSetGraph 1 1.0 2.0936e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> SFSetUp 1 1.0 2.5347e-03 1.0 0.00e+00 0.0 4.0e+00 3.3e+03 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> SFPack 496001 1.0 3.0026e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> SFUnpack 496001 1.0 1.1296e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
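As a rough reading of the profile quoted above, the short script below works out how much of the solve time MatSOR accounts for. The numbers are copied from the MatSOR, MatMult and KSPSolve rows (Max time in seconds, call counts, GpuToCpu count); the variable names are only illustrative, and the per-call figures are simple averages that ignore load imbalance between ranks.

```python
# Values copied from the -log_view rows quoted above.
ksp_solve_s = 6.8304e+02         # KSPSolve time
mat_sor_s = 5.1821e+02           # MatSOR time
mat_mult_s = 4.8436e+01          # MatMult time
mat_sor_calls = 496001           # MatSOR count
mat_mult_calls = 386001          # MatMult count
mat_sor_gpu_to_cpu = 991994      # MatSOR GpuToCpu count

# Fraction of the Krylov solve spent inside MatSOR.
sor_share = mat_sor_s / ksp_solve_s

# Average time per call, in milliseconds.
sor_per_call_ms = 1e3 * mat_sor_s / mat_sor_calls
mult_per_call_ms = 1e3 * mat_mult_s / mat_mult_calls
sor_vs_mult = sor_per_call_ms / mult_per_call_ms

# Device-to-host copies triggered per MatSOR call.
copies_per_sor = mat_sor_gpu_to_cpu / mat_sor_calls

print(f"MatSOR share of KSPSolve time: {sor_share:.1%}")
print(f"MatSOR per call:  {sor_per_call_ms:.3f} ms")
print(f"MatMult per call: {mult_per_call_ms:.3f} ms")
print(f"MatSOR / MatMult per-call ratio: {sor_vs_mult:.1f}")
print(f"GpuToCpu copies per MatSOR call: {copies_per_sor:.2f}")
```

On these figures MatSOR takes roughly three quarters of the KSPSolve time and triggers about two device-to-host copies per call, which is consistent with the smoother running on the CPU while the vector operations and MatMult run on the GPU.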