[petsc-dev] GPU performance of MatSOR()
Paul Mullowney
paulmullowney at gmail.com
Thu Jul 28 10:19:04 CDT 2022
One could also approximate the SOR triangular solves with a Neumann series,
where each term in the series is a SpMV (great for GPUs). The number of
terms needed in the series is matrix dependent.
We've seen this work to great effect for some problems.
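To make the idea concrete, here is a minimal sketch in plain serial C
(illustrative CSR arrays and names, not PETSc code) of replacing the
lower-triangular solve y = (D + L)^{-1} b with a truncated Neumann series;
the number of terms, nterms, is the matrix-dependent knob mentioned above:

#include <stdlib.h>

/* Approximate y = (D + L)^{-1} b, with D the diagonal and L the strictly
 * lower triangle of A, via the truncated Neumann series
 *   (D + L)^{-1} = (I + D^{-1}L)^{-1} D^{-1}
 *               ~= sum_{k=0}^{nterms-1} (-D^{-1}L)^k D^{-1},
 * so each extra term costs one SpMV with L plus a diagonal scaling.
 * rowptr/col/val hold L in CSR and dinv holds 1/diag(A). */
static void neumann_lower_solve(int n, const int *rowptr, const int *col,
                                const double *val, const double *dinv,
                                const double *b, double *y, int nterms)
{
  double *t = malloc((size_t)n * sizeof(*t)); /* current series term */
  double *s = malloc((size_t)n * sizeof(*s)); /* next series term    */
  for (int i = 0; i < n; i++) { t[i] = dinv[i] * b[i]; y[i] = t[i]; }
  for (int k = 1; k < nterms; k++) {
    for (int i = 0; i < n; i++) {             /* s = -D^{-1} L t (one SpMV) */
      double sum = 0.0;
      for (int j = rowptr[i]; j < rowptr[i + 1]; j++) sum += val[j] * t[col[j]];
      s[i] = -dinv[i] * sum;
    }
    for (int i = 0; i < n; i++) { t[i] = s[i]; y[i] += t[i]; }
  }
  free(t); free(s);
}

On a GPU the two inner loops map to an SpMV kernel plus a couple of vector
kernels, which is exactly why this formulation is attractive there.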
-Paul
On Wed, Jul 27, 2022 at 8:05 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> There are multicolor versions of SOR that theoretically offer good
> parallelism on GPUs, but at the cost of multiple phases and slower
> convergence rates. Unless someone already has one coded for CUDA or
> Kokkos, it would take a good amount of code to produce one that offers
> (but does not necessarily guarantee) reasonable performance on GPUs.
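> For concreteness, one such sweep can be sketched as below (plain serial C
> over illustrative CSR arrays; the color partition color_ptr/color_rows,
> the relaxation factor omega, and the function name are assumptions for
> the sketch, not PETSc code). The loop over colors is sequential, while
> every row inside a color is independent, which is where a GPU kernel
> would get its parallelism:
>
> /* One multicolor SOR sweep: rows sharing a color have no couplings among
>  * themselves, so all rows of a color can be updated simultaneously
>  * (e.g. one GPU kernel launch per color). */
> static void multicolor_sor_sweep(int ncolors, const int *color_ptr,
>                                  const int *color_rows, const int *rowptr,
>                                  const int *col, const double *val,
>                                  const double *diag, const double *b,
>                                  double *x, double omega)
> {
>   for (int c = 0; c < ncolors; c++) {                       /* sequential over colors */
>     for (int k = color_ptr[c]; k < color_ptr[c + 1]; k++) { /* parallel within a color */
>       int    i = color_rows[k];
>       double r = b[i];
>       for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
>         if (col[j] != i) r -= val[j] * x[col[j]];
>       x[i] += omega * (r / diag[i] - x[i]);
>     }
>   }
> }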
>
> > On Jul 27, 2022, at 7:57 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Unfortunately, MatSOR is a really bad operation for GPUs. We can make it
> > use sparse triangular primitives from cuSPARSE, but those run on the GPU
> > about 20x slower than MatMult with the same sparse matrix. So unless
> > MatSOR reduces the iteration count by 20x compared to your next-best
> > preconditioning option, you'll be better off finding a different
> > preconditioner. That might be some elements of multigrid, or polynomial
> > smoothing with point-block Jacobi. If you can explain a bit about your
> > application, we may be able to offer some advice.
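> > As a rough illustration of that route (an assumption-laden fragment, not
> > code from this thread), a smoother built from Chebyshev iterations with
> > point-block Jacobi needs only calls like the following, and both pieces
> > reduce to MatMult plus vector work that runs well on the GPU; ksp here is
> > assumed to be an existing KSP such as a multigrid level smoother:
> >
> > #include <petscksp.h>
> >
> > /* Illustrative helper (the name is ours): switch an existing KSP to a
> >  * Chebyshev/point-block Jacobi smoother. */
> > PetscErrorCode UseGpuFriendlySmoother(KSP ksp)
> > {
> >   PC             pc;
> >   PetscErrorCode ierr;
> >
> >   ierr = KSPSetType(ksp, KSPCHEBYSHEV);CHKERRQ(ierr); /* polynomial smoothing: SpMV only */
> >   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
> >   ierr = PCSetType(pc, PCPBJACOBI);CHKERRQ(ierr);     /* point-block Jacobi preconditioner */
> >   return 0;
> > }
> >
> > The same configuration can also be selected from the command line with
> > -ksp_type chebyshev -pc_type pbjacobi on the relevant solver prefix.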
> >
> > Han Tran <hantran at cs.utah.edu> writes:
> >
> >> Hello,
> >>
> >> Running my example with VECMPICUDA for VecSetType() and
> >> MATMPIAIJCUSPARSE for MatSetType(), I have the profiling results shown
> >> below. It can be seen that MatSOR() has 0 %F on the GPU and only has a
> >> GpuToCpu count and size. Is it correct that PETSc currently does not
> >> have MatSOR implemented on the GPU? I would appreciate an explanation of
> >> how MatSOR() currently uses the GPU. In this example, MatSOR takes
> >> considerable time compared to the other functions.
> >>
> >> Thank you.
> >>
> >> -Han
> >>
> >>
> >> ------------------------------------------------------------------------------------------------------------------------
> >> Event                 Count      Time (sec)       Flop                               --- Global ---   --- Stage ----   Total     GPU   - CpuToGpu -    - GpuToCpu - GPU
> >>                         Max Ratio  Max      Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s Mflop/s  Count   Size    Count   Size  %F
> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >> --- Event Stage 0: Main Stage
> >>
> >> BuildTwoSided      220001 1.0 3.9580e+02 139.9 0.00e+00 0.0 2.0e+00 4.0e+00 2.2e+05  4  0  0  0 20   4  0  0  0 20     0       0      0 0.00e+00       0 0.00e+00  0
> >> BuildTwoSidedF     220000 1.0 3.9614e+02 126.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+05  4  0  0  0 20   4  0  0  0 20     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecMDot            386001 1.0 6.3426e+01   1.5 1.05e+11 1.0 0.0e+00 0.0e+00 3.9e+05  1 11  0  0 35   1 11  0  0 35  3311   26012 386001 1.71e+05       0 0.00e+00 100
> >> VecNorm            496001 1.0 5.0877e+01   1.2 5.49e+10 1.0 0.0e+00 0.0e+00 5.0e+05  1  6  0  0 45   1  6  0  0 45  2159    3707 110000 4.87e+04       0 0.00e+00 100
> >> VecScale           496001 1.0 7.9951e+00   1.0 2.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  6869   13321      0 0.00e+00       0 0.00e+00 100
> >> VecCopy            110000 1.0 1.9323e+00   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecSet             330017 1.0 5.4319e+00   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecAXPY            110000 1.0 1.5820e+00   1.0 1.22e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 15399   35566      0 0.00e+00       0 0.00e+00 100
> >> VecMAXPY           496001 1.0 1.1505e+01   1.0 1.48e+11 1.0 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   0 16  0  0  0 25665   39638      0 0.00e+00       0 0.00e+00 100
> >> VecAssemblyBegin   110000 1.0 1.2021e+00   1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05  0  0  0  0 10   0  0  0  0 10     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecAssemblyEnd     110000 1.0 1.5988e-01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecScatterBegin    496001 1.0 1.3002e+01   1.0 0.00e+00 0.0 9.9e+05 1.3e+04 1.0e+00  0  0 100 100  0   0  0 100 100  0     0       0 110000 4.87e+04       0 0.00e+00  0
> >> VecScatterEnd      496001 1.0 1.8988e+01   1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> VecNormalize       496001 1.0 5.8797e+01   1.1 8.24e+10 1.0 0.0e+00 0.0e+00 5.0e+05  1  9  0  0 45   1  9  0  0 45  2802    4881 110000 4.87e+04       0 0.00e+00 100
> >> VecCUDACopyTo      716001 1.0 3.4483e+01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0 716001 3.17e+05       0 0.00e+00  0
> >> VecCUDACopyFrom   1211994 1.0 5.1752e+01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00 1211994 5.37e+05  0
> >> MatMult            386001 1.0 4.8436e+01   1.0 1.90e+11 1.0 7.7e+05 1.3e+04 0.0e+00  1 21 78 78  0   1 21 78 78  0  7862   16962      0 0.00e+00       0 0.00e+00 100
> >> MatMultAdd         110000 1.0 6.2666e+01   1.1 6.03e+10 1.0 2.2e+05 1.3e+04 1.0e+00  1  7 22 22  0   1  7 22 22  0  1926   16893 440000 3.39e+05       0 0.00e+00 100
> >> MatSOR             496001 1.0 5.1821e+02   1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31  0  0  0  10 31  0  0  0  1090       0      0 0.00e+00  991994 4.39e+05  0
> >> MatAssemblyBegin   110000 1.0 3.9732e+02 109.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05  4  0  0  0 10   4  0  0  0 10     0       0      0 0.00e+00       0 0.00e+00  0
> >> MatAssemblyEnd     110000 1.0 5.3015e-01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> MatZeroEntries     110000 1.0 1.3179e+01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> MatCUSPARSCopyTo   220000 1.0 3.2805e+01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0 220000 2.41e+05       0 0.00e+00  0
> >> KSPSetUp           110000 1.0 3.5344e-02   1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> KSPSolve           110000 1.0 6.8304e+02   1.0 8.20e+11 1.0 7.7e+05 1.3e+04 8.8e+05 13 89 78 78 80  13 89 78 78 80  2401   14311 496001 2.20e+05  991994 4.39e+05 66
> >> KSPGMRESOrthog     386001 1.0 7.2820e+01   1.4 2.10e+11 1.0 0.0e+00 0.0e+00 3.9e+05  1 23  0  0 35   1 23  0  0 35  5765   30176 386001 1.71e+05       0 0.00e+00 100
> >> PCSetUp            110000 1.0 1.8825e-02   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> PCApply            496001 1.0 5.1857e+02   1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31  0  0  0  10 31  0  0  0  1090       0      0 0.00e+00  991994 4.39e+05  0
> >> SFSetGraph              1 1.0 2.0936e-05   1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> SFSetUp                 1 1.0 2.5347e-03   1.0 0.00e+00 0.0 4.0e+00 3.3e+03 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> SFPack             496001 1.0 3.0026e+00   1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >> SFUnpack           496001 1.0 1.1296e-01   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00       0 0.00e+00  0
> >>
> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>