[petsc-dev] GPU performance of MatSOR()
Paul Mullowney
paulmullowney at gmail.com
Tue Aug 2 10:30:18 CDT 2022
The implementation is being (slowly) moved into Hypre. We have
primarily used this technique with ILU-based smoothers for AMG. We did some
comparisons against other smoothers like Gauss-Seidel, but not against Chebyshev
or polynomial smoothers.
For the problems we cared about, ILU was an effective smoother, and the power
series representation of the solve provided some nice speedups. I've
cc'ed Steve Thomas, who can say more.
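Roughly, the idea is the following: write the unit lower-triangular factor as
L = I + N and replace the exact solve L^{-1} b by a truncated Neumann (power)
series, so the whole smoother application becomes a handful of SpMVs. Below is
a minimal plain-C sketch of that idea; the CSR layout, helper names, and fixed
term count m are illustrative assumptions, not the Hypre or PETSc implementation.

/* Sketch: approximate x = L^{-1} b for a unit lower-triangular matrix
 * L = I + N (N strictly lower, stored in CSR) by the truncated Neumann series
 * x ~= sum_{k=0}^{m} (-N)^k b, so the whole "solve" is just m SpMVs.       */
#include <stdlib.h>
#include <string.h>

typedef struct { int n; const int *rowptr, *col; const double *val; } CSR; /* holds N only */

static void spmv(const CSR *N, const double *x, double *y)
{
  for (int i = 0; i < N->n; i++) {
    double s = 0.0;
    for (int j = N->rowptr[i]; j < N->rowptr[i + 1]; j++) s += N->val[j] * x[N->col[j]];
    y[i] = s;
  }
}

void neumann_lower_solve(const CSR *N, const double *b, double *x, int m)
{
  int     n    = N->n;
  double *term = malloc(n * sizeof(*term));   /* current series term (-N)^k b */
  double *tmp  = malloc(n * sizeof(*tmp));
  memcpy(term, b, n * sizeof(*term));
  memcpy(x, b, n * sizeof(*x));               /* k = 0 term                   */
  for (int k = 1; k <= m; k++) {
    spmv(N, term, tmp);                       /* tmp  = N * term              */
    for (int i = 0; i < n; i++) {
      term[i] = -tmp[i];                      /* term = (-N)^k b              */
      x[i]   += term[i];                      /* accumulate the partial sum   */
    }
  }
  free(term); free(tmp);
}

Because N is strictly lower triangular the series terminates, so taking enough
terms recovers the exact solve; the point is that for many problems a few terms
already give an effective smoother, and each term is a GPU-friendly SpMV rather
than a sequential triangular sweep.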
-Paul
On Sun, Jul 31, 2022 at 10:14 PM Jed Brown <jed at jedbrown.org> wrote:
> Do you have a test that compares this with a polynomial smoother for the
> original problem (like Chebyshev for SPD)?
>
> Paul Mullowney <paulmullowney at gmail.com> writes:
>
> > One could also approximate the SOR triangular solves with a Neumann series,
> > where each term in the series is a SpMV (great for GPUs). The number of
> > terms needed in the series is matrix dependent.
> > We've seen this work to great effect for some problems.
> >
> > -Paul
> >
> > On Wed, Jul 27, 2022 at 8:05 PM Barry Smith <bsmith at petsc.dev> wrote:
> >
> >>
> >> There are multicolor versions of SOR that theoretically offer good
> >> parallelism on GPUs but at the cost of multiple phases and slower
> >> convergence rates. Unless someone already has one coded for CUDA or Kokkos
> >> it would take a good amount of code to produce one that offers (but does
> >> not necessarily guarantee) reasonable performance on GPUs.
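For readers who have not seen a multicolor sweep, here is a minimal sketch of the
simplest case, red-black Gauss-Seidel for the 5-point Laplacian on a structured
grid; within one color every update depends only on points of the other color, so
each of the two phases is fully parallel. The grid, stencil, and loop structure
are illustrative assumptions, not PETSc code; general sparsity patterns need a
coloring with more colors, which is where the extra phases and slower convergence
mentioned above come in.

/* Sketch of the multicolor idea: one red-black Gauss-Seidel sweep for the
 * 5-point Laplacian on an n-by-n grid (h2 = h*h, zero Dirichlet boundary).
 * Within each color phase the updates are independent, so a GPU can run the
 * inner loops as one parallel kernel per color.                            */
void rb_gauss_seidel_sweep(int n, double h2, const double *f, double *u)
{
  for (int color = 0; color < 2; color++) {          /* phase 0: red, phase 1: black */
    for (int i = 1; i < n - 1; i++) {
      for (int j = 1; j < n - 1; j++) {
        if ((i + j) % 2 != color) continue;          /* update only this color       */
        u[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                               u[i * n + (j - 1)] + u[i * n + (j + 1)] +
                               h2 * f[i * n + j]);
      }
    }
  }
}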
> >>
> >> > On Jul 27, 2022, at 7:57 PM, Jed Brown <jed at jedbrown.org> wrote:
> >> >
> >> > Unfortunately, MatSOR is a really bad operation for GPUs. We can make it
> >> > use sparse triangular primitives from cuSPARSE, but those run on the GPU
> >> > about 20x slower than MatMult with the same sparse matrix. So unless MatSOR
> >> > reduces iteration count by 20x compared to your next-best preconditioning
> >> > option, you'll be better off finding a different preconditioner. This might
> >> > be some elements of multigrid or polynomial smoothing with point-block
> >> > Jacobi. If you can explain a bit about your application, we may be able to
> >> > offer some advice.
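For concreteness, one way to experiment with the GPU-friendly alternative
described above (polynomial smoothing with point-block Jacobi inside multigrid)
is sketched below; the choice of GAMG and of these particular options is an
assumption to be tuned per problem, not a recommendation from this thread.

#include <petscksp.h>

/* Hedged sketch (assumes PETSc 3.17+ for PetscCall): Chebyshev smoothing with
 * point-block Jacobi on the multigrid levels. The same effect is available at
 * run time with
 *   -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type pbjacobi   */
PetscErrorCode SetupPolynomialSmoothedMG(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));                 /* algebraic multigrid          */
  PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "chebyshev"));
  PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "pbjacobi"));
  PetscCall(KSPSetFromOptions(ksp));                /* picks up the options above   */
  PetscFunctionReturn(0);
}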
> >> >
> >> > Han Tran <hantran at cs.utah.edu> writes:
> >> >
> >> >> Hello,
> >> >>
> >> >> Running my example using VECMPICUDA for VecSetType() and MATMPIAIJCUSP
> >> >> for MatSetType(), I get the profiling results shown below. It can be seen
> >> >> that MatSOR() shows 0 in the GPU %F column and has only a GpuToCpu count
> >> >> and size. Is it correct that PETSc currently does not have MatSOR
> >> >> implemented on GPU? I would appreciate an explanation of how MatSOR()
> >> >> currently uses the GPU. In this example, MatSOR takes considerable time
> >> >> relative to the other functions.
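For reference, a minimal sketch of the kind of setup described above; the posted
example code is not in the archive, and the full matrix type name
MATMPIAIJCUSPARSE (as well as PETSc 3.17+ for PetscCall) is an assumption here.

#include <petscksp.h>

/* Rough reconstruction of the described setup. With these types MatMult runs on
 * the GPU, while the log below shows MatSOR with 0 in the GPU %F column and
 * large GpuToCpu traffic.                                                      */
PetscErrorCode CreateGPUMatAndVec(MPI_Comm comm, PetscInt n, Mat *A, Vec *x)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(*A, MATMPIAIJCUSPARSE));  /* assumed full name of "MATMPIAIJCUSP" */
  PetscCall(MatSetUp(*A));
  PetscCall(VecCreate(comm, x));
  PetscCall(VecSetSizes(*x, PETSC_DECIDE, n));
  PetscCall(VecSetType(*x, VECMPICUDA));
  PetscFunctionReturn(0);
}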
> >> >>
> >> >> Thank you.
> >> >>
> >> >> -Han
> >> >>
> >> >>
> >> >> ------------------------------------------------------------------------------------------------------------------------
> >> >> Event                Count     Time (sec)      Flop                        --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
> >> >>                    Max Ratio   Max      Ratio  Max  Ratio  Mess  AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
> >> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> >> >>
> >> >> --- Event Stage 0: Main Stage
> >> >>
> >> >> BuildTwoSided      220001 1.0 3.9580e+02139.9 0.00e+00 0.0 2.0e+00 4.0e+00 2.2e+05 4 0 0 0 20 4 0 0 0 20 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> BuildTwoSidedF     220000 1.0 3.9614e+02126.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+05 4 0 0 0 20 4 0 0 0 20 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecMDot            386001 1.0 6.3426e+01 1.5 1.05e+11 1.0 0.0e+00 0.0e+00 3.9e+05 1 11 0 0 35 1 11 0 0 35 3311 26012 386001 1.71e+05 0 0.00e+00 100
> >> >> VecNorm            496001 1.0 5.0877e+01 1.2 5.49e+10 1.0 0.0e+00 0.0e+00 5.0e+05 1 6 0 0 45 1 6 0 0 45 2159 3707 110000 4.87e+04 0 0.00e+00 100
> >> >> VecScale           496001 1.0 7.9951e+00 1.0 2.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 6869 13321 0 0.00e+00 0 0.00e+00 100
> >> >> VecCopy            110000 1.0 1.9323e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecSet             330017 1.0 5.4319e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecAXPY            110000 1.0 1.5820e+00 1.0 1.22e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15399 35566 0 0.00e+00 0 0.00e+00 100
> >> >> VecMAXPY           496001 1.0 1.1505e+01 1.0 1.48e+11 1.0 0.0e+00 0.0e+00 0.0e+00 0 16 0 0 0 0 16 0 0 0 25665 39638 0 0.00e+00 0 0.00e+00 100
> >> >> VecAssemblyBegin   110000 1.0 1.2021e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05 0 0 0 0 10 0 0 0 0 10 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecAssemblyEnd     110000 1.0 1.5988e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecScatterBegin    496001 1.0 1.3002e+01 1.0 0.00e+00 0.0 9.9e+05 1.3e+04 1.0e+00 0 0100100 0 0 0100100 0 0 0 110000 4.87e+04 0 0.00e+00 0
> >> >> VecScatterEnd      496001 1.0 1.8988e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> VecNormalize       496001 1.0 5.8797e+01 1.1 8.24e+10 1.0 0.0e+00 0.0e+00 5.0e+05 1 9 0 0 45 1 9 0 0 45 2802 4881 110000 4.87e+04 0 0.00e+00 100
> >> >> VecCUDACopyTo      716001 1.0 3.4483e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 716001 3.17e+05 0 0.00e+00 0
> >> >> VecCUDACopyFrom   1211994 1.0 5.1752e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 1211994 5.37e+05 0
> >> >> MatMult            386001 1.0 4.8436e+01 1.0 1.90e+11 1.0 7.7e+05 1.3e+04 0.0e+00 1 21 78 78 0 1 21 78 78 0 7862 16962 0 0.00e+00 0 0.00e+00 100
> >> >> MatMultAdd         110000 1.0 6.2666e+01 1.1 6.03e+10 1.0 2.2e+05 1.3e+04 1.0e+00 1 7 22 22 0 1 7 22 22 0 1926 16893 440000 3.39e+05 0 0.00e+00 100
> >> >> MatSOR             496001 1.0 5.1821e+02 1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31 0 0 0 10 31 0 0 0 1090 0 0 0.00e+00 991994 4.39e+05 0
> >> >> MatAssemblyBegin   110000 1.0 3.9732e+02109.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+05 4 0 0 0 10 4 0 0 0 10 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> MatAssemblyEnd     110000 1.0 5.3015e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> MatZeroEntries     110000 1.0 1.3179e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> MatCUSPARSCopyTo   220000 1.0 3.2805e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 220000 2.41e+05 0 0.00e+00 0
> >> >> KSPSetUp           110000 1.0 3.5344e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> KSPSolve           110000 1.0 6.8304e+02 1.0 8.20e+11 1.0 7.7e+05 1.3e+04 8.8e+05 13 89 78 78 80 13 89 78 78 80 2401 14311 496001 2.20e+05 991994 4.39e+05 66
> >> >> KSPGMRESOrthog     386001 1.0 7.2820e+01 1.4 2.10e+11 1.0 0.0e+00 0.0e+00 3.9e+05 1 23 0 0 35 1 23 0 0 35 5765 30176 386001 1.71e+05 0 0.00e+00 100
> >> >> PCSetUp            110000 1.0 1.8825e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> PCApply            496001 1.0 5.1857e+02 1.1 2.83e+11 1.0 0.0e+00 0.0e+00 0.0e+00 10 31 0 0 0 10 31 0 0 0 1090 0 0 0.00e+00 991994 4.39e+05 0
> >> >> SFSetGraph              1 1.0 2.0936e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> SFSetUp                 1 1.0 2.5347e-03 1.0 0.00e+00 0.0 4.0e+00 3.3e+03 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> SFPack             496001 1.0 3.0026e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> SFUnpack           496001 1.0 1.1296e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> >> >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>
> >>
>