[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Smith, Barry F. bsmith at mcs.anl.gov
Wed Aug 14 17:59:33 CDT 2019



> On Aug 14, 2019, at 3:36 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> On Wed, Aug 14, 2019 at 3:37 PM Jed Brown <jed at jedbrown.org> wrote:
> Mark Adams via petsc-dev <petsc-dev at mcs.anl.gov> writes:
> 
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> >
> >>
> >>   Mark,
> >>
> >>    Would you be able to make one run using single precision? Just single
> >> everywhere since that is all we support currently?
> >>
> >>
> Experience in engineering, at least, is that single precision does not
> work for FE elasticity. I tried it many years ago and have heard the same
> from others. This problem is pretty simple other than using Q2. I suppose
> I could try it, but just be aware the FE people might say that single sucks.
> 
> When they say that single sucks, is it for the definition of the
> operator or the preconditioner?
> 
> Operator.
> 
> And "ve seen GMRES stagnate when using single in communication in parallel Gauss-Seidel. Roundoff is nonlinear.

   When specific places in the algorithm require more precision, it can potentially be added there selectively: for example, compute reductions in double, or even the "delicate" parts of the function/Jacobian evaluation. Is it worth the bother? Apparently it is for the people with suitcases of money to hand out.
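
   A sketch of the reductions-in-double idea (illustrative C, assuming
nothing about how PETSc would actually wire it in): the vector data stays
in single precision, so the memory traffic that motivates single in the
first place is unchanged; only the scalar accumulator, where roundoff
piles up, is promoted to double.

#include <stddef.h>

/* Dot product of single-precision vectors with a double accumulator.
 * The arrays remain float (same bandwidth cost); each product is
 * formed and summed in double, so small terms are not lost against a
 * large running sum. */
double dot_mixed(const float *x, const float *y, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += (double)x[i] * (double)y[i];
  return sum;
}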


   
>  
> 
> As a point of reference, we can apply Q2 elasticity operators in double
> precision at nearly a billion dofs/second per GPU. 
> 
> I'm skeptical of big wins in preconditioning (especially setup) due to
> the cost and irregularity of indexing being large compared to the
> bandwidth cost of the floating point values.
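
   A back-of-envelope check on those numbers (hardware figures assumed,
not measured): at 10^9 dofs/second in double, streaming the input and
output vectors alone moves 2 x 8 bytes x 10^9 = 16 GB/s, a small fraction
of the ~900 GB/s HBM2 bandwidth of a SUMMIT V100, so a matrix-free Q2
apply has headroom for geometric factors and tensor-product work on top.
An assembled sparse matrix for 3D Q2 elasticity, at very roughly a few
hundred nonzeros per row (~12 bytes each for value plus column index),
would instead need on the order of a TB/s at that rate, which is the
bandwidth argument for matrix-free application and for the skepticism
about index-heavy preconditioner setup.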


