[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Wed Aug 14 17:58:09 CDT 2019

"Smith, Barry F." <bsmith at mcs.anl.gov> writes:

>> On Aug 14, 2019, at 2:37 PM, Jed Brown <jed at jedbrown.org> wrote:
>> 
>> Mark Adams via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>> 
>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>> 
>>>> 
>>>>  Mark,
>>>> 
>>>>   Would you be able to make one run using single precision? Just single
>>>> everywhere since that is all we support currently?
>>>> 
>>>> 
>>> Experience in engineering, at least, is that single does not work for FE
>>> elasticity. I tried it many years ago and have heard this from others.
>>> This problem is pretty simple other than using Q2. I suppose I could try
>>> it, but just be aware the FE people might say that single sucks.
>> 
>> When they say that single sucks, is it for the definition of the
>> operator or the preconditioner?
>> 
>> As a point of reference, we can apply Q2 elasticity operators in double
>> precision at nearly a billion dofs/second per GPU.
>
>   And in single you get what?

I don't have exact numbers, but <2x faster on V100, and it sort of
doesn't matter because preconditioning cost will dominate.  The big win
of single is on consumer-grade GPUs, which DOE doesn't install and
NVIDIA forbids to be used in data centers (because they're so
cost-effective ;-)).

>> I'm skeptical of big wins in preconditioning (especially setup) due to
>> the cost and irregularity of indexing being large compared to the
>> bandwidth cost of the floating point values.
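
To put rough numbers on the indexing-versus-values point, here is a
back-of-envelope sketch, not a measurement. It assumes an assembled
CSR (AIJ-style) matrix whose kernels are purely bandwidth-bound, and it
counts only one value plus one column index per stored nonzero, ignoring
row pointers, vector traffic, and the irregular access itself. The
function names are made up for illustration.

```python
def csr_bytes_per_nonzero(value_bytes, index_bytes=4):
    """Bytes streamed per stored nonzero: one value plus one column index."""
    return value_bytes + index_bytes


def single_speedup_bound(index_bytes=4):
    """Upper bound on the single-over-double speedup for a kernel that is
    limited only by matrix memory traffic (an assumption of this sketch)."""
    double = csr_bytes_per_nonzero(8, index_bytes)  # 8-byte double values
    single = csr_bytes_per_nonzero(4, index_bytes)  # 4-byte float values
    return double / single


if __name__ == "__main__":
    print(single_speedup_bound(index_bytes=4))  # 32-bit column indices
    print(single_speedup_bound(index_bytes=8))  # 64-bit column indices
```

Under these assumptions the index traffic is unchanged when the values
shrink, so with 32-bit indices the matrix goes from 12 to 8 bytes per
nonzero and the bound is 1.5x rather than 2x; with 64-bit indices it is
lower still.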