[petsc-dev] [petsc-maint] running CUDA on SUMMIT
Smith, Barry F.
bsmith at mcs.anl.gov
Wed Aug 14 18:11:10 CDT 2019
> On Aug 14, 2019, at 5:58 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>
>>> On Aug 14, 2019, at 2:37 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>
>>> Mark Adams via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>>>
>>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>>
>>>>> Mark,
>>>>>
>>>>> Would you be able to make one run using single precision? Just single
>>>>> everywhere since that is all we support currently?
>>>>>
>>>>>
>>>> Experience in engineering, at least, is that single does not work for FE
>>>> elasticity. I tried it many years ago and have heard this from others.
>>>> This problem is pretty simple other than using Q2. I suppose I could try
>>>> it, but just be aware the FE people might say that single sucks.
>>>
>>> When they say that single sucks, is it for the definition of the
>>> operator or the preconditioner?
>>>
>>> As a point of reference, we can apply Q2 elasticity operators in double
>>> precision at nearly a billion dofs/second per GPU.
>>
>> And in single you get what?
>
> I don't have exact numbers, but <2x faster on V100, and it sort of
> doesn't matter because preconditioning cost will dominate.
When using block formats, a much higher percentage of the bandwidth goes to moving the double-precision matrix entries, so switching to single could conceivably give a speedup of up to almost a factor of two.
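Barry's estimate can be checked with a rough byte count for a block-CSR (BAIJ-style) matrix-vector product. This is a back-of-the-envelope sketch, not a PETSc measurement: it assumes the kernel is purely bandwidth bound, that each nonzero bs x bs block carries one 4-byte column index, and it ignores row-pointer and vector traffic. The function names are hypothetical. It also applies Amdahl's law, since Jed's point is that preconditioning will dominate the solve; the 30% operator fraction in the example is an arbitrary illustration, not data from this thread.

```python
# Back-of-the-envelope bandwidth model for a block-CSR (BAIJ-style) SpMV.
# Assumptions (not PETSc measurements): the kernel is purely bandwidth bound,
# each nonzero block carries one column index, and row-pointer and vector
# traffic are ignored.


def bytes_per_block(bs, scalar_bytes, index_bytes=4):
    """Matrix bytes streamed for one bs x bs nonzero block."""
    return bs * bs * scalar_bytes + index_bytes


def single_over_double_gain(bs, index_bytes=4):
    """Upper bound on SpMV speedup from storing entries in single (4 B) instead of double (8 B)."""
    return bytes_per_block(bs, 8, index_bytes) / bytes_per_block(bs, 4, index_bytes)


def amdahl(fraction, speedup):
    """Whole-solve speedup when only `fraction` of the run time gets `speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


if __name__ == "__main__":
    for bs in (1, 3, 5):
        print(f"bs={bs}: matrix-traffic gain {single_over_double_gain(bs):.2f}x")
    # Hypothetical: operator application is 30% of the solve, preconditioning the rest.
    print(f"whole solve, bs=3: {amdahl(0.3, single_over_double_gain(3)):.2f}x")
```

Under these assumptions the matrix-traffic gain is 1.5x for scalar AIJ (bs=1) but 1.9x for 3x3 elasticity blocks, which is the "almost a factor of two" above; the whole-solve gain stays small whenever operator application is a minor share of the run time.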
Depending on the matrix structure, perhaps the column indices could be handled by a shift and short j indices, or by two shifts and two sets of j indices.
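One possible reading of the shift-plus-short-indices idea, sketched under stated assumptions: each row's sorted column indices are stored as one 32-bit base (the "shift") plus unsigned 16-bit offsets, and a row whose columns span more than 65535 is split at its largest gap into two groups, each with its own shift. This is not an existing PETSc format; the function names and the largest-gap split rule are hypothetical choices for illustration.

```python
from array import array

MAX_OFFSET = 0xFFFF  # largest value an unsigned 16-bit ("short") offset can hold


def compress_one_shift(cols):
    """Store sorted column indices as (shift, 16-bit offsets), or None if they do not fit."""
    if not cols:
        return (0, array("H"))
    shift = cols[0]
    if cols[-1] - shift > MAX_OFFSET:
        return None
    return (shift, array("H", (c - shift for c in cols)))


def compress_two_shifts(cols):
    """Try one shift; otherwise split at the largest gap and give each half its own shift."""
    single = compress_one_shift(cols)
    if single is not None:
        return [single]
    # A row with 0 or 1 columns always fits above, so here len(cols) >= 2.
    gaps = [cols[i + 1] - cols[i] for i in range(len(cols) - 1)]
    split = max(range(len(gaps)), key=gaps.__getitem__) + 1
    left = compress_one_shift(cols[:split])
    right = compress_one_shift(cols[split:])
    if left is None or right is None:
        return None  # a real kernel would fall back to full 32-bit indices here
    return [left, right]


def decompress(groups):
    """Recover the original column indices from a list of (shift, offsets) groups."""
    return [shift + off for shift, offsets in groups for off in offsets]


def index_bytes(groups):
    """Index storage for one row: a 4-byte shift per group plus the 16-bit offsets."""
    return sum(4 + offsets.itemsize * len(offsets) for _, offsets in groups)


if __name__ == "__main__":
    row = list(range(1000, 1081, 3))  # 27 hypothetical columns in a narrow band
    groups = compress_two_shifts(row)
    print(index_bytes(groups), "bytes vs", 4 * len(row), "bytes with 32-bit indices")
```

For the banded example row of 27 columns, the indices take 58 bytes instead of 108 (when unsigned short is 2 bytes). A row that still does not fit returns None, so a real kernel would need a fallback to full-width indices for such rows.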
> The big win
> of single is on consumer-grade GPUs, which DOE doesn't install and
> NVIDIA forbids from being used in data centers (because they're so
> cost-effective ;-)).
DOE LCFs are not our only customers. Cheap-o engineering professors might stack a bunch of consumer-grade GPUs in their labs; would they benefit? Satish's basement could hold a great many consumer-grade cards.
>
>>> I'm skeptical of big wins in preconditioning (especially setup) due to
>>> the cost and irregularity of indexing being large compared to the
>>> bandwidth cost of the floating point values.