[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Smith, Barry F. bsmith at mcs.anl.gov
Wed Aug 14 17:48:36 CDT 2019


 Mark,

   Oh, I don't even care if it converges; just put in a fixed number of iterations. The idea is to get a baseline of the possible improvement.
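A fixed-iteration baseline like this can be expressed with PETSc runtime options; a minimal sketch (the option names are standard PETSc options, while the executable name, process count, and iteration count are illustrative assumptions):

```shell
# Run exactly 50 Krylov iterations regardless of convergence, so CPU and
# GPU runs do identical work and their timings are directly comparable.
#   -ksp_convergence_test skip  disables the convergence check entirely
#   -ksp_max_it 50              caps (and here fixes) the iteration count
#   -log_view                   prints the performance summary afterwards
mpirun -n 24 ./my_elasticity_app -ksp_type cg -pc_type gamg \
       -ksp_max_it 50 -ksp_convergence_test skip \
       -log_view
```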

    ECP is literally dropping millions into research on "multi-precision" computations on GPUs; we need some actual numbers on the best potential benefit to determine how much we invest in investigating it further, or not.

    I am not expressing any opinion on the approach; we are just in the fact-gathering stage.


   Barry
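
The single-precision experiment discussed in the quoted thread below is, today, a configure-time choice in PETSc rather than a per-solver one ("just single everywhere since that is all we support currently"); a minimal sketch (the configure flags are real PETSc options, while the arch name and debugging flag are illustrative assumptions):

```shell
# Build a separate single-precision PETSc tree. Precision is fixed at
# configure time and applies to Vec, Mat, and KSP alike -- there is no
# per-object precision switch in this configuration.
./configure PETSC_ARCH=arch-summit-single-cuda \
    --with-precision=single \
    --with-cuda=1 \
    --with-debugging=0
```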


> On Aug 14, 2019, at 2:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   Mark,
> 
>    Would you be able to make one run using single precision? Just single everywhere since that is all we support currently? 
> 
> 
> Experience in engineering, at least, is that single precision does not work for FE elasticity. I tried it many years ago and have heard the same from others. This problem is pretty simple other than using Q2. I suppose I could try it, but just be aware that the FE people might say that single sucks.
>  
>    The results will give us motivation (or anti-motivation) to support running KSP (or PC, or Mat) in single precision while the simulation is double.
> 
>    Thanks.
> 
>      Barry
> 
> For example, if single-precision KSP on the GPU is a factor of 3 faster than double precision on the GPU, that is serious motivation.
> 
> 
> > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > FYI, here is some scaling data for GAMG on SUMMIT. We are getting about a 4x GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > 
> > This is weak scaling of a solve, with growth in the iteration count folded in. I should put rtol in the title, and/or run a fixed number of iterations and make that clear in the title.
> > 
> > Comments welcome.
> > [Attachments: per-run solver output for 24 to 12288 processes (CPU and CUDA), plus weak_scaling_cpu.png and weak_scaling_cuda.png]
> 


