[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Smith, Barry F. bsmith at mcs.anl.gov
Wed Aug 14 20:31:14 CDT 2019


  Oh, are all your integers 8 bytes? Even on one node?

  Once Karl's new middleware is in place we should see about reducing to 4 bytes on the GPU.
   
   Barry


> On Aug 14, 2019, at 7:44 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> OK, I'll run single. It's a bit perverse to run with 4 byte floats and 8 byte integers ... I could use 32 bit ints and just not scale out.
> 
> On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>  Mark,
> 
>    Oh, I don't even care if it converges, just put in a fixed number of iterations. The idea is to just get a baseline of the possible improvement. 
> 
>     ECP is literally dropping millions into research on "multi precision" computations on GPUs; we need to have some actual numbers for the best potential benefit to determine how much we invest in further investigating it, or not.
> 
>     I am not expressing any opinions on the approach; we are just in the fact-gathering stage.
> 
> 
>    Barry
> 
> 
> > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > 
> > 
> > 
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> > 
> >   Mark,
> > 
> >    Would you be able to make one run using single precision? Just single everywhere since that is all we support currently? 
> > 
> > 
> > Experience, in engineering at least, is that single precision does not work for FE elasticity. I tried it many years ago and have heard the same from others. This problem is pretty simple other than using Q2. I suppose I could try it, but just be aware the FE people might say that single sucks.
> >  
> >    The results will give us motivation (or anti-motivation) to have support for running KSP (or PC (or Mat)) in single precision while the simulation is double.
> > 
> >    Thanks.
> > 
> >      Barry
> > 
> > For example, if the single-precision speed of KSP on GPUs is a factor of 3 over double precision on GPUs, this is serious motivation.
> > 
> > 
> > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > 
> > > FYI, here is some scaling data for GAMG on SUMMIT. Getting about 4x GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > > 
> > > This is weak scaling of a solve. There is growth in iteration count folded in here. I should put rtol in the title and/or run a fixed number of iterations and make it clear in the title.
> > > 
> > > Comments welcome.
> > > <out_cpu_012288><out_cpu_001536><out_cuda_012288><out_cpu_000024><out_cpu_000192><out_cuda_001536><out_cuda_000192><out_cuda_000024><weak_scaling_cpu.png><weak_scaling_cuda.png>
> > 
> 


