[petsc-dev] [petsc-maint] running CUDA on SUMMIT

Wed Aug 14 20:45:21 CDT 2019

I can run single, I just can't scale up. But I can use like 1500 processors.

On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Oh, are all your integers 8 bytes? Even on one node?
>
>   Once Karl's new middleware is in place we should see about reducing to 4
> bytes on the GPU.
>
>    Barry
>
>
> > On Aug 14, 2019, at 7:44 PM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > OK, I'll run single. It's a bit perverse to run with 4 byte floats and 8
> byte integers ... I could use 32 bit ints and just not scale out.
> >
> > On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >  Mark,
> >
> >    Oh, I don't even care if it converges, just put in a fixed number of
> iterations. The idea is to just get a baseline of the possible improvement.
> >
> >     ECP is literally dropping millions into research on "multi
> precision" computations on GPUs, we need to have some actual numbers for
> the best potential benefit to determine how much we invest in further
> investigating it, or not.
> >
> >     I am not expressing any opinions on the approach, we are just in the
> fact gathering stage.
> >
> >
> >    Barry
> >
> >
> > > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > >
> > >
> > >
> > > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > >
> > >   Mark,
> > >
> > >    Would you be able to make one run using single precision? Just
> single everywhere since that is all we support currently?
> > >
> > >
> > > > Experience in engineering, at least, is that single precision does not
> work for FE elasticity. I tried it many years ago and have heard the same
> from others. This problem is pretty simple other than using Q2. I suppose I
> could try it, but just be aware the FE people might say that single sucks.
> > >
> > > >    The results will give us motivation (or anti-motivation) to have
> support for running KSP (or PC (or Mat)) in single precision while the
> simulation is double.
> > >
> > >    Thanks.
> > >
> > >      Barry
> > >
> > > > For example, if the GPU speed of KSP in single is a factor of 3 over
> double on GPUs, this is serious motivation.
> > >
> > >
> > > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfadams at lbl.gov> wrote:
> > > >
> > > > FYI, Here is some scaling data of GAMG on SUMMIT. Getting about 4x
> GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > > >
> > > > This is weak scaling of a solve. There is growth in iteration count
> folded in here. I should put rtol in the title and/or run a fixed number of
> iterations and make it clear in the title.
> > > >
> > > > Comments welcome.
> > > >
> <out_cpu_012288><out_cpu_001536><out_cuda_012288><out_cpu_000024><out_cpu_000192><out_cuda_001536><out_cuda_000192><out_cuda_000024><weak_scaling_cpu.png><weak_scaling_cuda.png>
> > >
> >
>
>
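
The remark in the thread about pairing 4 byte floats with 8 byte integers can be made concrete with a rough count of matrix bytes per stored nonzero. The sketch below assumes a CSR-style (AIJ-like) layout in which each nonzero carries one scalar value and one column index; it ignores row pointers and vector traffic, and the function name is hypothetical, not part of PETSc.

```python
# Simplified model (an assumption, not a measurement): in a CSR-style sparse
# matrix each stored nonzero costs one scalar value plus one column index.
# Row pointers and vector reads/writes are deliberately ignored.

def bytes_per_nonzero(scalar_bytes, index_bytes):
    """Bytes of matrix data per stored nonzero (hypothetical helper)."""
    return scalar_bytes + index_bytes

# (scalar size, index size) in bytes for the combinations discussed
configs = {
    "double + 64-bit int": (8, 8),
    "single + 64-bit int": (4, 8),
    "double + 32-bit int": (8, 4),
    "single + 32-bit int": (4, 4),
}

baseline = bytes_per_nonzero(8, 8)
for name, (scalar, index) in configs.items():
    b = bytes_per_nonzero(scalar, index)
    # Ratio of baseline bytes to this configuration's bytes
    print(f"{name}: {b} bytes/nonzero, baseline/this = {baseline / b:.2f}")
```

Under this model, going to single precision with 64-bit indices only cuts matrix traffic from 16 to 12 bytes per nonzero, while also dropping to 32-bit indices would halve it. Actual speedups depend on much more than this count.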