<div dir="ltr">I can run single; I just can't scale up. But I can use about 1500 processors.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Oh, are all your integers 8 bytes? Even on one node?<br>
<br>
Once Karl's new middleware is in place, we should see about reducing the integers to 4 bytes on the GPU.<br>
<br>
Barry<br>
<br>
<br>
> On Aug 14, 2019, at 7:44 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
> <br>
> OK, I'll run single. It's a bit perverse to run with 4-byte floats and 8-byte integers ... I could use 32-bit ints and just not scale out.<br>
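> A rough tally of why that mix is lopsided, assuming CSR (AIJ) storage and counting only the matrix value plus its column index per stored nonzero (row offsets and vector traffic are ignored). SpMV is largely memory-bandwidth bound, so bytes per nonzero is only a crude proxy for its cost; the helper name bytes_per_nnz is hypothetical:<br>
<pre>
```python
import struct

# Sizes in bytes, using struct's standard ("=") sizes so the result
# does not depend on the platform's native C types.
SIZES = {
    "double": struct.calcsize("=d"),  # 8
    "single": struct.calcsize("=f"),  # 4
    "int64": struct.calcsize("=q"),   # 8
    "int32": struct.calcsize("=i"),   # 4
}


def bytes_per_nnz(scalar: str, index: str) -> int:
    """Matrix bytes streamed per stored nonzero in a CSR SpMV:
    one value plus one column index (row offsets and vectors ignored)."""
    return SIZES[scalar] + SIZES[index]


baseline = bytes_per_nnz("double", "int64")
for scalar, index in [("double", "int64"), ("double", "int32"),
                      ("single", "int64"), ("single", "int32")]:
    b = bytes_per_nnz(scalar, index)
    saving = 100 * (1 - b / baseline)
    print(f"{scalar} + {index}: {b} bytes/nnz, {saving:.0f}% less than double + int64")
```
</pre>
> Under these assumptions, single precision with 64-bit indices moves the same 12 bytes per nonzero as double with 32-bit indices; only shrinking both the scalars and the indices gets the full 50% reduction.<br>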
> <br>
> On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> <br>
> Mark,<br>
> <br>
> Oh, I don't even care if it converges; just put in a fixed number of iterations. The idea is just to get a baseline of the possible improvement. <br>
> <br>
> ECP is literally dropping millions into research on "multi precision" computations on GPUs. We need some actual numbers for the best potential benefit to decide how much we invest in investigating it further, or not.<br>
> <br>
> I am not expressing any opinions on the approach, we are just in the fact gathering stage.<br>
> <br>
> <br>
> Barry<br>
> <br>
> <br>
> > On Aug 14, 2019, at 2:27 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
> > <br>
> > <br>
> > <br>
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
> > <br>
> > Mark,<br>
> > <br>
> > Would you be able to make one run using single precision? Just single everywhere, since that is all we currently support? <br>
> > <br>
> > <br>
> > Experience in engineering, at least, is that single does not work for FE elasticity. I tried it many years ago and have heard the same from others. This problem is pretty simple other than using Q2. I suppose I could try it, but just be aware that the FE people might say single sucks.<br>
> > <br>
> > The results will give us motivation (or anti-motivation) to support running KSP (or PC, or Mat) in single precision while the simulation is in double.<br>
> > <br>
> > Thanks.<br>
> > <br>
> > Barry<br>
> > <br>
> > For example, if KSP in single on the GPU is a factor of 3 faster than double on the GPU, that is serious motivation. <br>
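> > One common way to get "KSP in single while the simulation is double" is mixed-precision iterative refinement: the residual and the solution update stay in double, and only the correction solve runs in single. A minimal sketch, assuming a small dense, diagonally dominant system, with single precision emulated by rounding every operation through struct; f32, solve_single and refine are hypothetical helpers, not PETSc API, and the sketch says nothing about GPU speed:<br>
<pre>
```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("=f", struct.pack("=f", x))[0]


def solve_single(A, r):
    """Gaussian elimination (no pivoting, acceptable for the diagonally
    dominant example below) with every intermediate rounded to single."""
    n = len(r)
    M = [[f32(v) for v in row] for row in A]
    y = [f32(v) for v in r]
    for k in range(n):
        for i in range(k + 1, n):
            l = f32(M[i][k] / M[k][k])
            for j in range(k, n):
                M[i][j] = f32(M[i][j] - f32(l * M[k][j]))
            y[i] = f32(y[i] - f32(l * y[k]))
    d = [0.0] * n
    for i in reversed(range(n)):  # back substitution, still in single
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * d[j]))
        d[i] = f32(s / M[i][i])
    return d


def refine(A, b, iters=5):
    """Mixed-precision iterative refinement: residual and update in double,
    correction solve in (emulated) single. Records the max-norm of each
    residual computed before its correction is applied."""
    n = len(b)
    x = [0.0] * n
    history = []
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        history.append(max(abs(v) for v in r))
        d = solve_single(A, r)
        x = [x[i] + d[i] for i in range(n)]
    return x, history


# Usage: a small tridiagonal, diagonally dominant test matrix.
n = 8
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, history = refine(A, b)
print(history)
```
</pre>
> > Each refinement step should shrink the error by roughly the single-precision epsilon times the condition number, so for a well-conditioned matrix like this one a few steps reach double-precision accuracy; when that product approaches 1, refinement can stall, which is one way the single-precision worry for FE elasticity would show up.<br>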
> > <br>
> > <br>
> > > On Aug 14, 2019, at 12:45 PM, Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> wrote:<br>
> > > <br>
> > > FYI, here is some scaling data for GAMG on SUMMIT. I'm getting about a 4x GPU speedup with 98K dof/proc (3D Q2 elasticity).<br>
> > > <br>
> > > This is weak scaling of a solve, and growth in the iteration count is folded in here. I should put rtol in the title and/or run a fixed number of iterations and say so in the title.<br>
> > > <br>
> > > Comments welcome.<br>
> > > [Attachments: out_cpu_000024, out_cpu_000192, out_cpu_001536, out_cpu_012288, out_cuda_000024, out_cuda_000192, out_cuda_001536, out_cuda_012288, weak_scaling_cpu.png, weak_scaling_cuda.png]<br>
> > <br>
> <br>
<br>
</blockquote></div>