GPU related stuff

Matthew Knepley knepley at gmail.com
Thu Jul 9 07:39:45 CDT 2009


On Thu, Jul 9, 2009 at 7:31 AM, Jed Brown <jed at 59a2.org> wrote:

> Matthew Knepley wrote:
>
> > PCs which have high flop to memory access ratios look good.  No
> > surprise there.
>
> My concern here is that almost all "good" preconditioners are
> multiplicative in the fine-grained kernels or do significant work on
> coarse levels.  Both of these are very bad for putting on a GPU.
> Switching from SOR or ILU to Jacobi or red-black GS will greatly improve
> the throughput on a GPU, but is normally much less effective.  Since the
> GPU typically needs thousands of threads to attain high performance,
> it's really hard to use on all but the finest level.
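
To make the dependency structure in the quoted point concrete, here is a minimal sketch of the three sweeps on a 1D Poisson model problem. The model problem, the size n=32, the tolerance, and all function names are illustrative assumptions, not anything from PETSc. Jacobi updates are all independent, lexicographic Gauss-Seidel forms a sequential chain, and red-black GS splits the chain into two independent half-sweeps.

```python
# Sketch only: relaxation sweeps for -u[i-1] + 2*u[i] - u[i+1] = f[i]
# with zero Dirichlet boundaries. Names and sizes are assumptions.

def neighbours(u, i):
    """Left and right neighbour values, with zero Dirichlet boundaries."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return left, right

def jacobi_sweep(u, f):
    # Reads only the old vector: every update is independent, so each one
    # could be handed to its own GPU thread.
    new = []
    for i in range(len(u)):
        left, right = neighbours(u, i)
        new.append(0.5 * (f[i] + left + right))
    return new

def gauss_seidel_sweep(u, f):
    # Update i uses the freshly updated u[i-1]: a sequential (multiplicative)
    # dependency chain across the whole vector.
    u = list(u)
    for i in range(len(u)):
        left, right = neighbours(u, i)
        u[i] = 0.5 * (f[i] + left + right)
    return u

def red_black_sweep(u, f):
    # Two half-sweeps. Within one colour all updates are independent,
    # because both neighbours of a point have the other colour.
    u = list(u)
    for colour in (0, 1):
        for i in range(colour, len(u), 2):
            left, right = neighbours(u, i)
            u[i] = 0.5 * (f[i] + left + right)
    return u

def residual_norm(u, f):
    """Max-norm of the residual f - A u."""
    worst = 0.0
    for i in range(len(u)):
        left, right = neighbours(u, i)
        worst = max(worst, abs(f[i] - (2.0 * u[i] - left - right)))
    return worst

def sweeps_to_converge(sweep, n=32, tol=1e-6, max_sweeps=100000):
    """Number of sweeps needed to reduce the residual by a factor of tol."""
    f = [1.0] * n
    u = [0.0] * n
    target = tol * residual_norm(u, f)
    for k in range(1, max_sweeps + 1):
        u = sweep(u, f)
        if residual_norm(u, f) <= target:
            return k
    return max_sweeps

if __name__ == "__main__":
    for name, sweep in (("jacobi", jacobi_sweep),
                        ("gauss-seidel", gauss_seidel_sweep),
                        ("red-black", red_black_sweep)):
        print(name, sweeps_to_converge(sweep))
```

Comparing the printed sweep counts shows the trade-off on this toy problem only: the fully parallel Jacobi sweep needs more sweeps than either Gauss-Seidel ordering. It says nothing about SOR or ILU on real problems.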


I agree with all these comments. I have no idea how to make those PCs
work. I am counting on Barry's genius here.


>
> One of the more interesting preconditioners would be 3-level balancing
> or overlapping DD with very small subdomains (like thousands of
> subdomains per process).  There would then be 1 subregion per process
> and a global coarse level.  This would allow the PC to be additive with
> chunks of the right block size, while keeping a minimal amount of work
> on the coarser levels (which are handled by the CPU).  (It's really hard
> to get multigrid to coarsen this rapidly, as in 1M dofs to 10 dofs in 2
> levels.)  Unfortunately, this sort of scheme is rather problem- and
> discretization-dependent, as well as rather complex to implement.
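
The parenthetical about coarsening speed can be checked with a little arithmetic. The sketch below assumes that "in 2 levels" means two coarsening steps, and that a typical geometric multigrid step in 3D reduces the dof count by about 8x; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def ratio_per_step(fine_dofs, coarse_dofs, steps):
    """Coarsening ratio each step must achieve to go fine -> coarse in `steps` steps."""
    return (fine_dofs / coarse_dofs) ** (1.0 / steps)

def steps_needed(fine_dofs, coarse_dofs, ratio):
    """Coarsening steps needed at a fixed per-step ratio (rounded up)."""
    return math.ceil(math.log(fine_dofs / coarse_dofs) / math.log(ratio))

fine, coarse = 1_000_000, 10

# Ratio required per step if 1M dofs must reach 10 dofs in two steps.
required = ratio_per_step(fine, coarse, 2)

# Steps needed with an assumed 8x reduction per step (3D, mesh size doubled).
typical = steps_needed(fine, coarse, 8)

print("required ratio per step:", required)
print("steps at 8x per step:", typical)
```

Under these assumptions each step would have to coarsen by a factor of several hundred, far more than the assumed 8x, which is the gap the quoted remark points at.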


With regard to targets, my strategy is to implement things that I can
prove work well on a GPU. For starters, we have FMM. We have done
a complete computational model and can prove that this will scale almost
indefinitely. The first paper is out, and the other 2 are almost done. We are
also implementing wavelets, since the structure and proofs are very similar
to FMM.

The strategy is to use FMM/wavelets, for the problems they can solve, to
precondition more complex problems. The prototype is a Stokes solve
preconditioning variable-viscosity Stokes, which I am working on with
Dave May and Dave Yuen.


> I'll be interested to see what sort of performance you can get for real
> preconditioners on a GPU.


Felipe Cruz has preliminary numbers for FMM: 500 GF on a single 1060C!
That is probably 10 times what you can hope to achieve with traditional
relaxation (I think).
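
A rough way to see why relaxation lands so far below FMM is a roofline-style bound: attainable rate = min(peak, bandwidth x flops per byte). The sketch below is only that, a bound under stated assumptions. It assumes "1060C" means an NVIDIA Tesla C1060 with nominal figures of about 933 GF single-precision peak and 102 GB/s memory bandwidth, and it models relaxation as a single-precision sparse sweep doing 2 flops per stored nonzero (4-byte value plus 4-byte column index). None of these numbers come from the message.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: limited by either peak compute or memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# Assumed nominal hardware figures (not from the message).
PEAK_GFLOPS = 933.0      # single-precision peak
BANDWIDTH_GB_S = 102.0   # device memory bandwidth

# Assumed cost model for a sparse relaxation sweep in single precision:
# one multiply and one add per nonzero, 8 bytes read per nonzero.
relax_flops_per_byte = 2.0 / 8.0

relax_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, relax_flops_per_byte)
fmm_reported = 500.0     # the preliminary FMM figure quoted in the message

print("relaxation bound (GF):", relax_bound)
print("FMM reported / relaxation bound:", fmm_reported / relax_bound)
```

With these assumed inputs the memory-bound estimate for relaxation comes out well below 500 GF. The exact ratio depends entirely on the cost model, so treat it as an order-of-magnitude check, not a measurement.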

   Matt


>
> Jed
>
-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener