GPU related stuff
Farshid Mossaiby
mossaiby at yahoo.com
Fri Jul 10 15:17:42 CDT 2009
Thanks, all, for the comments.
--- On Thu, 7/9/09, Matthew Knepley <knepley at gmail.com> wrote:
> From: Matthew Knepley <knepley at gmail.com>
> Subject: Re: GPU related stuff
> To: "For users of the development version of PETSc" <petsc-dev at mcs.anl.gov>
> Date: Thursday, July 9, 2009, 5:09 PM
> On Thu, Jul 9, 2009 at 7:31 AM, Jed Brown <jed at 59a2.org> wrote:
>
> > Matthew Knepley wrote:
> >
> > > PCs which have high flop to memory access ratios look good. No
> > > surprise there.
> >
> > My concern here is that almost all "good" preconditioners are
> > multiplicative in the fine-grained kernels or do significant work on
> > coarse levels. Both of these are very bad for putting on a GPU.
> > Switching from SOR or ILU to Jacobi or red-black GS will greatly improve
> > the throughput on a GPU, but is normally much less effective. Since the
> > GPU typically needs thousands of threads to attain high performance,
> > it's really hard to use on all but the finest level.
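
As a rough illustration of the trade-off described above, here is a minimal Python sketch of one Jacobi sweep and one red-black Gauss-Seidel sweep. The model problem (the 1D system 2*u[i] - u[i-1] - u[i+1] = f[i] with zero boundary values) and the function names are assumptions chosen for illustration; nothing here is PETSc code or GPU code.

```python
def jacobi_sweep(u, f):
    """One Jacobi sweep for 2*u[i] - u[i-1] - u[i+1] = f[i].

    Every update reads only the old vector, so all n updates are
    independent and could each be given to a separate GPU thread.
    """
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new[i] = 0.5 * (f[i] + left + right)
    return new


def red_black_gs_sweep(u, f):
    """One red-black Gauss-Seidel sweep on the same system.

    Points of one colour depend only on points of the other colour, so
    each half-sweep is fully parallel; the two half-sweeps must still
    run one after the other.
    """
    n = len(u)
    u = list(u)  # work on a copy so the caller's vector is untouched
    for colour in (0, 1):
        for i in range(colour, n, 2):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (f[i] + left + right)
    return u
```

Starting both from a zero guess with f = 1 everywhere, the red-black iterate is closer to the exact solution after the same number of sweeps, while a lexicographic SOR sweep would make every update depend on the previous one and leave nothing to run in parallel.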
>
> I agree with all these comments. I have no idea how to make those PCs
> work. I am counting on Barry's genius here.
>
> > One of the more interesting preconditioners would be 3-level balancing
> > or overlapping DD with very small subdomains (like thousands of
> > subdomains per process). There would then be 1 subregion per process
> > and a global coarse level. This would allow the PC to be additive with
> > chunks of the right block size, while keeping a minimal amount of work
> > on the coarser levels (which are handled by the CPU). (It's really hard
> > to get multigrid to coarsen this rapidly, as in 1M dofs to 10 dofs in 2
> > levels.) Unfortunately, this sort of scheme is rather problem- and
> > discretization-dependent, as well as rather complex to implement.
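
To make the "additive with chunks of the right block size" idea concrete, here is a minimal Python sketch of the simplest additive subdomain preconditioner: non-overlapping block Jacobi. It is a deliberate simplification of the scheme described above, since it has no overlap, no subregion level and no global coarse level. The 1D model operator tridiag(-1, 2, -1) and both function names are assumptions made for illustration.

```python
def solve_tridiag_block(r_block):
    """Solve the local system tridiag(-1, 2, -1) x = r_block (Thomas algorithm)."""
    m = len(r_block)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = r_block[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r_block[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def apply_block_jacobi(r, block_size):
    """Apply z = sum_k R_k^T A_k^{-1} R_k r over non-overlapping blocks.

    Each block solve touches only its own slice of r, so the loop body
    is independent across blocks and could be run as one small task
    per subdomain.
    """
    z = []
    for start in range(0, len(r), block_size):
        z.extend(solve_tridiag_block(r[start:start + block_size]))
    return z
```

With block_size = 1 this reduces to point Jacobi. Larger blocks put more work into each independent solve, but without a coarse level the blocks exchange no global information.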
>
> With regard to targets, my strategy is to implement things that I can
> prove work well on a GPU. For starters, we have FMM. We have done
> a complete computational model and can prove that this will scale almost
> indefinitely. The first paper is out, and the other 2 are almost done. We are
> also implementing wavelets, since the structure and proofs are very similar
> to FMM.
>
> The strategy is to use FMM/Wavelets for problems they can solve to precondition
> more complex problems. The prototype is Stokes preconditioning variable
> viscosity Stokes, which I am working on with Dave May and Dave Yuen.
>
> > I'll be interested to see what sort of performance you can get for real
> > preconditioners on a GPU.
>
> Felipe Cruz has preliminary numbers for FMM: 500 GF on a single 1060C!
> That is probably 10 times what you can hope to achieve with traditional
> relaxation (I think).
>
>
> Matt
>
>
> Jed
> --
> What most experimenters take for granted before they begin
> their experiments is infinitely more interesting than any
> results to which their experiments lead.
> -- Norbert Wiener