GPU related stuff

Jed Brown jed at 59A2.org
Thu Jul 9 07:31:25 CDT 2009


Matthew Knepley wrote:

> PCs which have high flop to memory access ratios look good.  No
> surprise there.

My concern here is that almost all "good" preconditioners are
multiplicative in the fine-grained kernels or do significant work on
coarse levels.  Both of these map very poorly to a GPU.  Switching from
SOR or ILU to Jacobi or red-black GS will greatly improve throughput on
a GPU, but the resulting preconditioner is normally much less effective.
Since the GPU typically needs thousands of threads to attain high
performance, it's really hard to use on anything but the finest level.
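
To make the additive/multiplicative distinction concrete, here is a
minimal sketch on a toy problem (1D Laplacian, tridiag(-1, 2, -1), with
a constant right-hand side).  The problem, the sizes, and all function
names are my own assumptions for illustration, not anything from PETSc.
A Jacobi sweep reads only the old vector, so every entry could be
updated by its own thread.  A lexicographic Gauss-Seidel sweep reads
values written earlier in the same sweep, which serializes it.

```python
import math

def residual_norm(u, f):
    """2-norm of f - A u for A = tridiag(-1, 2, -1) with zero boundary values."""
    n = len(u)
    total = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        total += (f[i] - (2.0 * u[i] - left - right)) ** 2
    return math.sqrt(total)

def jacobi_sweep(u, f):
    """Additive: every entry depends only on the OLD vector (fully parallel)."""
    n = len(u)
    return [0.5 * (f[i]
                   + (u[i - 1] if i > 0 else 0.0)
                   + (u[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]

def gauss_seidel_sweep(u, f):
    """Multiplicative: entry i uses entry i-1 updated in THIS sweep (sequential)."""
    n = len(u)
    u = u[:]
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0       # already updated this sweep
        right = u[i + 1] if i < n - 1 else 0.0  # still the old value
        u[i] = 0.5 * (f[i] + left + right)
    return u

def sweeps_to_converge(sweep, n=16, tol=1e-6, max_sweeps=100000):
    """Count sweeps until the residual drops by a factor of tol."""
    f = [1.0] * n
    u = [0.0] * n
    r0 = residual_norm(u, f)
    for k in range(1, max_sweeps + 1):
        u = sweep(u, f)
        if residual_norm(u, f) <= tol * r0:
            return k
    return max_sweeps

if __name__ == "__main__":
    print("Jacobi sweeps:      ", sweeps_to_converge(jacobi_sweep))
    print("Gauss-Seidel sweeps:", sweeps_to_converge(gauss_seidel_sweep))
```

On this model problem Gauss-Seidel needs fewer sweeps than Jacobi, but
each of its sweeps has a loop-carried dependence.  This is only a toy
used as a stationary iteration; it says nothing quantitative about SOR
or ILU as preconditioners on real problems.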

One of the more interesting preconditioners would be a 3-level balancing
or overlapping domain decomposition (DD) method with very small
subdomains (on the order of thousands per process).  Each process would
then own 1 subregion, with a global coarse level on top.  This would
allow the PC to be additive with chunks of the right block size, while
keeping a minimal amount of work on the coarser levels (which are
handled by the CPU).  (It's really hard to get multigrid to coarsen this
rapidly, as in 1M dofs to 10 dofs in 2 levels.)  Unfortunately, this
sort of scheme is rather problem- and discretization-dependent, as well
as rather complex to implement.
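
As a rough sketch of the structure only, here is a two-level
simplification of that idea (not the 3-level scheme itself) on the 1D
Laplacian tridiag(-1, 2, -1): many tiny non-overlapping blocks solved
independently, plus one small coarse solve, summed additively and used
inside preconditioned CG.  The piecewise-constant coarse space, the
block size, and all names are assumptions chosen to keep the example
short.  For this particular matrix both the local blocks and the
aggregated coarse operator happen to be tridiag(-1, 2, -1) again, so a
single tridiagonal solver serves both.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def apply_A(x):
    """y = A x for A = tridiag(-1, 2, -1) with zero boundary values."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def solve_laplacian(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) u = rhs."""
    m = len(rhs)
    c = [0.0] * m
    d = [0.0] * m
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def apply_pc(r, block, use_coarse):
    """z = sum_i R_i^T A_i^{-1} R_i r  (+ R_0^T A_0^{-1} R_0 r if use_coarse)."""
    n = len(r)
    z = [0.0] * n
    starts = range(0, n, block)
    # Fine level: independent tiny solves, one per block (the GPU-friendly part).
    for s in starts:
        z[s:s + block] = solve_laplacian(r[s:s + block])
    if use_coarse:
        # Coarse level: one unknown per block (the small, CPU-side part).
        rc = [sum(r[s:s + block]) for s in starts]
        zc = solve_laplacian(rc)
        for k, s in enumerate(starts):
            for i in range(s, min(s + block, n)):
                z[i] += zc[k]
    return z

def pcg(b, block, use_coarse, tol=1e-8, max_it=1000):
    """Preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_pc(r, block, use_coarse)
    p = z[:]
    rz = dot(r, z)
    r0 = math.sqrt(dot(r, r))
    for it in range(1, max_it + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * r0:
            return x, it
        z = apply_pc(r, block, use_coarse)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it

if __name__ == "__main__":
    b = [1.0] * 256
    for use_coarse in (False, True):
        _, its = pcg(b, block=8, use_coarse=use_coarse)
        print("coarse level:", use_coarse, "iterations:", its)
```

The point is the shape of the work, not the iteration counts: the block
loop has no dependence between blocks, and the coarse problem has only
n/block unknowns.  How well such a coarse space works depends heavily on
the problem and discretization.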

I'll be interested to see what sort of performance you can get for real
preconditioners on a GPU.

Jed



More information about the petsc-dev mailing list