[petsc-dev] programming model for PETSc

Sat Nov 26 12:07:15 CST 2011

On Fri, Nov 25, 2011 at 16:48, Matthew Knepley <knepley at gmail.com> wrote:

> Synopsis of what I said before to elicit comment:
>
> 1) I think the only thing we can learn from Brook, CUDA, OpenCL is that
> you identify threads by a grid ID.
>
> 2) Things like BLAS are so easy that you can move up to the streaming
> model, but this does not work for
>
>   - FD and FEM residual evaluation (Jed has an FD example with Aron, SNES
> ex52 is my FEM example)
>
>   - FD and FEM Jacobian evaluation
>

I think these are also probably too simple. Discontinuous Galerkin with
overlapped flux computations and interior integration would be a somewhat
better model problem. Nonlinear Gauss-Seidel in a multigrid context would
be another.

>
> 3) If you look at ex52 I do a "thread transposition" meaning threads start
> working on different areas of
>     memory which looks like a transpose on a 2D grid. I can do this using
> shared memory for the vector group.
>
> The API is very simple. Give grid indices to the thread, and it's done in
> CUDA and OpenCL essentially the
> same way.
>

As is, this seems to assume a flat memory model: the memory access pattern
appears only in how the kernel uses threadIdx to determine what memory to
operate on. If the access pattern could be stated up-front, then the library
could schedule tasks relative to memory and perhaps handle some updates for
distributed memory.
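
To make the concern concrete, here is a minimal serial sketch in C. It is not
taken from ex52; GridId, transpose_kernel, and launch are hypothetical names,
and the grid is emulated with plain loops rather than CUDA or OpenCL. The point
is only that the transposed access pattern lives entirely in the kernel's index
arithmetic, so nothing at the launch site describes which memory is touched.

```c
#include <assert.h>

#define NX 4  /* grid extent in x (assumed for illustration) */
#define NY 3  /* grid extent in y (assumed for illustration) */

/* Hypothetical stand-in for a GPU thread's grid ID (threadIdx-like). */
typedef struct { int x, y; } GridId;

/* Kernel body: thread (x,y) reads in[y*NX + x] and writes out[x*NY + y].
 * The transpose is visible only here, in the index arithmetic. */
static void transpose_kernel(GridId id, const double *in, double *out)
{
  out[id.x * NY + id.y] = in[id.y * NX + id.x];
}

/* Serial emulation of launching an NX x NY grid of threads.
 * The launcher hands out grid IDs but knows nothing about memory access. */
static void launch(const double *in, double *out)
{
  GridId id;
  assert(in != out); /* out-of-place only: in-place would overwrite inputs */
  for (id.y = 0; id.y < NY; id.y++)
    for (id.x = 0; id.x < NX; id.x++)
      transpose_kernel(id, in, out);
}
```

A scheduler that sees only launch() and the grid extents cannot tell that
neighboring threads write with stride NY, which is the information it would
need to place work relative to memory.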

Can we have a way to specify the required memory access before launching
the kernels?
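
One possible shape for such a declaration, as a rough sketch in C only: each
task states the index ranges it may read and write before it is launched, and
the library queries those ranges. Range, AccessDecl, can_run_concurrently, and
needs_ghost_update are hypothetical names for illustration, not existing PETSc
API, and contiguous 1D ranges are an assumption that real kernels would
quickly outgrow.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open index range [lo, hi) into a vector. */
typedef struct { int lo, hi; } Range;

/* Hypothetical access declaration supplied before a kernel is launched. */
typedef struct {
  Range read;   /* entries the kernel may read  */
  Range write;  /* entries the kernel may write */
} AccessDecl;

/* True if the two half-open ranges share at least one index. */
static bool overlaps(Range a, Range b)
{
  return a.lo < b.hi && b.lo < a.hi;
}

/* Two tasks may run concurrently if neither writes what the other touches. */
static bool can_run_concurrently(AccessDecl a, AccessDecl b)
{
  return !overlaps(a.write, b.write) &&
         !overlaps(a.write, b.read) &&
         !overlaps(a.read, b.write);
}

/* True if the task reads entries outside the locally owned range, so ghost
 * values would have to be communicated before the kernel runs. */
static bool needs_ghost_update(AccessDecl a, Range owned)
{
  assert(a.read.lo <= a.read.hi);
  if (a.read.lo == a.read.hi) return false; /* empty read set */
  return a.read.lo < owned.lo || a.read.hi > owned.hi;
}
```

With something like this, an in-place sweep such as Gauss-Seidel could be
ordered by the library from the declared overlaps, and the distributed-memory
update could be triggered from the declared read range instead of being
buried in the kernel.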