[petsc-dev] programming model for PETSc

Jed Brown jedbrown at mcs.anl.gov
Thu Nov 24 16:49:58 CST 2011


On Thu, Nov 24, 2011 at 16:41, Matthew Knepley <knepley at gmail.com> wrote:
>
> Let's start with the "lowest" level, or at least the smallest. I think the
> only sane way to program for portable performance here is to use CUDA-style
> vectorization. This SIMT style is explained well here:
> http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
> I think this is much easier and more portable than the Intel intrinsics,
> and more performant and less error-prone than threads. I think you can show
> that it will accomplish anything we want to do. OpenCL seems to have
> capitulated on this point. Do we agree here?
>
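
For concreteness, a minimal sketch of that SIMT style (the axpy kernel below
is illustrative, not an existing PETSc kernel): each thread owns one array
entry, and the hardware does the vectorizing that SSE/AVX intrinsics would
make explicit, so the kernel body reads like serial scalar code.

    /* One scalar thread per entry; warps provide the implicit vectorization. */
    __global__ void axpy(int n, double a, const double *x, double *y)
    {
      int i = blockIdx.x*blockDim.x + threadIdx.x;
      if (i < n) y[i] = a*x[i] + y[i]; /* guard: n need not divide evenly */
    }

    /* Launch with enough blocks to cover all n entries, e.g.
       axpy<<<(n + 255)/256, 256>>>(n, a, d_x, d_y); */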

Moving from the other thread, I asked how far we could get with an API for
high-level data movement combined with CUDA/OpenCL kernels. Matt wrote:
> I think it will get you quite far, and the point for me will be how the
> user will describe a communication pattern, and how we will automate the
> generation of MPI from that specification. Sieve has an attempt to do this
> buried in it, inspired by the "manifold" idea.
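
As a strawman for what such a specification might look like (every name here
is hypothetical, not an existing API): the user gives a one-sided graph
mapping each ghost point to the (rank, index) of its owner, and the library
derives the MPI message schedule from that graph alone.

    /* Hypothetical communication-pattern specification: each local ghost
       (leaf) point names the (rank, index) of the remote point it mirrors. */
    typedef struct {
      int rank;              /* owning process of the remote point */
      int index;             /* offset of that point on the owner  */
    } RemotePoint;

    typedef struct {
      int          nleaves;  /* number of ghost points here         */
      const int   *leaves;   /* local offsets of the ghost points   */
      RemotePoint *remotes;  /* where each ghost point's data lives */
    } CommPattern;

    /* A library could analyze this once and reuse it, e.g.
       CommPatternSetUp(&pattern);
       CommPatternBcast(&pattern, rootdata, leafdata);   (both hypothetical) */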
Now that CUDA supports function pointers and similar features, we can write
real code in it. Whenever OpenCL gets around to supporting them, we'll be
able to write real code for multicore and see how it performs. To unify the
distributed and manycore aspects, we need some sort of hierarchical
abstraction for NUMA and a communicator-like object to maintain scope. After
applying a local-distribution filter, we might be able to express this using
coloring plus the parallel primitives I have been suggesting in the other
thread.
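
A minimal sketch of the function-pointer support mentioned above (Fermi-class
hardware; all names illustrative): the host cannot take the address of a
__device__ function directly, so a device-side variable holds the address and
is copied back to the host before launch.

    typedef double (*UnaryOp)(double);

    __device__ double square(double x) { return x*x; }
    __device__ UnaryOp d_square = square; /* device-side copy of the address */

    __global__ void apply(UnaryOp op, int n, double *x)
    {
      int i = blockIdx.x*blockDim.x + threadIdx.x;
      if (i < n) x[i] = op(x[i]); /* indirect call, needs sm_20 or newer */
    }

    /* Host side:
       UnaryOp h_op;
       cudaMemcpyFromSymbol(&h_op, d_square, sizeof(UnaryOp));
       apply<<<(n + 255)/256, 256>>>(h_op, n, d_x); */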

I'll think more on this and see if I can put together a concrete API
proposal.