[petsc-dev] programming model for PETSc

Matthew Knepley knepley at gmail.com
Thu Nov 24 17:00:28 CST 2011


On Thu, Nov 24, 2011 at 4:49 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Thu, Nov 24, 2011 at 16:41, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> Let's start with the "lowest" level, or at least the smallest. I think
>> the only sane way to program for portable performance here
>> is using CUDA-type vectorization. This SIMT style is explained well here
>> http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
>> I think this is much easier and more portable than Intel intrinsics, and
>> better performing and less error prone than threads.
>> I think you can show that it will accomplish anything we want to do.
>> OpenCL seems to have capitulated on this point. Do we agree
>> here?
>>
>
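
For concreteness, the SIMT style I mean looks like the sketch below. It is
only illustrative (the names are made up, not existing or proposed PETSc
API), but the point is that one thread per entry expresses the vectorization
portably, with no explicit intrinsics:

/* Illustrative SIMT-style kernel (hypothetical names, not PETSc API):
   each thread handles a strided subset of entries, so the same source
   vectorizes on the GPU without per-architecture intrinsics. */
__global__ void AXPYKernel(int n, double alpha, const double *x, double *y)
{
  /* grid-stride loop: correct for any n and any launch configuration */
  for (int i = blockIdx.x*blockDim.x + threadIdx.x; i < n; i += blockDim.x*gridDim.x)
    y[i] += alpha*x[i];
}

void AXPYDevice(int n, double alpha, const double *x, double *y)
{
  int threads = 256, blocks = (n + threads - 1)/threads;
  AXPYKernel<<<blocks, threads>>>(n, alpha, x, y);
}
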
> Moving from the other thread, I asked how far we could get with an API for
> high-level data movement combined with CUDA/OpenCL kernels. Matt wrote
>
> I think it will get you quite far, and the point for me will be how the
> user will describe a communication pattern, and how we will automate the
> generation of MPI from that specification. Sieve has an attempt to do this
> buried in it, inspired by the "manifold" idea.
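
To make my point above concrete, here is one possible shape for such a
specification, purely as a strawman (none of these names exist in PETSc): the
user states, for each ghosted local point, which remote (rank, index) owns
it, and the library is free to derive whatever MPI it likes from that graph.

#include <mpi.h>

/* Strawman specification interface (hypothetical names, not PETSc API) */
typedef struct { int rank, index; } RemotePoint;
typedef struct _p_CommPattern *CommPattern;

/* nghosts local points, each backed by exactly one remote owner */
int CommPatternCreate(MPI_Comm comm, int nghosts, const int localidx[],
                      const RemotePoint remote[], CommPattern *pattern);

/* move owned values into ghost storage; the implementation chooses the
   transport (two-sided, one-sided, GPU-aware, ...) from the pattern */
int CommPatternBcastBegin(CommPattern p, const double *owned, double *ghosted);
int CommPatternBcastEnd(CommPattern p, const double *owned, double *ghosted);
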
> Now that CUDA supports function pointers and similar, we can write real
> code in it. Whenever OpenCL gets around to supporting them, we'll be able
> to write real code for multicore and see how it performs. To unify the
> distributed and manycore aspects, we need some sort of hierarchical
> abstraction for NUMA and a communicator-like object to maintain scope.
> After applying a local-distribution filter, we might be able to express
> this using coloring plus the parallel primitives that I have been
> suggesting in the other thread.
>
> I'll think more on this and see if I can put together a concrete API
> proposal.
>
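For the communicator-like object you describe, here is a plain-MPI sketch of
the node level of the hierarchy (error checking omitted; an illustration of
the idea, not a proposed PETSc object): split the world communicator by a
hostname hash to get an intra-node scope, then take one leader per node for
the inter-node scope.

#include <mpi.h>

/* Sketch only: build intra-node and node-leader communicators by splitting
   on a hostname hash. Distinct hostnames can in principle collide on the
   hash; a robust version would compare the names themselves. */
static void BuildNodeHierarchy(MPI_Comm world, MPI_Comm *node, MPI_Comm *leaders)
{
  char         name[MPI_MAX_PROCESSOR_NAME];
  int          len, rank, nrank, i;
  unsigned int hash = 5381u;
  MPI_Get_processor_name(name, &len);
  for (i = 0; i < len; i++) hash = 33u*hash + (unsigned char)name[i];   /* djb2-style hash */
  MPI_Comm_rank(world, &rank);
  MPI_Comm_split(world, (int)(hash & 0x7fffffffu), rank, node);         /* intra-node scope */
  MPI_Comm_rank(*node, &nrank);
  MPI_Comm_split(world, nrank ? MPI_UNDEFINED : 0, rank, leaders);      /* one leader per node */
}

The same kind of split could be repeated inside a node for sockets, giving
the NUMA levels; the communicator at each level is what maintains the scope.
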

Next, I think we need example problems, as I said before. DM ex1 does mesh
distribution, which I think should also include distribution of the data
defined over the mesh. I would add AMG and FMM. With these three examples we
can prove that this system is worthwhile. Any discussion of these examples,
or other suggestions?

   Matt

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener