<div class="gmail_quote">On Fri, Nov 25, 2011 at 09:45, Mark F. Adams <span dir="ltr">&lt;<a href="mailto:mark.adams@columbia.edu">mark.adams@columbia.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

So you want MPI to provide an incremental path to a model that will work at exa-scale and pthreads are part of that?  So you are in the MPI + pthreads camp?</blockquote></div><br><div>I don't know what Barry wants, but I like the idea of building the library interface that we want on top of MPI+pthreads. I see them merely as portable network- and kernel-level abstractions, not as anything intended to provide a good interface for applications to call directly, or even to use directly when implementing most PETSc functionality.</div>

<div><br></div><div>It is a noble goal to offer a unified view of shared (threaded) and distributed memory. I don't know how achievable this is, so it may always involve some explicit hierarchy. The distributed-memory primitives might just involve some layout informed by NUMA, followed by threads or CUDA-style kernels run locally.</div>

<div><br></div><div>I would really like to see some performance numbers for OpenCL running on CPUs. It would make our lives simpler (once OpenCL supports indirect calls) if we could just use that, but I'm not convinced that it will deliver good performance on CPUs, given its different memory model.</div>