On Thu, Nov 24, 2011 at 4:40 PM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im"><div class="gmail_quote">On Thu, Nov 24, 2011 at 16:26, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>This is one great reason that vectorization works and pthreads is crap. I am not totally sold on the thread block system, but<div>it looks like genius compared to pthreads. I would start there.</div></div></blockquote>


</div><br></div><div>Suppose you had a higher-level way to describe data movement (across shared and distributed memory) between invocations of CUDA/OpenCL kernels. How far would that get you?</div>

</blockquote></div><br>Move this question to Barry's new thread. I think it will get you quite far; for me, the key questions are<div>how the user will describe a communication pattern, and how we will automate the generation of MPI</div>

<div>from that specification. Sieve has an attempt at this buried in it, inspired by the "manifold" idea.<br><div><br></div><div>   Matt<br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</div></div>