On Fri, Nov 12, 2010 at 11:54 AM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im"><br>

On Nov 11, 2010, at 6:15 PM, Matthew Knepley wrote:<br>

<br>

> On Fri, Nov 12, 2010 at 9:52 AM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>   What should we use as a programming model for PETSc on multi-core systems? Currently, for conventional multicore we have only one MPI process per core, and for GPUs we have subclasses of Vec and Mat with custom CUDA code.<br>


><br>

>   Should we introduce subclasses of Vec and Mat built on pthreads (this is what Bill G recommends, rather than using OpenMP)?<br>

><br>

> pthreads are a nightmare and do not do vectorization right, which is necessary here.<br>

<br>

</div>   What do you mean, do not vectorize?</blockquote><div><br></div><div>I mean it's easy to tell a thread to do something, but I was not aware that pthreads had nice support</div><div>for telling all threads to do something at the same time. On a multicore, you want vector instructions,</div>

<div>and on a GPU you basically have no choice.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">

> I am for OpenCL/CUDA. Eventually<br>

> I think OpenCL will take its head out of its ass and be as nice as CUDA.<br>

<br>

</div>   That's fine for GPUs, but what about 8-core or 12-core Intel processors? CUDA for that also?</blockquote><div><br></div><div>OpenCL already runs on these, but the interface is too low-level. It should look more like CUDA. In CUDA,</div>

<div>you have constructs for vectorization, but you can also branch as Jed indicated. This is slow on a GPU,</div><div>but would be fine on a multicore.</div><div><br></div><div>Jed, do you see obvious shortcomings of CUDA for these multicore machines?</div>

<div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><font color="#888888"><br>

   Barry<br>

</font><div><div></div><div class="h5"><br>

><br>

>   Matt<br>

><br>

>   Is there a way to have some kind of consistent model, if not the same code, between conventional multicore and GPU multicore? What about this MCUDA stuff?<br>

><br>

>   Barry<br>

> --<br>

> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

> -- Norbert Wiener<br>

<br>

</div></div></blockquote></div>