<div><br><div class="gmail_quote"><div>On Tue, Mar 14, 2017 at 8:52 PM Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Jeff Hammond <<a href="mailto:jeff.science@gmail.com" class="gmail_msg" target="_blank">jeff.science@gmail.com</a>> writes:<br class="gmail_msg">

<br class="gmail_msg">

> On Mon, Mar 13, 2017 at 8:08 PM, Jed Brown <<a href="mailto:jed@jedbrown.org" class="gmail_msg" target="_blank">jed@jedbrown.org</a>> wrote:<br class="gmail_msg">

>><br class="gmail_msg">

>> Jeff Hammond <<a href="mailto:jeff.science@gmail.com" class="gmail_msg" target="_blank">jeff.science@gmail.com</a>> writes:<br class="gmail_msg">

>><br class="gmail_msg">

>> > OpenMP did not prevent OpenCL,<br class="gmail_msg">

>><br class="gmail_msg">

>> This programming model isn't really intended for architectures with<br class="gmail_msg">

>> persistent caches.<br class="gmail_msg">

>><br class="gmail_msg">

><br class="gmail_msg">

> It's not clear to me how much this should matter in a good implementation.<br class="gmail_msg">

> The lack of implementation effort for OpenCL on cache-coherent CPU<br class="gmail_msg">

> architectures appears to be a more significant issue.<br class="gmail_msg">

<br class="gmail_msg">

How do you keep data resident in cache between kernel launches?<br class="gmail_msg">

</blockquote><div><br></div><div>Not do stuff that causes it to be evicted. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

>> > C11, C++11<br class="gmail_msg">

>><br class="gmail_msg">

>> These are basically pthreads, which predates OpenMP.<br class="gmail_msg">

>><br class="gmail_msg">

><br class="gmail_msg">

> I'm not sure why it matters which one came first.  POSIX standardized<br class="gmail_msg">

> threads in 1995, while OpenMP was first standardized in 1997.  However, the<br class="gmail_msg">

> first serious Pthreads implementation in Linux was in 2003.<br class="gmail_msg">

<br class="gmail_msg">

And the first serious OpenMP on OS X was when?<br class="gmail_msg">

</blockquote><div><br></div><div>When was the first serious implementation of MPI-3 shared memory windows on OSX? That is your alternative to OpenMP for shared memory.</div><div><br></div><div>I think it is rather pathetic to use Apple's compiler support as an argument against the OpenMP programming model.</div><div><br></div><div>In any case, MPI doesn't run on any GPU hardware, nor any FPGA, nor DSP.  OpenMP supports at least two of these. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

> OpenMP standardized the best practices identified in Kuck, SGI and<br class="gmail_msg">

> Cray directives, just like POSIX presumably standardized best<br class="gmail_msg">

> practices in OS threads from various Unix implementations.<br class="gmail_msg">

><br class="gmail_msg">

> C++11 and beyond have concurrency features beyond just threads.  You<br class="gmail_msg">

> probably hate all of them because they are C++, and in any case I won't<br class="gmail_msg">

> argue, because I don't see anything that's implemented better<br class="gmail_msg">

><br class="gmail_msg">

>><br class="gmail_msg">

>> > or Fortran 2008<br class="gmail_msg">

>><br class="gmail_msg">

>> A different language and doesn't play well with others.<br class="gmail_msg">

>><br class="gmail_msg">

><br class="gmail_msg">

> Sure, but you could use Fortran 2003 features to interoperate between C and<br class="gmail_msg">

> Fortran if you wanted to leverage Fortran 2008 concurrency features in an<br class="gmail_msg">

> ISO-compliant way.  I'm not suggesting you want to do this, but I dispute<br class="gmail_msg">

> the suggestion that Fortran does not play nice with C.<br class="gmail_msg">

<br class="gmail_msg">

I think the above qualifies as not playing nicely in this context.<br class="gmail_msg">

</blockquote><div><br></div><div>ISO-defined interoperability is not playing nice?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

> Fortran coarrays images are OS processes in every implementation I know,<br class="gmail_msg">

> although the standard does not explicitly require this implementation.  The<br class="gmail_msg">

> situation is identical to that of MPI, although there are actually MPI<br class="gmail_msg">

> implementations based upon OS threads rather than OS processes (and they<br class="gmail_msg">

> require compiler or OS magic to deal with non-heap data).<br class="gmail_msg">

><br class="gmail_msg">

> Both of the widely available Fortran coarray implementations use MPI-3 RMA<br class="gmail_msg">

> under the hood and all of the ones I know about define an image to be an OS<br class="gmail_msg">

> process.<br class="gmail_msg">

<br class="gmail_msg">

Are you trying to sell PETSc on MPI?<br class="gmail_msg">

</blockquote><div><br></div><div>No. I am countering your suggestion that they don't play nice. </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

>> > from introducing parallelism. Not sure if your comment was meant to be<br class="gmail_msg">

>> > serious,<br class="gmail_msg">

>><br class="gmail_msg">

>> Partially.  It was just enough to give the appearance of a solution<br class="gmail_msg">

>> while not really being a solution.<br class="gmail_msg">

>><br class="gmail_msg">

><br class="gmail_msg">

> It still isn't clear what you actually want.  You appear to reject every<br class="gmail_msg">

> standard API for enabling explicit vectorization for CPU execution<br class="gmail_msg">

> (Fortran, OpenMP, OpenCL), which suggests that (1) you do not believe in<br class="gmail_msg">

> vectorization, (2) you think that autovectorizing compilers are sufficient,<br class="gmail_msg">

> (3) you think vector code is necessarily a non-portable software construct,<br class="gmail_msg">

> or (4) you do not think vectorization is relevant to PETSc.<br class="gmail_msg">

<br class="gmail_msg">

OpenMP is strictly about vectorization with nothing to do with threads<br class="gmail_msg">

and MPI is sufficient?  I don't have a problem with that, but will<br class="gmail_msg">

probably stick to attributes and intrinsics instead of omp simd, at<br class="gmail_msg">

least until it matures and demonstrates feature parity.<br class="gmail_msg">

</blockquote><div><br></div><div>Is MPI strictly about collectives?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

Have you tried writing a BLIS microkernel using omp simd?  Is it any<br class="gmail_msg">

good?</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"></blockquote><div><br></div><div>Have you tried writing MPI send-recv using TCP/IP?</div><div><br></div><div>You appear disinterested in trying to come up with constructive ideas. I don't see any value in continuing this conversation. </div><div><br></div><div>Jeff</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

</blockquote></div></div><div dir="ltr">-- <br></div><div data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>