<div><br><div class="gmail_quote"><div>On Tue, Mar 14, 2017 at 8:52 PM Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Jeff Hammond <<a href="mailto:jeff.science@gmail.com" class="gmail_msg" target="_blank">jeff.science@gmail.com</a>> writes:<br class="gmail_msg">
<br class="gmail_msg">
>> On Mon, Mar 13, 2017 at 8:08 PM, Jed Brown <jed@jedbrown.org> wrote:
>>>
>>> Jeff Hammond <jeff.science@gmail.com> writes:
>>>
>>>> OpenMP did not prevent OpenCL,
>>>
>>> This programming model isn't really intended for architectures with
>>> persistent caches.
>>>
>>
>> It's not clear to me how much this should matter in a good implementation.
>> The lack of implementation effort for OpenCL on cache-coherent CPU
>> architectures appears to be a more significant issue.
>
> How do you keep data resident in cache between kernel launches?
Not do stuff that causes it to be evicted.
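
To be concrete: nothing in the OpenCL model forces a round trip through memory between launches. A minimal host-side sketch (ctx, queue, and kernel are hypothetical and assumed created elsewhere); the host never touches the buffer between launches, so a CPU implementation is free to keep it cache-resident:

    #include <CL/cl.h>

    /* Launch the same kernel nits times on one device buffer, with no
     * host access in between.  A CPU OpenCL runtime can keep the working
     * set in cache across launches; nothing forces it back to memory. */
    void iterate(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                 size_t n, int nits)
    {
      cl_int err;
      cl_mem x = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(double),
                                NULL, &err);
      clSetKernelArg(kernel, 0, sizeof(cl_mem), &x);
      for (int it = 0; it < nits; it++)
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL,
                               0, NULL, NULL);
      clFinish(queue);       /* the host reads the data only after this */
      clReleaseMemObject(x);
    }
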
>>>> C11, C++11
>>>
>>> These are basically pthreads, which predates OpenMP.
>>>
>>
>> I'm not sure why it matters which one came first. POSIX standardized
>> threads in 1995, while OpenMP was first standardized in 1997. However,
>> the first serious Pthreads implementation on Linux did not appear until 2003.
>
> And the first serious OpenMP on OS X was when?
When was the first serious implementation of MPI-3 shared-memory windows on OS X? That is your alternative to OpenMP for shared memory.

I think it is rather pathetic to use Apple's compiler support as an argument against the OpenMP programming model.

In any case, MPI doesn't run on any GPU, FPGA, or DSP hardware. OpenMP supports at least two of these.
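
And since you ask about MPI-3 shared-memory windows, here is a minimal sketch of what that alternative looks like (error checking omitted): the ranks sharing a node allocate one segment and can load/store each other's portions directly, much as OpenMP threads would.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);

      /* Communicator containing only the ranks that share this node. */
      MPI_Comm node;
      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &node);
      int rank;
      MPI_Comm_rank(node, &rank);

      /* Each rank contributes 1024 doubles to one node-wide segment. */
      double *mine;
      MPI_Win win;
      MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                              MPI_INFO_NULL, node, &mine, &win);

      /* Locate a neighbor's portion; plain loads/stores work on it
       * after the appropriate synchronization (e.g., MPI_Win_fence). */
      if (rank > 0) {
        MPI_Aint size; int disp; double *left;
        MPI_Win_shared_query(win, rank - 1, &size, &disp, &left);
      }

      MPI_Win_free(&win);
      MPI_Comm_free(&node);
      MPI_Finalize();
      return 0;
    }
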
>> OpenMP standardized the best practices identified in the Kuck, SGI, and
>> Cray directives, just as POSIX presumably standardized best practices
>> in OS threads from various Unix implementations.
>>
>> C++11 and beyond have concurrency features beyond just threads. You
>> probably hate all of them because they are C++, and in any case I won't
>> argue, because I don't see anything that's implemented better.
>>
>>>
>>>> or Fortran 2008
>>>
>>> A different language, and one that doesn't play well with others.
>>>
>>
>> Sure, but you could use Fortran 2003 features to interoperate between C
>> and Fortran if you wanted to leverage Fortran 2008 concurrency features
>> in an ISO-compliant way. I'm not suggesting you want to do this, but I
>> dispute the suggestion that Fortran does not play nice with C.
>
> I think the above qualifies as not playing nicely in this context.
ISO-defined interoperability is not playing nice?
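
For the record, the mechanism is simple: a Fortran procedure declared with bind(C) is callable from C under an ISO-defined ABI contract. A sketch, using a hypothetical routine daxpy_f (the Fortran side appears in the comment):

    /* Hypothetical Fortran side (Fortran 2003 C interoperability):
     *
     *   subroutine daxpy_f(n, a, x, y) bind(C, name="daxpy_f")
     *     use iso_c_binding
     *     integer(c_int), value :: n
     *     real(c_double), value :: a
     *     real(c_double), intent(in)    :: x(n)
     *     real(c_double), intent(inout) :: y(n)
     *     y = y + a*x
     *   end subroutine daxpy_f
     *
     * C calls it like any other external function: */
    void daxpy_f(int n, double a, const double *x, double *y);

    void caller(void)
    {
      double x[4] = {1, 2, 3, 4}, y[4] = {0};
      daxpy_f(4, 2.0, x, y);   /* ISO-defined, no vendor-specific glue */
    }
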
>> Fortran coarray images are OS processes in every implementation I know
>> of, although the standard does not explicitly require this
>> implementation. The situation is identical to that of MPI, although
>> there are actually MPI implementations based upon OS threads rather
>> than OS processes (and they require compiler or OS magic to deal with
>> non-heap data).
>>
>> Both of the widely available Fortran coarray implementations use MPI-3
>> RMA under the hood, and all of the ones I know about define an image to
>> be an OS process.
>
> Are you trying to sell PETSc on MPI?
No. I am countering your suggestion that they don't play nice.
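
To make the "under the hood" claim concrete: in an RMA-based coarray runtime, an image-to-image copy like x(1:n)[img] = y lowers to roughly the following. This is schematic, not any particular implementation's actual code:

    #include <mpi.h>

    /* Schematic lowering of the coarray assignment  x(1:n)[img] = y,
     * where win exposes each image's x through MPI-3 RMA. */
    void coarray_put(MPI_Win win, int img, const double *y, int n)
    {
      MPI_Win_lock(MPI_LOCK_SHARED, img, 0, win);
      MPI_Put(y, n, MPI_DOUBLE,        /* local source buffer        */
              img, 0, n, MPI_DOUBLE,   /* target image, displacement */
              win);
      MPI_Win_unlock(img, win);        /* completes the transfer     */
    }
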
>>>> from introducing parallelism. Not sure if your comment was meant to be
>>>> serious,
>>>
>>> Partially. It was just enough to give the appearance of a solution
>>> while not really being one.
>>>
>>
>> It still isn't clear what you actually want. You appear to reject every
>> standard API for enabling explicit vectorization for CPU execution
>> (Fortran, OpenMP, OpenCL), which suggests that (1) you do not believe
>> in vectorization, (2) you think that autovectorizing compilers are
>> sufficient, (3) you think vector code is necessarily a non-portable
>> software construct, or (4) you do not think vectorization is relevant
>> to PETSc.
>
> OpenMP is strictly about vectorization, with nothing to do with threads,
> and MPI is sufficient? I don't have a problem with that, but I will
> probably stick to attributes and intrinsics instead of omp simd, at
> least until it matures and demonstrates feature parity.
Is MPI strictly about collectives?
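
And for what it's worth, the two spellings you mention (omp simd versus attributes and intrinsics) look like this for a plain axpy loop. This is a sketch, not a tuned microkernel; the intrinsics version is x86-only and assumes n is a multiple of 4:

    #include <immintrin.h>

    /* Portable spelling: the compiler is told the loop is vectorizable. */
    void axpy_simd(int n, double a, const double *restrict x,
                   double *restrict y)
    {
      #pragma omp simd
      for (int i = 0; i < n; i++)
        y[i] += a * x[i];
    }

    /* Hand-written spelling: AVX intrinsics, x86-only, n % 4 == 0. */
    void axpy_avx(int n, double a, const double *x, double *y)
    {
      __m256d va = _mm256_set1_pd(a);
      for (int i = 0; i < n; i += 4) {
        __m256d vx = _mm256_loadu_pd(&x[i]);
        __m256d vy = _mm256_loadu_pd(&y[i]);
        _mm256_storeu_pd(&y[i], _mm256_add_pd(vy, _mm256_mul_pd(va, vx)));
      }
    }
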
> Have you tried writing a BLIS microkernel using omp simd? Is it any
> good?

Have you tried writing MPI send-recv using TCP/IP?

You appear uninterested in trying to come up with constructive ideas. I don't see any value in continuing this conversation.

Jeff
-- 
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/