<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 9, 2018 at 10:04 AM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Jeff Hammond &lt;<a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a>&gt; writes:<br>

<br>

> This is the textbook Wrong Way to write OpenMP and the reason that the<br>

> thread-scalability of DOE applications using MPI+OpenMP sucks.  It leads to<br>

> codes that do fork-join far too often and suffer from death by Amdahl,<br>

> unless you do a second pass where you fuse all the OpenMP regions and<br>

> replace the serial regions between them with critical sections or similar.<br>

><br>

</span><span class="">> This isn't how you'd write MPI, is it?  No, you'd figure out how to<br>

> decompose your data properly to exploit locality and then implement an<br>

> algorithm that minimizes communication and synchronization.  Do that with<br>

> OpenMP.<br>

<br>

</span>The applications that would call PETSc do not do this decomposition and<br>

<span style="background-color:rgb(255,255,0)">the OpenMP programming model does not provide a "communicator" or<br>

similar abstraction to associate the work done by the various threads.<br>

It's all implicit.  </span></blockquote><div><br></div><div>This is perhaps the biggest single reason that I hate OpenMP.</div><div><br></div><div>--Richard<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The idea with PETSc's threadcomm was to provide an<br>

object for this, but nobody wanted to call PETSc that way.  It's clear<br>

that applications using OpenMP are almost exclusively interested in its<br>

incrementalism, not in doing it right.  It's also pretty clear that the<br>

OpenMP forum agrees, otherwise they would be providing abstractions for<br>

performing collective operations across module boundaries within a<br>

parallel region.<br>

<br>

So the practical solution is to use OpenMP the way everyone else does,<br>

even if the performance is not good, because at least it works with the<br>

programming model the application has chosen.<br>

</blockquote></div><br></div></div>
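<div><br></div><div>For readers following the fork-join point quoted above, here is a minimal sketch in C of the two styles being contrasted. It is an illustration only, not code from PETSc or from any application in this thread; the function names <code>scale_then_sum_forkjoin</code> and <code>scale_then_sum_fused</code> and the toy computation are hypothetical. The first function opens a parallel region per loop with a serial step in between; the second keeps one parallel region and turns the serial step into a <code>single</code> construct.</div>

```c
/* Hypothetical example: both functions double x[] in place, then
 * return the sum of (x[i] + shift). Only the OpenMP structure differs. */

/* Fork-join style: every loop opens and closes its own parallel region. */
double scale_then_sum_forkjoin(double *x, int n) {
    double sum = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        x[i] = 2.0 * x[i];
    }

    /* Serial section between the two regions: only one thread is active. */
    double shift = 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] + shift;
    }
    return sum;
}

/* Fused style: one parallel region for the whole computation. */
double scale_then_sum_fused(double *x, int n) {
    double sum = 0.0;
    double shift = 0.0;

    #pragma omp parallel
    {
        /* Work-sharing loop; implicit barrier at its end. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            x[i] = 2.0 * x[i];
        }

        /* The former serial section: one thread runs it, and the
         * implicit barrier after it makes shift visible to all threads. */
        #pragma omp single
        shift = 1.0;

        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; i++) {
            sum += x[i] + shift;
        }
    }
    return sum;
}
```

<div>Build with an OpenMP flag such as <code>-fopenmp</code> for gcc; without it the pragmas are ignored and both functions run serially with the same result. The fused form only works when every step sits inside the same parallel region, which is the difficulty raised in the thread once a step lives in a separate library.</div>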