<div dir="ltr"><div>I agree with Matt's comment, and let me add (somewhat redundantly):</div><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>This isn't how you'd write MPI, is it?  No, you'd figure out how to decompose your data properly to exploit locality and then implement an algorithm that minimizes communication and synchronization.  Do that with OpenMP.</div></div></div></div></blockquote><div><br></div><div>I have never seen a DOE app that does this correctly: get your data model figured out first, then implement. In fact, in my mind, the only advantage of OMP is that it is incremental. You can get something running quickly and then optimize incrementally as resources allow and performance demands. That is nice in theory, but in practice apps just say "we did it all on our own without that pesky distributed memory computing" and do science. Fine. We all have resource limits and have to make decisions.</div><div><br></div><div>Jeff, with your approach, you do all the hard work of distributing your data intelligently, which must be done regardless of programming model, but you are probably left with a code that has more shared memory algorithms in it than if you had started from the MPI side. I thought *you* were one of the savants who preach that shared memory code is just impossible to make correct for non-trivial codes, and thus hard to maintain.</div><div><br></div><div>Case in point: I recently tried to use hypre's OMP support and we had numerical problems. After a week of digging, I found a hypre test case (i.e., no PETSc) that seemed to work with -O1 and failed with -O2 (the solver just did not converge, and valgrind seemed clean). (This was using the 'ij' test problem.) I then ran a PETSc test problem with this -O1 hypre build, and it failed. I gave up at that point. 
Ulrike is in the loop, and she agreed it looked like a compiler problem.</div><div><br></div><div>If Intel can get this hypre test to work, they can tell me what they did and I can try it again in PETSc. BTW, I looked at the hypre code and they do not seem to do much, if any, fusing, etc.</div><div><br></div><div>And this is all anecdotal; I do not want to imply that OMP or hypre or Intel are bad in any way (in fact, I like both hypre and Intel).</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Note that for BLAS 1 operations, the correct thing to do is likely to turn on MKL BLAS threading (being careful to make sure the number of threads MKL uses matches the number used by other parts of the code). This way we don't need to OpenMP-optimize many parts of PETSc's vector operations (norm, dot, scale, axpy). In fact, this is the first thing Mark should do: how much does it speed up the vector operations?<br></blockquote><div><br></div><div>BLAS1 operations are all memory-bound unless running out of cache, i.e., the vectors fit in cache (in which case one shouldn't use threads), and compilers do a great job with them. Just put the pragmas on and let the compiler do its job.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The question is how many ECP applications actually use OpenMP just as a #pragma optimization tool, versus using other features of OpenMP. For example, I remember Brian wanted to use (or did use) OpenMP threads directly in BoxLib and didn't just stick to the #pragma model. If they did this, then we would need a custom PETSc to match their model.<br></blockquote><div><br></div><div>If this implies that BoxLib will use omp-parallel and then use explicit threading in a manner similar to MPI (omp_get_num_threads=MPI_Comm_size and omp_get_thread_num=MPI_Comm_rank), then this is the Right Way to write OpenMP.</div></div></div></div></blockquote><div><br></div><div>Note that Chombo (Phil Colella) split from BoxLib (John Bell) about 15 years ago (and added more C++), and BoxLib has since been refactored into AMReX. Brian works with Chombo. Some staff are fungible and move between both projects. I don't think Brian is fungible.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra">

</div></div>

</blockquote></div></div>
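<div>As a minimal sketch of Jeff's point that BLAS1 kernels only need a pragma, assuming plain C arrays rather than PETSc Vec objects: the dot product and AXPY below are each parallelized with a single OpenMP directive, and the thread split and the reduction are left to the compiler and runtime. The function names dot and axpy are hypothetical illustrations, not PETSc or MKL routines.</div>

```c
#include <assert.h>

/* Hypothetical helpers on raw arrays, not PETSc's Vec API.
   Both loops are memory-bound: each element is loaded once and used once. */

/* Dot product: each thread accumulates a private partial sum, and the
   reduction clause combines the partial sums at the end of the loop. */
double dot(const double *x, const double *y, long n)
{
    assert(n >= 0);
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* AXPY, y = a*x + y: iterations are independent, so a plain parallel for
   is all that is needed; the compiler is free to vectorize the body. */
void axpy(double a, const double *x, double *y, long n)
{
    assert(n >= 0);
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

<div>Built without OpenMP (for example, without -fopenmp in GCC), the pragmas are ignored and the same code runs serially, which is the incremental property mentioned above. Whether these loops or threaded MKL BLAS are faster on a given machine is something to measure rather than assume.</div>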
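<div>The MPI-like style of OpenMP described above can be sketched as follows, assuming a 1-D array split into contiguous blocks: a single parallel region is opened, each thread treats omp_get_thread_num() and omp_get_num_threads() as its rank and size, computes the slice it owns, works only on that slice, and synchronizes once at the end. The helpers block_range and spmd_sum are hypothetical; only the two omp_ calls are standard OpenMP runtime routines. The #ifdef _OPENMP guard is an added assumption so the file also builds without OpenMP, in which case it runs as a single "rank".</div>

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: contiguous block decomposition of [0, n) into
   `size` parts, giving one extra element to each of the first n % size
   ranks. This mirrors how one would split data across MPI ranks. */
void block_range(long n, int rank, int size, long *lo, long *hi)
{
    assert(size > 0 && rank >= 0 && rank < size && n >= 0);
    long base = n / size, rem = n % size;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

/* Sum of x[0..n) written SPMD-style: one parallel region, explicit
   ownership of a slice per thread, and one combination step. */
double spmd_sum(const double *x, long n)
{
    double total = 0.0;               /* shared across threads */
#pragma omp parallel
    {
#ifdef _OPENMP
        int rank = omp_get_thread_num();   /* plays the role of MPI_Comm_rank */
        int size = omp_get_num_threads();  /* plays the role of MPI_Comm_size */
#else
        int rank = 0, size = 1;            /* serial build: a single "rank" */
#endif
        long lo, hi;
        block_range(n, rank, size, &lo, &hi);

        double local = 0.0;                /* private to this thread */
        for (long i = lo; i < hi; i++)
            local += x[i];

        /* The only point of contact between threads, analogous to an MPI reduction. */
#pragma omp atomic
        total += local;
    }
    return total;
}
```

<div>Unlike the loop-level pragmas, the ownership here is decided once and could be reused across many kernels, much as an MPI decomposition is; the cost is that the programmer, not the compiler, is responsible for every synchronization point.</div>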