<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Fri, Jul 6, 2018 at 6:20 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Fri, Jul 6, 2018 at 3:07 PM Richard Tran Mills <<a href="mailto:rtmills@anl.gov" target="_blank">rtmills@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>True, Barry. But, unfortunately, I think Jed's argument has something to it because the hybrid MPI + OpenMP model has become so popular. I know of a few codes where adopting this model makes some sense, though I believe that, more often, the model has been adopted simply because it is the fashionable thing to do. Regardless of good or bad reasons for its adoption, I do have some real concern that codes that use this model have a difficult time using PETSc effectively because of the lack of thread support. Like many of us, I had hoped that endpoints would make it into the MPI standard and this would provide a reasonable mechanism for integrating PETSc with codes using MPI+threads, but progress on this seems to have stagnated. I hope that the MPI endpoints effort eventually goes somewhere, but what can we do in the meantime? Within the DOE ECP program, the MPI+threads approach is being pushed really hard, and many of the ECP subprojects have adopted it. I think it's mostly idiotic, but I think it's too late to turn the tide and convince most people that pure MPI is the way to go.</div></div></blockquote><div><br></div><div>This sounds like the "so many of our fellow buffalo have gone over the cliff, who are we to stand here on the precipice?" argument.</div><div><br></div><div>Also, every time I hear from one of us that supporting threads is "not that hard", I grind my teeth. 
The ICL people are not slouches, and</div><div>they poured resources into a thing that ultimately did little good, and was so intrusive that it had to be abandoned. How much more evidence</div><div>do we need that threads are not an appendix, but rather a cancer on a codebase? I submit that we would spend vastly fewer resources assigning</div><div>someone full-time to just fix these fucked up codes one-by-one than we would on the endless maintenance that threads in PETSc would necessitate.</div></div></div></blockquote><div><br></div><div>I think we should just say we support OpenMP and GPUs in several different ways (Hypre, MKL kernels, native GPUs, whatever). Don't worry if it does not live up to your standards; just say it's done. Don't bother arguing that OMP is bad. You cannot tell a child 'a stove is hot, don't touch'; they will touch. Once.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div> Meanwhile, my understanding is that we need to be able to support more of the ECP application projects to justify the substantial funding we are getting from the program. Many of these projects are dead-set on using OpenMP. (I note that I believe that the folks Mark is trying to help with PETSc and OpenMP are people affiliated with Carl Steefel's ECP subsurface project.)</div></div></blockquote><div><br></div><div>Carl has now chosen Threads and Windows. What next, VAX?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Since it looks like MPI endpoints are going to be a long time (or possibly forever) in coming, I think we need (a) stopgap plan(s) to support this crappy MPI + OpenMP model in the meantime. 
One possible approach is to do what Mark is trying to do with MKL: use a third-party library that provides optimized OpenMP implementations of computationally expensive kernels. It might make sense to also consider using Karl's ViennaCL library in this manner, which we already use to support GPUs, but which I believe (Karl, please let me know if I am off-base here) we could also use to provide OpenMP-ized linear algebra operations on CPUs. Such approaches won't use threads for lots of the things that a PETSc code will do, but might be able to provide decent resource utilization for the most expensive parts for some codes.</div></div></blockquote><div><br></div><div>I think you are right that this is the best thing. We should realize that best here means face-saving, because all current experiments</div><div>say that the best thing to do is turn off OpenMP when using them, but at least we can claim to whoever will listen that we support it.</div></div></div></blockquote><div><br></div><div>Yes, let people do what they want. It is not a big deal and you can write good code with OMP (it's just not easier than MPI). We have finite resources, so we support threads in a cheapo way by using libraries.</div><div><br></div><div>Mark</div></div></div>