[petsc-dev] GAMG error with MKL

Matthew Knepley knepley at gmail.com
Fri Jul 6 17:20:36 CDT 2018


On Fri, Jul 6, 2018 at 3:07 PM Richard Tran Mills <rtmills at anl.gov> wrote:

> True, Barry. But, unfortunately, I think Jed's argument has something to
> it because the hybrid MPI + OpenMP model has become so popular. I know of a
> few codes where adopting this model makes some sense, though I believe
> that, more often, the model has been adopted simply because it is the
> fashionable thing to do. Regardless of good or bad reasons for its
> adoption, I do have some real concern that codes that use this model have a
> difficult time using PETSc effectively because of the lack of thread
> support. Like many of us, I had hoped that endpoints would make it into the
> MPI standard and this would provide a reasonable mechanism for integrating
> PETSc with codes using MPI+threads, but progress on this seems to have
> stagnated. I hope that the MPI endpoints effort eventually goes somewhere,
> but what can we do in the meantime? Within the DOE ECP program, the
> MPI+threads approach is being pushed really hard, and many of the ECP
> subprojects have adopted it. I think it's mostly idiotic, but I think it's
> too late to turn the tide and convince most people that pure MPI is the way
> to go.
>

This sounds like the "so many of our fellow buffalo have gone over the
cliff, who are we to stand here on the precipice?" argument.

Also, every time I hear from one of us that supporting threads is "not that
hard", I grind my teeth. The ICL people are not slouches, and
they poured resources into a thing that ultimately did little good, and was
so intrusive that it had to be abandoned. How much more evidence
do we need that threads are not an appendix, but rather a cancer on a
codebase? I submit that we would spend vastly fewer resources assigning
someone full-time to just fix these fucked up codes one-by-one than we
would on the endless maintenance that threads in PETSc would necessitate.


> Meanwhile, my understanding is that we need to be able to support more of
> the ECP application projects to justify the substantial funding we are
> getting from the program. Many of these projects are dead-set on using
> OpenMP. (I note that I believe that the folks Mark is trying to help with
> PETSc and OpenMP are people affiliated with Carl Steefel's ECP subsurface
> project.)
>

Carl has now chosen Threads and Windows. What next, VAX?


> Since it looks like MPI endpoints are going to be a long time (or possibly
> forever) in coming, I think we need (a) stopgap plan(s) to support this
> crappy MPI + OpenMP model in the meantime. One possible approach is to do
> what Mark is trying to do with MKL: Use a third party library that
> provides optimized OpenMP implementations of computationally expensive
> kernels. It might make sense to also consider using Karl's ViennaCL library
> in this manner, which we already use to support GPUs, but which I believe
> (Karl, please let me know if I am off-base here) we could also use to
> provide OpenMP-ized linear algebra operations on CPUs. Such
> approaches won't use threads for lots of the things that a PETSc code will
> do, but might be able to provide decent resource utilization for the most
> expensive parts for some codes.
>

I think you are right that this is the best thing. We should realize that
best here means face saving, because all current experiments
say that the best thing to do is turn off OpenMP when using these
libraries, but at
least we can claim to whoever will listen that we support it.

   Matt
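
For concreteness, the MKL route Richard describes above is mostly a matter
of switching the Mat type, so an application keeps its flat-MPI structure
and only the MKL-backed kernels pick up MKL's own OpenMP threading. A
minimal sketch, assuming a PETSc build configured against MKL so that the
AIJMKL types are available (error checking omitted):

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat      A;
    PetscInt n = 1000;

    PetscInitialize(&argc, &argv, NULL, NULL);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    /* AIJMKL keeps the AIJ storage but dispatches MatMult and friends to
       MKL's sparse routines, whose threading is controlled by
       OMP_NUM_THREADS / MKL_NUM_THREADS; the rest of PETSc stays MPI. */
    MatSetType(A, MATAIJMKL);
    MatSetFromOptions(A);  /* or select at run time with -mat_type aijmkl */
    /* ... assemble and solve as usual ... */
    MatDestroy(&A);
    PetscFinalize();
    return 0;
  }

Presumably the ViennaCL-backed types could be selected the same way when
their OpenMP backend is enabled, but Karl would have to confirm that.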


> Clever ideas from anyone on this list about how to use an adequate number
> of MPI ranks for PETSc while using only a subset of these ranks for the
> MPI+OpenMP application code will be appreciated, though I don't know if
> there are any good solutions.
>
> --Richard
>
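
On the rank-subsetting question just above, one obvious starting point
(which does not address the hard part, namely moving data between the two
decompositions) is to keep PETSc on all of MPI_COMM_WORLD, one rank per
core, and carve out a sub-communicator with one rank per node for the
threaded application phase; those ranks each spawn OMP_NUM_THREADS threads
while the rest sit out that phase. A rough sketch of the communicator
plumbing only:

  #include <mpi.h>

  /* Build a communicator containing the first rank on each shared-memory
     node.  PETSc keeps using MPI_COMM_WORLD (all ranks); only ranks that
     get a non-null app_comm run the MPI+OpenMP application phase. */
  void build_app_comm(MPI_Comm world, MPI_Comm *app_comm)
  {
    MPI_Comm node_comm;
    int      world_rank, node_rank;

    MPI_Comm_rank(world, &world_rank);
    MPI_Comm_split_type(world, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                        &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    /* Color 0 for the node-local rank 0; everyone else passes
       MPI_UNDEFINED and receives MPI_COMM_NULL. */
    MPI_Comm_split(world, node_rank == 0 ? 0 : MPI_UNDEFINED, world_rank,
                   app_comm);
    MPI_Comm_free(&node_comm);
  }
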
> On Wed, Jul 4, 2018 at 11:38 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>    Jed,
>>
>>      You could use your same argument to argue PETSc should do
>> "something" to help people who have (rightly or wrongly) chosen to code
>> their application in High Performance Fortran or any other similar inane
>> parallel programming model.
>>
>>    Barry
>>
>>
>>
>> > On Jul 4, 2018, at 11:51 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >
>> > Matthew Knepley <knepley at gmail.com> writes:
>> >
>> >> On Wed, Jul 4, 2018 at 4:51 PM Jeff Hammond <jeff.science at gmail.com>
>> wrote:
>> >>
>> >>> On Wed, Jul 4, 2018 at 6:31 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> >>>
>> >>>> On Tue, Jul 3, 2018 at 10:32 PM Jeff Hammond <jeff.science at gmail.com
>> >
>> >>>> wrote:
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Jul 3, 2018 at 4:35 PM Mark Adams <mfadams at lbl.gov> wrote:
>> >>>>>
>> >>>>>> On Tue, Jul 3, 2018 at 1:00 PM Richard Tran Mills <rtmills at anl.gov
>> >
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi Mark,
>> >>>>>>>
>> >>>>>>> I'm glad to see you trying out the AIJMKL stuff. I think you are
>> the
>> >>>>>>> first person trying to actually use it, so we are probably going
>> to expose
>> >>>>>>> some bugs and also some performance issues. My somewhat limited
>> testing has
>> >>>>>>> shown that the MKL sparse routines often perform worse than our
>> own
>> >>>>>>> implementations in PETSc.
>> >>>>>>>
>> >>>>>>
>> >>>>>> My users just want OpenMP.
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> Why not just add OpenMP to PETSc? I know certain developers hate
>> it, but
>> >>>>> it is silly to let a principled objection stand in the way of
>> enabling users
>> >>>>>
>> >>>>
>> >>>> "if that would deliver the best performance for NERSC users."
>> >>>>
>> >>>> You have answered your own question.
>> >>>>
>> >>>
>> >>> Please share the results of your experiments that prove OpenMP does
>> not
>> >>> improve performance for Mark’s users.
>> >>>
>> >>
>> >> Oh God. I am supremely uninterested in minutely proving yet again that
>> >> OpenMP is not better than MPI.
>> >> There are already countless experiments. One more will not add
>> anything of
>> >> merit.
>> >
>> > Jeff assumes an absurd null hypothesis, Matt selfishly believes that
>> > users should modify their code/execution environment to subscribe to a
>> > more robust and equally performant approach, and the MPI forum abdicates
>> > by stalling on endpoints.  How do we resolve this?
>> >
>> >>> Also we are not in the habit of fucking up our codebase in order to
>> follow
>> >>>> some fad.
>> >>>>
>> >>>
>> >>> If you can’t use OpenMP without messing up your code base, you
>> probably
>> >>> don’t know how to design software.
>> >>>
>> >>
>> >> That is an interesting, if wrong, opinion. It might be your contention
>> that
>> >> sticking any random paradigm in a library should
>> >> be alright if it's "well designed"? I have never encountered such a
>> >> well-designed library.
>> >>
>> >>
>> >>> I guess if you refuse to use _Pragma because C99 is still a fad for
>> you,
>> >>> it is harder, but clearly _Complex is tolerated.
>> >>>
>> >>
>> >> Yes, littering your code with preprocessor directives improves almost
>> >> everything. Doing proper resource management
>> >> using Pragmas, in an environment with several layers of libraries, is a
>> >> dream.
>> >>
>> >>
>> >>> More seriously, you’ve adopted OpenMP hidden behind MKL
>> >>>
>> >>
>> >> Nope. We can use MKL with that crap shut off.
>> >>
>> >>
>> >>> so I see no reason why you can’t wrap OpenMP implementations of the
>> PETSc
>> >>> sparse kernels in a similar manner.
>> >>>
>> >>
>> >> We could, it's just a colossal waste of time and effort, as well as
>> >> counterproductive for the codebase :)
>> >
>> > Endpoints either need to become a thing we can depend on or we need a
>> > solution for users that insist on using threads (even if their decision
>> > to use threads is objectively bad).  The problem Matt harps on is
>> > legitimate: OpenMP parallel regions cannot reliably cross module
>> > boundaries except for embarrassingly parallel operations.  This means
>> > loop-level omp parallel, which significantly increases overhead for small
>> > problem sizes (e.g., slowing coarse grid solves and strong scaling
>> > limits).  It can be done and isn't that hard, but the Imperial group
>> > discarded their branch after observing that it also provided no
>> > performance benefit.  However, I'm coming around to the idea that PETSc
>> > should do it so that there is _a_ solution for users that insist on
>> > using threads in a particular way.  Unless Endpoints become available
>> > and reliable, in which case we could do it right.
>>
>>
>
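
To make the loop-level option concrete: hiding the directive behind a
macro (Jeff's _Pragma point) keeps OpenMP compile-time optional, and an
if() clause is about the only cheap defense against the fork/join
overhead on the small problems Jed mentions. A minimal sketch, not PETSc
code, with an arbitrary threshold:

  #include <stddef.h>

  /* Expand to an OpenMP directive only when compiled with OpenMP; the
     C99 _Pragma operator is what lets the directive live in a macro. */
  #if defined(_OPENMP)
    #define OMP_PRAGMA(x) _Pragma(#x)
  #else
    #define OMP_PRAGMA(x)
  #endif

  /* Loop-level parallel CSR mat-vec, y = A*x.  The if() clause skips the
     parallel region on small row counts (coarse grids, strong-scaling
     limits), where fork/join cost would dominate; 8192 is made up. */
  void csr_spmv(size_t m, const size_t *rowptr, const size_t *col,
                const double *val, const double *x, double *y)
  {
    OMP_PRAGMA(omp parallel for schedule(static) if(m > 8192))
    for (size_t i = 0; i < m; i++) {
      double sum = 0.0;
      for (size_t j = rowptr[i]; j < rowptr[i+1]; j++)
        sum += val[j] * x[col[j]];
      y[i] = sum;
    }
  }

Whether that ever beats flat MPI is a separate question; the point is
only that the OpenMP can be kept out of builds that do not want it.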

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/