[petsc-dev] GAMG error with MKL

Jed Brown jed at jedbrown.org
Wed Jul 11 02:23:09 CDT 2018


Jeff Hammond <jeff.science at gmail.com> writes:

> On Tue, Jul 10, 2018 at 11:27 AM, Richard Tran Mills <rtmills at anl.gov>
> wrote:
>
>> On Mon, Jul 9, 2018 at 10:04 AM, Jed Brown <jed at jedbrown.org> wrote:
>>
>>> Jeff Hammond <jeff.science at gmail.com> writes:
>>>
>>> > This is the textbook Wrong Way to write OpenMP and the reason that the
>>> > thread-scalability of DOE applications using MPI+OpenMP sucks.  It
>>> leads to
>>> > codes that do fork-join far too often and suffer from death by Amdahl,
>>> > unless you do a second pass where you fuse all the OpenMP regions and
>>> > replace the serial regions between them with critical sections or
>>> similar.
>>> >
>>> > This isn't how you'd write MPI, is it?  No, you'd figure out how to
>>> > decompose your data properly to exploit locality and then implement an
>>> > algorithm that minimizes communication and synchronization.  Do that
>>> with
>>> > OpenMP.
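For concreteness, the contrast being drawn here is roughly the
following (hypothetical axpy-style loops, not code from PETSc or any
particular application):

    #include <omp.h>

    /* Fork-join style: every loop opens and closes its own parallel
       region, paying the fork/join and barrier cost each time, with
       serial code in between. */
    void axpy_forkjoin(int n, double a, const double *x, double *y,
                       const double *z, double *w)
    {
      #pragma omp parallel for
      for (int i = 0; i < n; ++i) y[i] += a * x[i];

      /* ... serial work between regions ... */

      #pragma omp parallel for
      for (int i = 0; i < n; ++i) w[i] += a * z[i];
    }

    /* Coarse-grained style: one long-lived parallel region; the
       formerly-serial work is done by a single thread inside it. */
    void axpy_coarse(int n, double a, const double *x, double *y,
                     const double *z, double *w)
    {
      #pragma omp parallel
      {
        #pragma omp for
        for (int i = 0; i < n; ++i) y[i] += a * x[i];

        #pragma omp single
        { /* formerly-serial work */ }

        #pragma omp for
        for (int i = 0; i < n; ++i) w[i] += a * z[i];
      }
    }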
>>>
>>> The applications that would call PETSc do not do this decomposition and
>>> the OpenMP programming model does not provide a "communicator" or
>>> similar abstraction to associate the work done by the various threads.
>>> It's all implicit.
>>
>>
>> This is perhaps the biggest single reason that I hate OpenMP.
>>
>> --Richard
>>
>>> The idea with PETSc's threadcomm was to provide an
>>> object for this, but nobody wanted to call PETSc that way.  It's clear
>>> that applications using OpenMP are almost exclusively interested in its
>>> incrementalism, not in doing it right.  It's also pretty clear that the
>>> OpenMP forum agrees, otherwise they would be providing abstractions for
>>> performing collective operations across module boundaries within a
>>> parallel region.
>>>
>>> So the practical solution is to use OpenMP the way everyone else does,
>>> even if the performance is not good, because at least it works with the
>>> programming model the application has chosen.
>>>
>>
>>
> The counter argument is that users who want this level of control are
> empowered to implement exactly what they need using the explicit threading
> model.  

There is no standard for "explicit threading" crossing module
boundaries.  A communicator would provide an explicit way to say "these
threads participate in this collective operation".  It's fragile to
build it on top of a programming model that does not have such a
concept, particularly when you wish to support multiple libraries and
callback interfaces (necessary for nonlinear solvers and other types of
extensible composition).
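To make that concrete, a thread communicator would be roughly an
explicit handle naming the participating threads, with collectives
defined on that handle.  A purely hypothetical interface sketch (all
names invented here, collectives left unimplemented -- this is not the
old threadcomm API):

    /* An explicit set of cooperating threads. */
    typedef struct ThreadComm *ThreadComm;

    int ThreadCommSize(ThreadComm comm);     /* threads in the set    */
    int ThreadCommRank(ThreadComm comm);     /* this thread's index   */
    int ThreadCommBarrier(ThreadComm comm);  /* collective on the set */
    int ThreadCommSumDouble(ThreadComm comm, double local, double *global);

    /* A library kernel called by every thread in 'comm' -- e.g. from
       inside a caller's parallel region -- can synchronize exactly its
       callers without assuming anything about how they were created. */
    double dot_on_comm(ThreadComm comm, int n,
                       const double *x, const double *y)
    {
      int    rank  = ThreadCommRank(comm), size = ThreadCommSize(comm);
      int    chunk = (n + size - 1) / size;
      int    lo    = rank * chunk, hi = lo + chunk > n ? n : lo + chunk;
      double local = 0.0, global;
      for (int i = lo; i < hi; ++i) local += x[i] * y[i];
      ThreadCommSumDouble(comm, local, &global);   /* collective */
      return global;
    }

The point is that dot_on_comm() would work no matter which programming
model created the threads, because the set is named explicitly rather
than being whatever team happens to be active.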

> A thread communicator is just the set of threads that work together
> and one can implement workshare and collective operations using that
> information.  Given how inefficiently GOMP implements barriers, folks
> should probably be rolling their own barriers anyway.  If folks
> are deeply unhappy with this suggestion because of the work it
> requires, then perhaps DOE needs to fund somebody to write an
> open-source collectives library for OpenMP.

What set of threads would these hypothetical collectives be collective
on?
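(For reference, "rolling your own" barrier over an explicit group of
threads is not hard -- a minimal sense-reversing barrier using C11
atomics is sketched below, with all names invented here -- but it only
moves the question: something still has to define the set of nthreads
threads that participate.)

    #include <stdatomic.h>

    /* A minimal sense-reversing barrier over an explicit group of
       nthreads threads. */
    typedef struct {
      int        nthreads;
      atomic_int arrived;   /* threads that have reached the barrier */
      atomic_int sense;     /* global phase flag                     */
    } MyBarrier;

    void my_barrier_init(MyBarrier *b, int nthreads) {
      b->nthreads = nthreads;
      atomic_init(&b->arrived, 0);
      atomic_init(&b->sense, 0);
    }

    /* Each thread keeps its own local_sense, initialized to 0, and
       passes a pointer to it on every call. */
    void my_barrier_wait(MyBarrier *b, int *local_sense) {
      int phase = 1 - *local_sense;             /* phase being entered */
      if (atomic_fetch_add(&b->arrived, 1) == b->nthreads - 1) {
        atomic_store(&b->arrived, 0);           /* last thread resets  */
        atomic_store(&b->sense, phase);         /* ... and releases    */
      } else {
        while (atomic_load(&b->sense) != phase)
          ;                                     /* spin until released */
      }
      *local_sense = phase;
    }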

> For what it's worth, there's an active effort to make the teams construct
> valid in general (i.e. not just in the context of 'target'), which is
> intended to make NUMA programming easier.  Currently, teams are defined by
> the implementation, but it may be possible to allow them to be
> user-defined.  The challenge is that teams were initially created to
> support the OpenCL execution model, which does not permit synchronization
> between work groups, so there may be objections to giving users control of
> their definition.
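For context, with OpenMP 4.5 the teams construct is only permitted
when nested inside a target region, roughly as in the hypothetical
daxpy-style sketch below, and the league size and shape are chosen by
the implementation.  The effort described above is about allowing
something like this on the host, possibly with user control over which
threads form a team.

    #include <omp.h>

    void daxpy_teams(int n, double a, const double *x, double *y)
    {
      /* A league of teams, each running the loop chunks it is
         assigned; how many teams exist and where they run is up to
         the implementation. */
      #pragma omp target teams distribute parallel for \
              map(to: x[0:n]) map(tofrom: y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
    }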
>
> Jeff
>
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/

