[petsc-dev] GAMG error with MKL

Jeff Hammond jeff.science at gmail.com
Tue Jul 10 13:41:43 CDT 2018


On Tue, Jul 10, 2018 at 11:27 AM, Richard Tran Mills <rtmills at anl.gov>
wrote:

> On Mon, Jul 9, 2018 at 10:04 AM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Jeff Hammond <jeff.science at gmail.com> writes:
>>
>> > This is the textbook Wrong Way to write OpenMP and the reason that the
>> > thread-scalability of DOE applications using MPI+OpenMP sucks.  It
>> leads to
>> > codes that do fork-join far too often and suffer from death by Amdahl,
>> > unless you do a second pass where you fuse all the OpenMP regions and
>> > replace the serial regions between them with critical sections or
>> similar.
>> >
>> > This isn't how you'd write MPI, is it?  No, you'd figure out how to
>> > decompose your data properly to exploit locality and then implement an
>> > algorithm that minimizes communication and synchronization.  Do that
>> with
>> > OpenMP.
>>
>> The applications that would call PETSc do not do this decomposition and
>> the OpenMP programming model does not provide a "communicator" or
>> similar abstraction to associate the work done by the various threads.
>> It's all implicit.
>
>
> This is perhaps the biggest single reason that I hate OpenMP.
>
> --Richard
>
>> The idea with PETSc's threadcomm was to provide an
>> object for this, but nobody wanted to call PETSc that way.  It's clear
>> that applications using OpenMP are almost exclusively interested in its
>> incrementalism, not in doing it right.  It's also pretty clear that the
>> OpenMP forum agrees, otherwise they would be providing abstractions for
>> performing collective operations across module boundaries within a
>> parallel region.
>>
>> So the practical solution is to use OpenMP the way everyone else does,
>> even if the performance is not good, because at least it works with the
>> programming model the application has chosen.
>>
>
>
The counter-argument is that users who want this level of control are
empowered to implement exactly what they need using the explicit threading
model.  A thread communicator is just the set of threads that work together,
and one can implement workshare and collective operations using that
information.  Given how inefficiently GOMP implements barriers, folks
should probably be rolling their own barriers anyway.  If folks are deeply
unhappy with this suggestion because of the work it requires, then perhaps
DOE needs to fund somebody to write an open-source collectives library for
OpenMP.
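
To make that concrete, here is a minimal sketch of the kind of thing I
mean: a hypothetical ThreadComm type (the name and the all-reduce helper
are mine, not PETSc's threadcomm or any GOMP internals) that names an
explicit set of threads and carries a hand-rolled sense-reversing barrier
plus a sum all-reduce built on top of it, using C11 atomics inside an
ordinary parallel region.

    /* Sketch only: a hypothetical "thread communicator" with its own
     * barrier and a sum all-reduce, built from C11 atomics. */
    #include <stdio.h>
    #include <stdatomic.h>
    #include <omp.h>

    typedef struct {
        int        size;     /* number of participating threads          */
        atomic_int arrived;  /* threads that have reached the barrier    */
        atomic_int sense;    /* global sense, flipped by the last arrival*/
        double     buf;      /* shared scratch slot for small collectives*/
    } ThreadComm;

    static void comm_init(ThreadComm *c, int size) {
        c->size = size;
        atomic_init(&c->arrived, 0);
        atomic_init(&c->sense, 0);
        c->buf = 0.0;
    }

    /* Sense-reversing barrier: each thread keeps a local sense flag. */
    static void comm_barrier(ThreadComm *c, int *local_sense) {
        *local_sense = !*local_sense;
        if (atomic_fetch_add(&c->arrived, 1) == c->size - 1) {
            atomic_store(&c->arrived, 0);           /* last arrival resets */
            atomic_store(&c->sense, *local_sense);  /* ...and releases all */
        } else {
            while (atomic_load(&c->sense) != *local_sense) { /* spin */ }
        }
    }

    /* Sum all-reduce across the communicator, built from the barrier. */
    static double comm_allreduce_sum(ThreadComm *c, int rank,
                                     int *local_sense, double v) {
        if (rank == 0) c->buf = 0.0;        /* one thread clears scratch   */
        comm_barrier(c, local_sense);       /* clear visible to everyone   */
        #pragma omp atomic
        c->buf += v;
        comm_barrier(c, local_sense);       /* all contributions are in    */
        double result = c->buf;
        comm_barrier(c, local_sense);       /* all reads done before reuse */
        return result;
    }

    int main(void) {
        ThreadComm comm;
        int n = omp_get_max_threads();

        omp_set_dynamic(0);                 /* ensure exactly n threads    */
        comm_init(&comm, n);

        #pragma omp parallel num_threads(n) shared(comm)
        {
            int    rank = omp_get_thread_num();  /* rank within the comm   */
            int    local_sense = 0;              /* per-thread barrier flag*/
            double v = (double)(rank + 1);       /* this thread's value    */
            double sum = comm_allreduce_sum(&comm, rank, &local_sense, v);
            if (rank == 0)
                printf("sum over %d threads: %g (expect %g)\n",
                       n, sum, 0.5 * n * (n + 1));
        }
        return 0;
    }

Compile with -fopenmp and a C11 compiler.  The point is only that once the
set of participating threads is explicit, the collectives are a few dozen
lines, not a new programming model.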

For what it's worth, there's an active effort to make the teams construct
valid in general (i.e. not just in the context of 'target'), which is
intended to make NUMA programming easier.  Currently, teams are defined by
the implementation, but it may become possible to allow them to be
user-defined.  The challenge is that teams were initially created to
support the OpenCL execution model, which does not permit synchronization
between work groups, so there may be objections to giving users control of
their definition.
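
As a rough illustration, this is roughly what a host-side teams region
could look like if/when that lands; the num_teams/thread_limit values are
placeholders (e.g. one team per NUMA domain), not a recommendation, and
synchronization across teams is still not provided by the construct itself.

    /* Sketch of a host-side 'teams' region (not valid in every
     * implementation today). */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp teams num_teams(4) thread_limit(8)
        {
            int team = omp_get_team_num();   /* which team am I in?        */
            #pragma omp parallel
            {
                #pragma omp single
                printf("team %d running %d threads\n",
                       team, omp_get_num_threads());
            }
        }
        return 0;
    }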

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/