[petsc-dev] fork for programming models debate (was "Using multiple mallocs with PETSc")

Jeff Hammond jeff.science at gmail.com
Wed Mar 15 23:08:35 CDT 2017


On Tue, Mar 14, 2017 at 8:52 PM Jed Brown <jed at jedbrown.org> wrote:

> Jeff Hammond <jeff.science at gmail.com> writes:
>
> > On Mon, Mar 13, 2017 at 8:08 PM, Jed Brown <jed at jedbrown.org> wrote:
> >>
> >> Jeff Hammond <jeff.science at gmail.com> writes:
> >>
> >> > OpenMP did not prevent OpenCL,
> >>
> >> This programming model isn't really intended for architectures with
> >> persistent caches.
> >>
> >
> > It's not clear to me how much this should matter in a good
> > implementation.  The lack of implementation effort for OpenCL on
> > cache-coherent CPU architectures appears to be a more significant issue.
>
> How do you keep data resident in cache between kernel launches?
>

By not doing things that cause it to be evicted.


> >> > C11, C++11
> >>
> >> These are basically pthreads, which predates OpenMP.
> >>
> >
> > I'm not sure why it matters which one came first.  POSIX standardized
> > threads in 1995, while OpenMP was first standardized in 1997.  However,
> > the first serious Pthreads implementation in Linux was in 2003.
>
> And the first serious OpenMP on OS X was when?
>

When was the first serious implementation of MPI-3 shared memory windows on
OS X? That is your alternative to OpenMP for shared memory.
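
For reference, a minimal sketch of what those windows look like in C
(assuming all ranks share one node; error checking omitted, and the
segment size is arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      /* Group the ranks that share a node; shared-memory windows
         are only valid within such a communicator. */
      MPI_Comm nodecomm;
      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &nodecomm);

      int nrank;
      MPI_Comm_rank(nodecomm, &nrank);

      /* Each rank contributes a segment to one shared allocation. */
      MPI_Win win;
      double *base;
      MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                              MPI_INFO_NULL, nodecomm, &base, &win);

      /* Obtain a direct load/store pointer to rank 0's segment. */
      MPI_Aint size;
      int disp_unit;
      double *peer;
      MPI_Win_shared_query(win, 0, &size, &disp_unit, &peer);

      MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
      if (nrank == 0) peer[0] = 42.0;
      MPI_Win_sync(win);          /* memory barrier for the store */
      MPI_Barrier(nodecomm);      /* order the store and the loads */
      MPI_Win_sync(win);
      printf("rank %d sees %g\n", nrank, peer[0]);
      MPI_Win_unlock_all(win);

      MPI_Win_free(&win);
      MPI_Comm_free(&nodecomm);
      MPI_Finalize();
      return 0;
    }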

I think it is rather pathetic to use Apple's compiler support as an
argument against the OpenMP programming model.

In any case, MPI doesn't run on any GPU, FPGA, or DSP hardware. OpenMP
supports at least two of these.
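
For concreteness, a minimal sketch of OpenMP 4.x device offload in C;
the daxpy kernel is an illustrative example, not something from this
thread, and the same code runs on the host when no device is present:

    /* Offload a daxpy to an attached accelerator if one exists. */
    void daxpy(int n, double a, const double *x, double *y) {
      #pragma omp target teams distribute parallel for \
          map(to: x[0:n]) map(tofrom: y[0:n])
      for (int i = 0; i < n; i++)
        y[i] += a * x[i];
    }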


> > OpenMP standardized the best practices identified in the Kuck, SGI, and
> > Cray directives, just like POSIX presumably standardized best
> > practices in OS threads from various Unix implementations.
> >
> > C++11 and later have concurrency features beyond just threads.  You
> > probably hate all of them because they are C++, and in any case I won't
> > argue, because I don't see anything that's implemented better.
> >
> >>
> >> > or Fortran 2008
> >>
> >> A different language and doesn't play well with others.
> >>
> >
> > Sure, but you could use Fortran 2003 features to interoperate between C
> > and Fortran if you wanted to leverage Fortran 2008 concurrency features
> > in an ISO-compliant way.  I'm not suggesting you want to do this, but I
> > dispute the suggestion that Fortran does not play nicely with C.
>
> I think the above qualifies as not playing nicely in this context.
>

ISO-defined interoperability is not playing nicely?
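
For reference, a minimal sketch of that ISO-defined interoperability;
the axpy routine is an illustrative name, the Fortran 2003 side appears
as a comment, and the two objects must be compiled and linked together
(e.g., with gcc and gfortran):

    /* Assumed Fortran 2003 counterpart (illustrative):
     *
     *   subroutine axpy(n, a, x, y) bind(C, name="axpy")
     *     use iso_c_binding
     *     integer(c_int), value :: n
     *     real(c_double), value :: a
     *     real(c_double), intent(in)    :: x(n)
     *     real(c_double), intent(inout) :: y(n)
     *     y = y + a*x
     *   end subroutine axpy
     */

    /* bind(C) gives the routine a C-compatible ABI and an exact
       symbol name, so a plain prototype suffices on the C side. */
    void axpy(int n, double a, const double *x, double *y);

    int main(void) {
      double x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
      axpy(4, 2.0, x, y);  /* y becomes {2, 4, 6, 8} */
      return 0;
    }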


> > Fortran coarray images are OS processes in every implementation I know
> > of, although the standard does not explicitly require this
> > implementation.  The situation is identical to that of MPI, although
> > there are actually MPI implementations based upon OS threads rather
> > than OS processes (and they require compiler or OS magic to deal with
> > non-heap data).
> >
> > Both of the widely available Fortran coarray implementations use MPI-3
> > RMA under the hood, and all of the ones I know about define an image to
> > be an OS process.
>
> Are you trying to sell PETSc on MPI?
>

No. I am countering your suggestion that they don't play nicely.


> >> > from introducing parallelism. Not sure if your comment was meant to be
> >> > serious,
> >>
> >> Partially.  It was just enough to give the appearance of a solution
> >> while not really being a solution.
> >>
> >
> > It still isn't clear what you actually want.  You appear to reject
> > every standard API for enabling explicit vectorization for CPU
> > execution (Fortran, OpenMP, OpenCL), which suggests that (1) you do not
> > believe in vectorization, (2) you think that autovectorizing compilers
> > are sufficient, (3) you think vector code is necessarily a non-portable
> > software construct, or (4) you do not think vectorization is relevant
> > to PETSc.
>
> OpenMP is strictly about vectorization with nothing to do with threads
> and MPI is sufficient?  I don't have a problem with that, but will
> probably stick to attributes and intrinsics instead of omp simd, at
> least until it matures and demonstrates feature parity.
>

Is MPI strictly about collectives?
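
For concreteness, the two vectorization routes contrasted above,
sketched on a hypothetical scaling loop; the AVX variant assumes n is a
multiple of 4 and that x is 32-byte aligned:

    #include <immintrin.h>

    /* omp simd: a portable assertion that iterations are independent,
       so the compiler vectorizes without a runtime dependence check. */
    void scale_simd(int n, double a, double *x) {
      #pragma omp simd
      for (int i = 0; i < n; i++)
        x[i] *= a;
    }

    /* The attributes/intrinsics route: the same loop written with
       AVX intrinsics; explicit, but x86-specific. */
    void scale_avx(int n, double a, double *x) {
      __m256d va = _mm256_set1_pd(a);
      for (int i = 0; i < n; i += 4)
        _mm256_store_pd(&x[i], _mm256_mul_pd(_mm256_load_pd(&x[i]), va));
    }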


> Have you tried writing a BLIS microkernel using omp simd?  Is it any
> good?


Have you tried writing MPI send-recv using TCP/IP?

You appear uninterested in trying to come up with constructive ideas. I
don't see any value in continuing this conversation.

Jeff


--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/