[petsc-dev] fork for programming models debate (was "Using multiple mallocs with PETSc")

Jeff Hammond jeff.science at gmail.com
Tue Mar 14 18:33:11 CDT 2017


On Mon, Mar 13, 2017 at 8:08 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Jeff Hammond <jeff.science at gmail.com> writes:
>
> > OpenMP did not prevent OpenCL,
>
> This programming model isn't really intended for architectures with
> persistent caches.
>

It's not clear to me how much this should matter in a good implementation.
The lack of implementation effort for OpenCL on cache-coherent CPU
architectures appears to be a more significant issue.

>
> > C11, C++11
>
> These are basically pthreads, which predates OpenMP.
>

I'm not sure why it matters which one came first.  POSIX standardized
threads in 1995, while OpenMP was first standardized in 1997.  However,
the first serious Pthreads implementation on Linux (NPTL) did not arrive
until 2003.  OpenMP standardized the best practices identified in the
Kuck & Associates, SGI, and Cray directives, just as POSIX presumably
standardized the best practices in OS threads from the various Unix
implementations.

C++11 and beyond have concurrency features that go well beyond threads.
You probably hate all of them because they are C++, and in any case I
won't argue the point, because I don't see anything there that is
implemented better than the alternatives.
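To the "basically pthreads" point: the C11 thread API maps almost
one-to-one onto POSIX.  A minimal sketch (assuming a libc that ships
<threads.h>, e.g. glibc 2.28 or later):

  #include <stdio.h>
  #include <threads.h>

  /* C11 entry point is int (*)(void *); pthreads uses void *(*)(void *) */
  static int worker(void *arg)
  {
      printf("hello from thread %d\n", *(int *)arg);
      return 0;
  }

  int main(void)
  {
      thrd_t t;
      int id = 1, res;
      if (thrd_create(&t, worker, &id) != thrd_success)  /* ~pthread_create */
          return 1;
      thrd_join(t, &res);                                /* ~pthread_join */
      return res;
  }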

>
> > or Fortran 2008
>
> A different language and doesn't play well with others.
>

Sure, but you could use the Fortran 2003 C-interoperability features
(ISO_C_BINDING and bind(C)) to bridge between the two languages if you
wanted to leverage the Fortran 2008 concurrency features in an
ISO-compliant way.  I'm not suggesting you want to do this, but I
dispute the suggestion that Fortran does not play well with C.
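For example, a Fortran routine declared with bind(C) is callable from C
with no name-mangling guesswork.  A minimal sketch (the names here are
made up for illustration):

  /* C caller.  The Fortran 2003 side would look like:
   *
   *   subroutine fscale(n, x) bind(C, name="fscale")
   *     use iso_c_binding, only: c_int, c_double
   *     integer(c_int), value :: n
   *     real(c_double) :: x(n)
   *     x = 2.0d0 * x
   *   end subroutine fscale
   *
   * bind(C) plus the iso_c_binding kinds give an ISO-specified ABI. */
  void fscale(int n, double x[]);

  int main(void)
  {
      double x[4] = {1.0, 2.0, 3.0, 4.0};
      fscale(4, x);   /* calls straight into Fortran */
      return 0;
  }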

Fortran coarray images are OS processes in every implementation I know of,
although the standard does not explicitly require this implementation.  The
situation is identical to that of MPI, although there are actually MPI
implementations based upon OS threads rather than OS processes (and they
require compiler or OS magic to deal with non-heap data).

Both of the widely available Fortran coarray implementations use MPI-3 RMA
under the hood, and all of the ones I know about define an image to be an
OS process.
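To make that concrete: for a scalar coarray (real :: x[*]), an
assignment like x[2] = 42.0 (a store into image 2) lowers to roughly
the following MPI-3 RMA sequence (a sketch of the lowering, not code
from any particular compiler):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Win win;
      double *x;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Each image (= MPI process) exposes its copy of x in a window. */
      MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &x, &win);
      MPI_Win_lock_all(0, win);

      if (rank == 0) {
          double val = 42.0;
          /* x[2] = 42.0 : one-sided store into image 2 (rank 1). */
          MPI_Put(&val, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
          MPI_Win_flush(1, win);
      }

      MPI_Win_unlock_all(win);
      MPI_Barrier(MPI_COMM_WORLD);   /* crude stand-in for "sync all" */
      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }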

> > from introducing parallelism. Not sure if your comment was meant to be
> > serious,
>
> Partially.  It was just enough to give the appearance of a solution
> while not really being a solution.
>

It still isn't clear what you actually want.  You appear to reject every
standard API for enabling explicit vectorization for CPU execution
(Fortran, OpenMP, OpenCL), which suggests that (1) you do not believe in
vectorization, (2) you think that autovectorizing compilers are sufficient,
(3) you think vector code is necessarily a non-portable software construct,
or (4) you do not think vectorization is relevant to PETSc.
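To be concrete about what "explicit vectorization" means here, this is
the OpenMP 4.0 simd construct on a trivial kernel (a sketch; a real
PETSc kernel would be more involved):

  /* Asserts that the loop is safe to vectorize instead of relying on
   * autovectorizer heuristics.  Build with e.g. gcc -fopenmp-simd,
   * which enables the simd pragmas without the threading runtime. */
  void axpy(int n, double a, const double *restrict x,
            double *restrict y)
  {
      #pragma omp simd
      for (int i = 0; i < n; i++)
          y[i] += a * x[i];
  }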

Jeff

>
> > but it appears unfounded nonetheless.
> >
> > Jeff
> >
> > On Sun, Mar 12, 2017 at 11:16 AM Jed Brown <jed at jedbrown.org> wrote:
> >
> >> Implementation-defined, but it's exactly the same as malloc, which also
> >> doesn't promise unfaulted pages. This is one reason some of us keep
> >> saying that OpenMP sucks. It's a shitty standard that obstructs better
> >> standards from being created.
> >>
> >>
> >> On March 12, 2017 11:19:49 AM MDT, Jeff Hammond
> >> <jeff.science at gmail.com> wrote:
> >>
> >>
> >> On Sat, Mar 11, 2017 at 9:00 AM Jed Brown <jed at jedbrown.org> wrote:
> >>
> >> Jeff Hammond <jeff.science at gmail.com> writes:
> >> > I agree 100% that multithreaded codes that fault pages from the main
> >> > thread in a NUMA environment are doing something wrong ;-)
> >> >
> >> > Does calloc *guarantee* pages are not mapped? If I calloc(8), do I
> >> > get the zero page or part of the arena that's already mapped that is
> >> > zeroed by the heap manager?
> >>
> >> Is your argument that calloc() should never be used in multi-threaded
> >> code?
> >>
> >>
> >> I never use it for code that I want to behave well in a NUMA
> >> environment.
> >>
> >>
> >> If the allocation is larger than MMAP_THRESHOLD (128 KiB by default for
> >> glibc) then it calls mmap.  This obviously leaves an intermediate size
> >> that could be poorly mapped (assuming 4 KiB pages), but it's also so
> >> small that it easily fits in cache.
> >>
> >>
> >> Is this behavior standardized or merely implementation-defined? I'm not
> >> interested in writing code that assumes Linux/glibc.
> >>
> >> Jeff
> >>
> >>
> >> --
> >> Jeff Hammond
> >> jeff.science at gmail.com
> >> http://jeffhammond.github.io/
> >>
> >> --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/




--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

