[petsc-dev] fork for programming models debate (was "Using multiple mallocs with PETSc")

Tue Mar 14 22:52:16 CDT 2017

Jeff Hammond <jeff.science at gmail.com> writes:

> On Mon, Mar 13, 2017 at 8:08 PM, Jed Brown <jed at jedbrown.org> wrote:
>>
>> Jeff Hammond <jeff.science at gmail.com> writes:
>>
>> > OpenMP did not prevent OpenCL,
>>
>> This programming model isn't really intended for architectures with
>> persistent caches.
>>
>
> It's not clear to me how much this should matter in a good implementation.
> The lack of implementation effort for OpenCL on cache-coherent CPU
> architectures appears to be a more significant issue.

How do you keep data resident in cache between kernel launches?

>> > C11, C++11
>>
>> These are basically pthreads, which predates OpenMP.
>>
>
> I'm not sure why it matters which one came first.  POSIX standardized
> threads in 1995, while OpenMP was first standardized in 1997.  However, the
> first serious Pthreads implementation in Linux was in 2003.  

And the first serious OpenMP on OS X was when?

> OpenMP standardized the best practices identified in Kuck, SGI and
> Cray directives, just like POSIX presumably standardized best
> practices in OS threads from various Unix implementations.
>
> C++11 and beyond have concurrency features beyond just threads.  You
> probably hate all of them because they are C++, and in any case I won't
> argue, because I don't see anything that's implemented better.
>
>>
>> > or Fortran 2008
>>
>> A different language and doesn't play well with others.
>>
>
> Sure, but you could use Fortran 2003 features to interoperate between C and
> Fortran if you wanted to leverage Fortran 2008 concurrency features in an
> ISO-compliant way.  I'm not suggesting you want to do this, but I dispute
> the suggestion that Fortran does not play nice with C.

I think the above qualifies as not playing nicely in this context.

> Fortran coarray images are OS processes in every implementation I know,
> although the standard does not explicitly require this implementation.  The
> situation is identical to that of MPI, although there are actually MPI
> implementations based upon OS threads rather than OS processes (and they
> require compiler or OS magic to deal with non-heap data).
>
> Both of the widely available Fortran coarray implementations use MPI-3 RMA
> under the hood and all of the ones I know about define an image to be an OS
> process.

Are you trying to sell PETSc on MPI?

>> > from introducing parallelism. Not sure if your comment was meant to be
>> > serious,
>>
>> Partially.  It was just enough to give the appearance of a solution
>> while not really being a solution.
>>
>
> It still isn't clear what you actually want.  You appear to reject every
> standard API for enabling explicit vectorization for CPU execution
> (Fortran, OpenMP, OpenCL), which suggests that (1) you do not believe in
> vectorization, (2) you think that autovectorizing compilers are sufficient,
> (3) you think vector code is necessarily a non-portable software construct,
> or (4) you do not think vectorization is relevant to PETSc.

So OpenMP is strictly about vectorization, with nothing to do with
threads, and MPI is sufficient?  I don't have a problem with that, but will
probably stick to attributes and intrinsics instead of omp simd, at
least until it matures and demonstrates feature parity.

Have you tried writing a BLIS microkernel using omp simd?  Is it any
good?
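
For readers following along, here is a minimal sketch of the two styles
being contrasted above, applied to a simple axpy loop rather than a BLIS
microkernel.  It is an illustration only and not code from PETSc or BLIS:
the function names axpy_omp_simd and axpy_vector_ext and the v4d typedef
are hypothetical, and the "attributes" variant assumes the GCC/Clang
vector_size extension, which may not be the attribute meant in the message.

```c
#include <stddef.h>
#include <string.h>

/* Style 1: a plain loop annotated with "omp simd" (OpenMP 4.0 and later).
 * With GCC or Clang, build with -fopenmp-simd to honor the pragma; without
 * that flag the pragma is ignored and the loop is still correct. */
void axpy_omp_simd(size_t n, double a, const double *restrict x,
                   double *restrict y)
{
#pragma omp simd
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Style 2: explicit vector types via the GCC/Clang vector_size attribute
 * (an assumption: a compiler extension, not ISO C). */
typedef double v4d __attribute__((vector_size(32))); /* 4 doubles */

void axpy_vector_ext(size_t n, double a, const double *x, double *y)
{
  const v4d va = {a, a, a, a};
  size_t i = 0;
  /* Main loop: 4 elements per iteration; memcpy gives unaligned-safe
   * loads and stores that compilers lower to vector moves. */
  for (; i + 4 <= n; i += 4) {
    v4d vx, vy;
    memcpy(&vx, x + i, sizeof vx);
    memcpy(&vy, y + i, sizeof vy);
    vy += va * vx;
    memcpy(y + i, &vy, sizeof vy);
  }
  /* Scalar remainder loop for the last n % 4 elements. */
  for (; i < n; i++) y[i] += a * x[i];
}
```

The first form leaves vector width, remainder handling, and instruction
selection to the compiler; the second fixes the width and makes the
remainder loop explicit.  A real microkernel would also need register
blocking and control over instruction scheduling, which this sketch does
not attempt to show.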