[petsc-users] PETSc with modern C++

Matthew Knepley knepley at gmail.com
Mon Apr 3 16:50:26 CDT 2017


On Mon, Apr 3, 2017 at 11:45 AM, Filippo Leonardi <filippo.leon at gmail.com>
wrote:

> On Monday, 3 April 2017 02:00:53 CEST you wrote:
>
> > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <filippo.leon at gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > I have a project in mind and seek feedback.
> > >
> > > Disclaimer: I hope I am not abusing this mailing list with this idea.
> > > If so, please ignore.
> > >
> > > As a thought experiment, and to have a bit of fun, I am currently
> > > writing, or thinking about writing, a small (modern) C++ wrapper around
> > > PETSc.
> > >
> > > Premise: PETSc is awesome; I love it and use it in many projects.
> > > Sometimes I am just not super comfortable writing C. (I know my idea
> > > goes against PETSc's design philosophy.)
> > >
> > > I know there are many wrappers around, and there is not really a need
> > > for another (especially since PETSc has its own object-oriented style),
> > > but there are a few things I would really like to include in this
> > > wrapper that I have found nowhere:
> > > - I am currently only thinking about the Vector/Matrix/KSP/DM parts of
> > > the framework; there are many other cool things that PETSc does, but I
> > > do not have the brainpower to consider those as well.
> > > - Expression templates (in my opinion this is where C++ shines): these
> > > would replace the boilerplate a user might otherwise need with
> > > easy-to-read expressions (this could effectively increase the number of
> > > axpy-like routines).
> > > - Those expression templates should use SSE and AVX whenever available.
> > > - Expressions like x += alpha * y should fall back to BLAS axpy (though
> > > sometimes this is not even faster than a simple loop).
> >
> > The idea for the above is not clear. Do you want templates generating
> > calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?
> > There is some advantage here in expanding the range of BLAS operations,
> > which has been done to death by Liz Jessup and collaborators, but not
> > that much.
>
> Templates should generate scalar code operating on raw arrays using SIMD.
> But I can detect when axpbycz or gemv would apply and use the BLAS
> implementation instead. I do not think there is a point in trying to "beat"
> BLAS. (An interesting point opens here: I assume an efficient BLAS
> implementation, but I am not so sure how the different BLAS libraries do
> things internally. I work from the assumption that we have a very well
> tuned BLAS implementation at our disposal.)

The speed improvement comes from pulling vectors through memory fewer times
by merging operations (kernel fusion).
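To make the fusion idea concrete, here is a minimal expression-template
sketch (all names are hypothetical illustrations, not PETSc or the proposed
wrapper): an expression like x = a*y + b*z is built as a lightweight tree of
stack objects, and the whole thing is evaluated in a single loop inside the
assignment, so each operand array is pulled through memory exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expression node: elementwise sum of two sub-expressions.
template <typename L, typename R>
struct Sum {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Expression node: a sub-expression scaled by a constant.
template <typename E>
struct Scaled {
  double a;
  const E& e;
  double operator[](std::size_t i) const { return a * e[i]; }
};

struct Vec {
  std::vector<double> data;
  double operator[](std::size_t i) const { return data[i]; }

  // Assignment from any expression: this is the single fused loop.
  template <typename E>
  Vec& operator=(const E& expr) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
    return *this;
  }
};

// Operators only build the expression tree; no work happens here.
template <typename L, typename R>
Sum<L, R> operator+(const L& l, const R& r) { return {l, r}; }

template <typename E>
Scaled<E> operator*(double a, const E& e) { return {a, e}; }
```

With this, `x = 2.0 * y + 3.0 * z` runs one loop over three arrays instead
of two separate axpy-style passes; a real implementation would additionally
pattern-match such trees to dispatch to BLAS or emit SIMD intrinsics.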


> >
> > > - All calls to PETSc should be less verbose, more C++-like:
> > >   * for instance, a VecGlobalToLocalBegin could return an empty object
> > >     that calls VecGlobalToLocalEnd when it is destroyed;
> > >   * some cool idea to easily write GPU kernels.
> >
> > If you find a way to make this pay off it would be amazing, since
> > currently nothing but BLAS3 has a hope of mattering in this context.
>
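The Begin/End idea above can be sketched as a scope guard (hypothetical
names, not the PETSc API; FakeDM and a counter stand in for a real DM and an
in-flight scatter): the "Begin" call returns an object whose destructor runs
the "End" half, so a scatter started in a scope is guaranteed to complete
when the scope is left.

```cpp
#include <cassert>

// Stand-in for a DM with an in-flight global-to-local scatter.
struct FakeDM {
  int pending = 0;  // number of scatters begun but not yet ended
};

// Guard returned by the hypothetical Begin call. [[nodiscard]] makes the
// compiler warn if the caller discards it, which would End immediately.
class [[nodiscard]] GlobalToLocal {
  FakeDM* dm_;
public:
  explicit GlobalToLocal(FakeDM& dm) : dm_(&dm) { ++dm_->pending; }  // "Begin"
  ~GlobalToLocal() { --dm_->pending; }                               // "End"
  GlobalToLocal(const GlobalToLocal&) = delete;
  GlobalToLocal& operator=(const GlobalToLocal&) = delete;
};
```

A caller can still overlap communication with computation by doing other
work inside the guard's scope before touching the local data.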
> >
> > > - The idea would be to have safer routines (at compile time), by means
> > > of RAII etc.
> > >
> > > I aim for zero/near-zero/negligible overhead with full optimization;
> > > for that I include benchmarks and extensive unit tests.
> > >
> > > So my questions are:
> > > - anyone that would be interested (in the product/in developing)?
> > > - anyone that has suggestions (maybe what I have in mind is nonsense)?
>
> >
> > I would suggest making a simple performance model that shows what you
> > will do will have at least a 2x speed gain, because anything less is not
> > worth your time, and inevitably you will not get the whole multiplier. I
> > am really skeptical that this is possible with the above sketch.
>
> That I will do as a next step for sure. But I also doubt this much will be
> achievable in any case.
>
>
> >
> > Second, I would try to convince myself that what you propose would be
> > simpler in terms of lines of code, number of objects, number of concepts,
> > etc. Right now, that is not clear to me either.
>
> The number of objects per se may not be smaller. I am thinking more about
> reducing lines of code (verbosity) and concepts, and increasing safety.
>
> I have two examples I've been burnt by in the past:
>
> - casting to void* to pass custom contexts to PETSc routines
>

I don't think you can beat this cast, and I do not know of a way to enforce
type safety with this open-world assumption using C++ either.
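Even if the cast itself cannot be eliminated, one common C++ mitigation is
to confine it to a single template trampoline so user code stays fully
typed. A sketch with hypothetical names (RawCallback and invoke stand in for
a PETSc-style callback registration; they are not the real API):

```cpp
#include <cassert>

// C-style callback signature with an untyped context, as PETSc uses.
using RawCallback = int (*)(double, void*);

// Stand-in for the C library invoking a registered (callback, void*) pair.
int invoke(RawCallback cb, double x, void* ctx) { return cb(x, ctx); }

// The one and only cast in the program lives here, generated per context
// type; everything the user writes is statically typed.
template <typename Ctx, int (*F)(double, Ctx&)>
int trampoline(double x, void* raw) {
  return F(x, *static_cast<Ctx*>(raw));
}

// User side: a typed context and a typed callback.
struct MyCtx { int scale; };
int myFunc(double x, MyCtx& c) { return static_cast<int>(x) * c.scale; }
```

Passing `trampoline<MyCtx, myFunc>` where a RawCallback is expected means a
mismatched context type fails to compile instead of corrupting memory at
runtime; the open-world problem (the library cannot check what hides behind
void*) remains, as Matt notes.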


> - forgetting to call the corresponding XXXEnd after a call to XXXBegin
> (PETSc notices that, of course, but at runtime, and that might be too
> late).
>
> Example: I can imagine that I need one of PETSc's internal arrays, in
> which case I call VecGetArray. However, I will inevitably forget to return
> the array to PETSc. I could have my new VecArray return an object that
> restores the array when it goes out of scope. I can also flag the function
> with [[nodiscard]] to prevent the user from discarding the returned object
> from the start.

Jed claims that this pattern is no longer preferred, but I have forgotten
his argument. Jed?
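For reference, the VecArray idea described above can be sketched like this
(hypothetical names; FakeVec and a checked_out flag stand in for a PETSc Vec
and its VecGetArray/VecRestoreArray bookkeeping):

```cpp
#include <cassert>
#include <utility>

// Stand-in for a Vec whose raw array can be borrowed.
struct FakeVec {
  double data[3] = {1.0, 2.0, 3.0};
  bool checked_out = false;  // true while the raw array is borrowed
};

// Construction plays the role of VecGetArray, destruction plays
// VecRestoreArray, so the restore cannot be forgotten. [[nodiscard]] on the
// class warns if a caller discards the guard, which would restore at once.
class [[nodiscard]] ArrayGuard {
  FakeVec* v_;
public:
  explicit ArrayGuard(FakeVec& v) : v_(&v) { v_->checked_out = true; }
  ~ArrayGuard() {
    if (v_) v_->checked_out = false;  // the implicit "restore"
  }
  ArrayGuard(ArrayGuard&& o) noexcept : v_(std::exchange(o.v_, nullptr)) {}
  ArrayGuard(const ArrayGuard&) = delete;
  double* get() const { return v_->data; }
};

// Factory mirroring the proposed VecArray free function.
[[nodiscard]] ArrayGuard borrow(FakeVec& v) { return ArrayGuard(v); }
```

One known downside of this pattern is that the restore point becomes
implicit (end of scope), which can silently extend how long the array stays
checked out; that may be related to the objection alluded to above.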


> >
> > Barring that, maybe you can argue that new capabilities, such as the
> > type flexibility described by Michael, are enabled. That would be the
> > most convincing, I think.
>
> This would be very interesting indeed, but I see only two options:
> - recompile PETSc twice
> - manually implement all complex routines, which might be too much of a
> task

We have had this discussion for years on this list. Having separate names
for each type
is really ugly and does not achieve what we want. We want smooth
interoperability between
objects with different backing types, but it is still not clear how to do
this.

  Thanks,

     Matt


> >
> >   Thanks,
> >
> >      Matt
>
> Thanks for the feedback, Matt.
>
> > If you have read up to here, thanks.
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener