[petsc-users] PETSc with modern C++

Mon Apr 3 11:45:15 CDT 2017

On Monday, 3 April 2017 02:00:53 CEST you wrote:
> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <filippo.leon at gmail.com>
> wrote:
> > Hello,
> >
> > I have a project in mind and seek feedback.
> >
> > Disclaimer: I hope I am not abusing this mailing list with this idea.
> > If so, please ignore.
> >
> > As a thought experiment, and to have a bit of fun, I am currently
> > writing/thinking of writing a small (modern) C++ wrapper around PETSc.
> >
> > Premise: PETSc is awesome, I love it and use it in many projects.
> > Sometimes I am just not super comfortable writing C. (I know my idea goes
> > against PETSc's design philosophy).
> >
> > I know there are many around, and there is not really a need for this
> > (especially since PETSc has its own object-oriented style), but there are
> > a few things I would really like to include in this wrapper that I found
> > nowhere:
> > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the
> > framework; there are many other cool things that PETSc does that I do not
> > have the brainpower to consider as well.
> > - expression templates (in my opinion this is where C++ shines): this
> > would replace all code bloat that a user might need with cool/easy to
> > read expressions (this could increase the number of axpy-like routines);
> > - those expression templates should use SSE and AVX whenever available;
> > - expressions like x += alpha * y should fall back to BLAS axpy (though
> > sometimes this is not even faster than a simple loop);
>
> The idea for the above is not clear. Do you want templates generating calls
> to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?
> There is some advantage here of expanding the range of BLAS operations,
> which has been done to death by Liz Jessup and collaborators, but not
> that much.

Templates should generate scalar code operating on raw arrays using SIMD. But
I can detect when you want an axpbycz or a gemv, and use the BLAS
implementation instead. I do not think there is a point in trying to "beat"
BLAS. (An interesting point opens up here: I assume an efficient BLAS
implementation, but I am not so sure how the different BLAS implementations
do things internally. I work from the assumption that we have a very well
tuned BLAS implementation at our disposal.)
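
A minimal sketch of how such a dispatch could look, under stated assumptions:
every name here (Vec1, Scaled, Sum, axpy_calls) is hypothetical and not part
of PETSc, the vector is a plain std::vector rather than a PETSc Vec, and the
axpy path is an ordinary loop standing in for a call to a tuned BLAS. No
SIMD intrinsics are used; the fused loop is left to the compiler's
auto-vectorizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch_expr {

class Vec1;  // hypothetical stand-in for a wrapped vector

// Expression node for alpha * y: stores the operands, computes nothing yet.
struct Scaled {
    double alpha;
    const Vec1& y;
    double operator[](std::size_t i) const;
    std::size_t size() const;
};

// Expression node for lhs + rhs, evaluated one element at a time.
template <class L, class R>
struct Sum {
    L lhs;
    R rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

static int axpy_calls = 0;  // counts how often the axpy path was selected

class Vec1 {
public:
    Vec1(std::size_t n, double v) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // x += alpha * y is recognised by its type. A real wrapper could call a
    // BLAS axpy here; this sketch uses a plain loop in its place.
    Vec1& operator+=(const Scaled& e) {
        assert(e.y.size() == size());
        ++axpy_calls;
        for (std::size_t i = 0; i < size(); ++i) data_[i] += e.alpha * e.y.data_[i];
        return *this;
    }

    // Any other expression is evaluated in one fused loop, without temporaries.
    template <class E>
    Vec1& operator+=(const E& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] += e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

inline double Scaled::operator[](std::size_t i) const { return alpha * y[i]; }
inline std::size_t Scaled::size() const { return y.size(); }

inline Scaled operator*(double a, const Vec1& y) { return Scaled{a, y}; }
inline Sum<Scaled, Scaled> operator+(const Scaled& l, const Scaled& r) {
    return Sum<Scaled, Scaled>{l, r};
}

// Usage: the first statement takes the axpy path, the second the fused loop.
inline double example_expression() {
    Vec1 x(4, 1.0), y(4, 2.0), z(4, 3.0);
    x += 0.5 * y;
    x += 2.0 * y + 1.0 * z;
    return x[0];
}

}  // namespace sketch_expr
```

The dispatch is decided entirely by overload resolution at compile time, so
choosing between the BLAS path and the generated loop adds no runtime branch.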

> > - all calls to PETSc should be less verbose, more C++-like:
> >   * for instance a VecGlobalToLocalBegin could return an empty object that
> >     calls VecGlobalToLocalEnd when it is destroyed.
> >   * some cool idea to easily write GPU kernels.
>
> If you find a way to make this pay off it would be amazing, since currently
> nothing but BLAS3 has a hope of mattering in this context.
>
> > - the idea would be to have safer routines (at compile time), by means of
> > RAII etc.
> >
> > I aim for zero/near-zero/negligible overhead with full optimization; for
> > that I include benchmarks and extensive unit tests.
> >
> > So my question is:
> > - anyone that would be interested (in the product/in developing)?
> > - anyone that has suggestions (maybe that what I have in mind is
> > nonsense)?
>
> I would suggest making a simple performance model that says what you will
> do will have at least a 2x speed gain. Because anything less is not worth
> your time, and inevitably you will not get the whole multiplier. I am really
> skeptical that is possible with the above sketch.

I will do that as a next step for sure. But I also doubt that a gain this
large will be achievable in any case.

> Second, I would try to convince myself that what you propose would be
> simpler, in terms of lines of code, number of objects, number of concepts,
> etc. Right now, that is not clear to me either.

The number of objects per se may not be smaller. I am thinking more about
reducing lines of code (verbosity) and the number of concepts, and about
increasing safety.

I have two examples that have burnt me in the past:
- casting to void* to pass custom contexts to PETSc routines
- forgetting to call the corresponding XXXEnd after a call to XXXBegin
  (PETSc notices that, of course, but only at runtime, and that might be too
  late).
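
For the first item, a minimal sketch of a typed front end that confines the
void* cast to a single trampoline. CSolver, c_set_function and c_evaluate are
hypothetical stand-ins for a C-style callback API (PETSc routines such as
SNESSetFunction have the same "function pointer plus void* context" shape);
set_function, Params and residual are likewise invented for illustration.

```cpp
namespace sketch_ctx {

// Hypothetical C-style API: a callback plus an untyped user context.
using CCallback = int (*)(double x, double* out, void* ctx);

struct CSolver {
    CCallback fn = nullptr;
    void* ctx = nullptr;
};

inline void c_set_function(CSolver& s, CCallback fn, void* ctx) {
    s.fn = fn;
    s.ctx = ctx;
}

inline int c_evaluate(const CSolver& s, double x, double* out) {
    return s.fn(x, out, s.ctx);
}

// Typed front end: the user callback takes Ctx&, and the only cast back from
// void* happens inside this captureless lambda, which converts to CCallback.
template <class Ctx, int (*F)(double, double*, Ctx&)>
void set_function(CSolver& s, Ctx& ctx) {
    c_set_function(
        s,
        [](double x, double* out, void* p) -> int {
            return F(x, out, *static_cast<Ctx*>(p));
        },
        &ctx);
}

// Usage: the user code never mentions void*.
struct Params {
    double scale;
};

inline int residual(double x, double* out, Params& p) {
    *out = p.scale * x;
    return 0;
}

inline double example_context() {
    CSolver solver;
    Params params{3.0};
    set_function<Params, &residual>(solver, params);
    double out = 0.0;
    const int err = c_evaluate(solver, 2.0, &out);
    return err == 0 ? out : -1.0;
}

}  // namespace sketch_ctx
```

Passing a context of the wrong type to set_function is then a compile error
rather than a silent misinterpretation of memory at runtime.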

Example: I can imagine that I need one of PETSc's internal arrays. In this
case I call VecGetArray. However, I will inevitably forget to return the
array to PETSc. I could have my new VecArray return an object that restores
the array when it goes out of scope. I can also flag the function with
[[nodiscard]] to prevent the user from discarding the returned object right
away.
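
A minimal sketch of such a guard, assuming C++17 for [[nodiscard]]. MockVec,
mock_get_array and mock_restore_array are hypothetical stand-ins so that the
example compiles without PETSc; in a real wrapper they would be a Vec and the
VecGetArray/VecRestoreArray pair. ArrayView and get_array are also invented
names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch_raii {

// Hypothetical mock of a vector with a get/restore protocol.
struct MockVec {
    std::vector<double> values;
    int outstanding = 0;  // arrays handed out and not yet restored
};

inline double* mock_get_array(MockVec& v) {
    ++v.outstanding;
    return v.values.data();
}

inline void mock_restore_array(MockVec& v, double*& a) {
    --v.outstanding;
    a = nullptr;
}

// Holds the checked-out array and restores it in the destructor.
class ArrayView {
public:
    explicit ArrayView(MockVec& v) : vec_(&v), data_(mock_get_array(v)) {}
    ~ArrayView() {
        if (data_) mock_restore_array(*vec_, data_);
    }

    // Not copyable, so the array is restored exactly once.
    ArrayView(const ArrayView&) = delete;
    ArrayView& operator=(const ArrayView&) = delete;
    ArrayView(ArrayView&& o) noexcept : vec_(o.vec_), data_(o.data_) {
        o.data_ = nullptr;  // the moved-from view no longer owns the array
    }
    ArrayView& operator=(ArrayView&&) = delete;

    double& operator[](std::size_t i) { return data_[i]; }

private:
    MockVec* vec_;
    double* data_;
};

// [[nodiscard]] asks the compiler to warn when the returned guard is dropped
// immediately, which would restore the array before it is ever used.
[[nodiscard]] inline ArrayView get_array(MockVec& v) { return ArrayView(v); }

// Usage: the array is restored when the view leaves its scope.
inline int example_raii() {
    MockVec v{{1.0, 2.0, 3.0}};
    {
        ArrayView a = get_array(v);
        a[0] = 10.0;
        assert(v.outstanding == 1);
    }
    return v.outstanding;
}

}  // namespace sketch_raii
```

The same pattern would cover the XXXBegin/XXXEnd pairs: the Begin call goes
in the constructor and the End call in the destructor. Note that
[[nodiscard]] produces a warning rather than a hard error unless warnings are
promoted to errors.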

> Barring that, maybe you can argue that new capabilities, such as the type
> flexibility described by Michael, are enabled. That would be the most
> convincing I think.

This would be very interesting indeed, but I see only two options:
- recompile PETSc twice
- manually implement all complex routines, which might be too much of a task

> Thanks,
>
> Matt

Thanks for the feedback Matt.

> If you have read up to here, thanks.
