<div dir="ltr"><p style="margin:0px">On Monday, 3 April 2017 02:00:53 CEST you wrote:</p>

<p style="margin:0px">> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <<a href="mailto:filippo.leon@gmail.com">filippo.leon@gmail.com</a>></p>

<p style="margin:0px">> </p>

<p style="margin:0px">> wrote:</p>

<p style="margin:0px">> > Hello,</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > I have a project in mind and seek feedback.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > Disclaimer: I hope I am not abusing of this mailing list with this idea.</p>

<p style="margin:0px">> > If so, please ignore.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > As a thought experiment, and to have a bit of fun, I am currently</p>

<p style="margin:0px">> > writing/thinking on writing, a small (modern) C++ wrapper around PETSc.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > Premise: PETSc is awesome, I love it and use in many projects. Sometimes I</p>

<p style="margin:0px">> > am just not super comfortable writing C. (I know my idea goes against</p>

<p style="margin:0px">> > PETSc's design philosophy).</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > I know there are many around, and there is not really a need for this</p>

<p style="margin:0px">> > (especially since PETSc has his own object-oriented style), but there are</p>

<p style="margin:0px">> > a</p>

<p style="margin:0px">> > few things I would like to really include in this wrapper, that I found</p>

<p style="margin:0px">> > nowhere):</p>

<p style="margin:0px">> > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the</p>

<p style="margin:0px">> > Framework, there are many other cool things that PETSc does that I do not</p>

<p style="margin:0px">> > have the brainpower to consider those as well.</p>

<p style="margin:0px">> > - expression templates (in my opinion this is where C++ shines): this</p>

<p style="margin:0px">> > would replace all code bloat that a user might need with cool/easy to read</p>

<p style="margin:0px">> > expressions (this could increase the number of axpy-like routines);</p>

<p style="margin:0px">> > - those expression templates should use SSE and AVX whenever available;</p>

<p style="margin:0px">> > - expressions like x += alpha * y should fall back to BLAS axpy (tough</p>

<p style="margin:0px">> > sometimes this is not even faster than a simple loop);</p>

<p style="margin:0px">> </p>

<p style="margin:0px">> The idea for the above is not clear. Do you want templates generating calls</p>

<p style="margin:0px">> to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?</p>

<p style="margin:0px">> There is some advantage here of expanding the range of BLAS operations,</p>

<p style="margin:0px">> which has been done to death by Liz Jessup and collaborators, but not</p>

<p style="margin:0px">> that much.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">Templates should generate scalar code operating on raw arrays using SIMD. But </p>

<p style="margin:0px">I can detect if you want to use axpbycz or gemv, and use the blas </p>

<p style="margin:0px">implementation instead. I do not think there is a point in trying to "beat" </p>

<p style="margin:0px">BLAS. (Here a interesting point opens: I assume an efficient BLAS </p>

<p style="margin:0px">implementation, but I am not so sure about how the different BLAS do things </p>

<p style="margin:0px">internally. I work from the assumption that we have a very well tuned BLAS </p>

<p style="margin:0px">implementation at our disposal).</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">> </p>

<p style="margin:0px">> > - all calls to PETSc should be less verbose, more C++-like:</p>

<p style="margin:0px">> >   * for instance a VecGlobalToLocalBegin could return an empty object that</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > calls VecGlobalToLocalEnd when it is destroyed.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> >   * some cool idea to easily write GPU kernels.</p>

<p style="margin:0px">> </p>

<p style="margin:0px">> If you find a way to make this pay off it would be amazing, since currently</p>

<p style="margin:0px">> nothing but BLAS3 has a hope of mattering in this context.</p>

<p style="margin:0px">> </p>

<p style="margin:0px">> > - the idea would be to have safer routines (at compile time), by means of</p>

<p style="margin:0px">> > RAII etc.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > I aim for zero/near-zero/negligible overhead with full optimization, for</p>

<p style="margin:0px">> > that I include benchmarks and extensive test units.</p>

<p style="margin:0px">> > </p>

<p style="margin:0px">> > So my question is:</p>

<p style="margin:0px">> > - anyone that would be interested (in the product/in developing)?</p>

<p style="margin:0px">> > - anyone that has suggestions (maybe that what I have in mind is</p>

<p style="margin:0px">> > nonsense)?</p>

<p style="margin:0px">> </p>

<p style="margin:0px">> I would suggest making a simple performance model that says what you will</p>

<p style="margin:0px">> do will have at least</p>

<p style="margin:0px">> a 2x speed gain. Because anything less is not worth your time, and</p>

<p style="margin:0px">> inevitably you will not get the</p>

<p style="margin:0px">> whole multiplier. I am really skeptical that is possible with the above</p>

<p style="margin:0px">> sketch.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">That I will do as next steps for sure. But I also doubt this much of will be </p>

<p style="margin:0px">achievable in any case.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">> </p>

<p style="margin:0px">> Second, I would try to convince myself that what you propose would be</p>

<p style="margin:0px">> simpler, in terms of lines of code,</p>

<p style="margin:0px">> number of objects, number of concepts, etc. Right now, that is not clear to</p>

<p style="margin:0px">> me either.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">Number of objects per se may not be smaller. I am more thinking about reducing </p>

<p style="margin:0px">lines of codes (verbosity), concepts and increase safety.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">I have two examples I've been burnt with in the past:</p>

<p style="margin:0px">- casting to void* to pass custom contexts to PETSc routines</p>

<p style="margin:0px">- forgetting to call the corresponding XXXEnd after a call to XXXBegin</p>

<p style="margin:0px">(PETSc notices that, ofc., but at runtime, and that might be too late).</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">Example: I can imagine that I need a Petsc's internal array. In this case I </p>

<p style="margin:0px">call VecGetArray. However I will inevitably foget to return the array to </p>

<p style="margin:0px">PETSc. I could have my new VecArray returning an object that restores the </p>

<p style="margin:0px">array</p>

<p style="margin:0px">when it goes out of scope. I can also flag the function with [[nodiscard]] to </p>

<p style="margin:0px">prevent the user to destroy the returned object from the start.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">> </p>

<p style="margin:0px">> Baring that, maybe you can argue that new capabilities, such as the type</p>

<p style="margin:0px">> flexibility described by Michael, are enabled. That</p>

<p style="margin:0px">> would be the most convincing I think.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">This would be very interesting indeed, but I see only two options:</p>

<p style="margin:0px">- recompile PETSc twice</p>

<p style="margin:0px">- manually implement all complex routines, which might be to much of a task</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">> </p>

<p style="margin:0px">>   Thanks,</p>

<p style="margin:0px">> </p>

<p style="margin:0px">>      Matt</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">Thanks for the feedback Matt.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px">> </p>

<p style="margin:0px">> If you have read up to here, thanks.</p>

<p style="margin:0px"><br></p>

<p style="margin:0px"><br></p><br><div class="gmail_quote"><div dir="ltr">On Mon, 3 Apr 2017 at 02:00 Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg">On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi <span dir="ltr" class="gmail_msg"><<a href="mailto:filippo.leon@gmail.com" class="gmail_msg" target="_blank">filippo.leon@gmail.com</a>></span> wrote:<br class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><br class="gmail_msg"><div class="gmail_msg">Hello,</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I have a project in mind and seek feedback.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Disclaimer: I hope I am not abusing of this mailing list with this idea. If so, please ignore.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">As a thought experiment, and to have a bit of fun, I am currently writing/thinking on writing, a small (modern) C++ wrapper around PETSc.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><span style="line-height:15.6px" class="gmail_msg">Premise: PETSc is awesome, I love it and use in many projects. Sometimes I am just not super comfortable writing C. (I know my idea goes against PETSc's design philosophy).</span><br class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I know there are many around, and there is not really a need for this (especially since PETSc has his own object-oriented style), but there are a few things I would like to <span style="line-height:1.5" class="gmail_msg">really include in this wrapper, that I found nowhere):</span></div><div class="gmail_msg"><span style="line-height:1.5" class="gmail_msg">- I am currently only thinking about the Vector/Matrix/KSP/DM part of the Framework, there are many </span><span style="line-height:1.5" class="gmail_msg">other cool things that PETSc does that I do not have the brainpower to consider those as well.</span></div><div class="gmail_msg">- expression templates (in my opinion this is where C++ shines): this would replace all code bloat that a user might need with cool/easy to read expressions (this could increase the number of axpy-like routines);</div><div class="gmail_msg">- those expression templates should use SSE and AVX whenever available;</div><div class="gmail_msg">- expressions like x += alpha * y should fall back to BLAS axpy (tough sometimes this is not even faster than a simple loop);</div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">The idea for the above is not clear. Do you want templates generating calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX?</div><div class="gmail_msg">There is some advantage here of expanding the range of BLAS operations, which has been done to death by Liz Jessup and collaborators, but not</div><div class="gmail_msg">that much.</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_msg">- all calls to PETSc should be less verbose, more C++-like:</div><div class="gmail_msg">  * for instance a VecGlobalToLocalBegin could return an empty object that calls VecGlobalToLocalEnd <span style="line-height:1.5" class="gmail_msg">when it is destroyed.</span></div><div class="gmail_msg">  * some cool idea to easily write GPU kernels.</div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">If you find a way to make this pay off it would be amazing, since currently nothing but BLAS3 has a hope of mattering in this context.</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_msg">- the idea would be to have safer routines<span class="m_6609298410028807009m_-1883525242653817249m_-9209114810536857738inbox-inbox-Apple-converted-space gmail_msg" style="line-height:15.6px"> </span><span style="line-height:15.6px" class="gmail_msg">(at compile time)</span><span style="line-height:1.5" class="gmail_msg">, by means of RAII etc.</span></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I aim for zero/near-zero/negligible overhead with full optimization, for that I include benchmarks and extensive test units.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">So my question is:</div><div class="gmail_msg">- anyone that would be interested (in the product/in developing)?</div><div class="gmail_msg">- anyone that has suggestions (maybe that what I have in mind is nonsense)?</div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">I would suggest making a simple performance model that says what you will do will have at least</div><div class="gmail_msg">a 2x speed gain. Because anything less is not worth your time, and inevitably you will not get the</div><div class="gmail_msg">whole multiplier. I am really skeptical that is possible with the above sketch.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Second, I would try to convince myself that what you propose would be simpler, in terms of lines of code,</div><div class="gmail_msg">number of objects, number of concepts, etc. Right now, that is not clear to me either.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Baring that, maybe you can argue that new capabilities, such as the type flexibility described by Michael, are enabled. That</div><div class="gmail_msg">would be the most convincing I think.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">  Thanks,</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">     Matt</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_msg">If you have read up to here, thanks.</div></div>

</blockquote></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><br class="gmail_msg"><br clear="all" class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div>-- <br class="gmail_msg"><div class="m_6609298410028807009m_-1883525242653817249gmail_signature gmail_msg" data-smartmail="gmail_signature">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="gmail_msg">-- Norbert Wiener</div>

</div></div></blockquote></div></div>