[petsc-users] PETSc with modern C++
Filippo Leonardi
filippo.leon at gmail.com
Wed Apr 5 06:39:53 CDT 2017
Awesome, I appreciate the feedback from all of you.
@Karli: Just to be clear: my original idea is to wrap PETSc in clean,
idiomatic C++ bindings. An added value would then be expression
templates ("easy" to implement once a clear infrastructure is in
place), with all the benefits and drawbacks we discussed. But this
would just be an "extra" on top.
For my computations: I really have to say that I must have messed
something up (gcc not vectorizing some PETSc call), and I really do not
expect a 4x improvement from templates. If I manage, I will run some
precise tests and provide you with a simple code to reproduce them (if
you are interested), since I am not sure I will have KNL available for
those tests.
On Wed, 5 Apr 2017 at 11:19 Karl Rupp <rupp at iue.tuwien.ac.at> wrote:
> Hi Filippo,
>
> did you compile PETSc with the same level of optimization as your
> template code? In particular, did you turn debugging off for the timings?
>
> Either way, let me share some of the mental stages I've gone through
> with ViennaCL. It started out with the same premises as you provide:
> There is nice C++, compilers are 'now' mature enough, and everything can
> be made a lot 'easier' and 'simpler' for the user. This was in 2010,
> back when all this GPU computing took off.
>
> Now, fast-forward to this day, I consider the extensive use of C++ with
> expression templates and other fanciness to be the biggest mistake of
> ViennaCL. Here is an incomplete list of why:
>
> a) The 'generic' interface mimics that of Boost.uBLAS, so you can just
> mix&match the two libraries. Problems: First, one replicates the really
> poor design choices in uBLAS. Second, most users didn't even *think*
> about using these generic routines for anything other than ViennaCL
> types. There are just many more computational scientists familiar with
> Fortran or C than with "modern C++".
>
> b) Keeping the type system consistent. So you would probably introduce
> a matrix, a vector, and operator overloads. For example, take y = A * x
> for a matrix A and vectors y, x. Sooner or later, users will write y = R
> * A * P * x, and all of a sudden left-to-right associativity results in
> an evaluation as
> y = ((R * A) * P) * x,
> i.e. two expensive matrix-matrix products instead of the three cheap
> matrix-vector products of y = R * (A * (P * x)). (How to 'solve' this
> particular case: introduce even more expression templates! See Blaze for
> an example. So you soon have hundreds of lines of code just to fix
> problems you never had with plain function calls.)
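>
> A minimal sketch of the pitfall, using toy Mat/Vec types made up for
> illustration (not PETSc or ViennaCL code):
>
>    #include <cstdio>
>
>    struct Mat { };
>    struct Vec { };
>
>    // The overloads print which kind of product actually runs.
>    Mat operator*(const Mat&, const Mat&) { std::puts("mat-mat, O(n^3)"); return Mat(); }
>    Vec operator*(const Mat&, const Vec&) { std::puts("mat-vec, O(n^2)"); return Vec(); }
>
>    int main() {
>        Mat R, A, P; Vec x;
>        // Parsed as ((R * A) * P) * x: prints "mat-mat" twice, although
>        // the user almost certainly wanted R * (A * (P * x)).
>        Vec y = R * A * P * x;
>        (void)y;
>    }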
>
> c) Dealing with errors: Continuing with the above example, what if R * A
> runs out of memory? Sure, you can throw exceptions, but the resulting
> stack trace will contain all the expression template frenzy, plus you
> will have a hard time finding out whether the operation R * A or the
> subsequent multiplication with P caused the error.
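>
> Reusing the toy types from the sketch under b), this is roughly the
> situation the caller is left with (a sketch, not real library code):
>
>    #include <new>
>
>    Vec apply(const Mat& R, const Mat& A, const Mat& P, const Vec& x) {
>        try {
>            return R * A * P * x;
>        } catch (const std::bad_alloc&) {
>            // Did the temporary for R * A fail, or the one for
>            // (R * A) * P? The catch site cannot tell, and the stack
>            // trace is a pile of operator* template frames.
>            throw;
>        }
>    }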
>
> d) Expression templates are limited to a single line of code. Every
> now and then, algorithms require repeated operations on different data.
> For example, you may have vector operations of the form
> x_i <- x_{i-1} + alpha * p_{i-1}
> r_i <- r_{i-1} - alpha * y_i
> p_i <- r_i + beta * p_{i-1}
> (with alpha and beta being scalars, everything else being vectors)
> in a pipelined conjugate gradient formulation. The "C"-way of doing this
> is to grab the pointers to the data and compute all three vector
> operations in a single for-loop, thus saving on repeated data access
> from main memory. With expression templates, there is no way to make
> this as efficient, because the lifetime of expression templates is
> exactly one line of code.
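>
> For illustration, the fused "C-way" might look like this (a sketch;
> the array names mirror the recurrences above, with the loop index now
> running over vector entries and the new y already computed):
>
>    void fused_update(int n, double alpha, double beta,
>                      double* x, double* r, double* p, const double* y) {
>        for (int j = 0; j < n; ++j) {
>            x[j] += alpha * p[j];        // x <- x + alpha * p  (old p)
>            r[j] -= alpha * y[j];        // r <- r - alpha * y
>            p[j]  = r[j] + beta * p[j];  // p <- r + beta * p   (old p)
>        }
>    }
>
> Each of x, r, p, y passes through the memory hierarchy exactly once,
> instead of once per line as with three separate expression-template
> statements.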
>
> e) Lack of a stable ABI. There is a good reason why many vendor
> libraries come with a C interface, not with a C++ interface. If you try
> to link C++ object files generated by different C++ compilers (it is
> enough to have different versions of the same compiler, see Microsoft),
> it is undefined behavior. In the best case, it fails with a linker
> error. In the worst case, it produces random crashes right before the
> end of a one-month simulation that you needed for a paper due in a few
> days. If I remember correctly, one group got ambitious with C++ for
> writing an MPI library and ran into exactly these kinds of problems
> with the C++ standard library.
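>
> This is why a stable surface is usually exported with C linkage. A
> hypothetical sketch (the solver names are made up, but the pattern is
> what vendor libraries actually ship):
>
>    extern "C" {
>        typedef struct Solver Solver;   // opaque handle for the caller
>        Solver* solver_create(void);
>        int     solver_solve(Solver* s, const double* b, double* x, int n);
>        void    solver_destroy(Solver* s);
>    }
>    // All C++ classes, templates, and std:: types stay behind this
>    // wall, so object files built by different compilers can still
>    // link against it.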
>
> f) The entry bar gets way too high. PETSc's current code base allows
> several dozens of users to contribute to each release. If some problem
> shows up in a particular function, you just go there, hook up the
> debugger, and figure out what is going on. Now assume there are C++
> expression templates involved. Your stack trace may now spread over
> several screen pages. Once you navigate to the offending line, you find
> that a certain overload of a traits class coupled with two functors
> provides wrong results. The reverse lookup of how exactly the traits
> class was instantiated, which working set was passed in, and how those
> functors interact may easily take many minutes to digest - and that is
> if you are the one who wrote that piece of code. People new to PETSc
> are likely to just give up. Note that this is a problem for new users
> across the spectrum of C++ libraries; you can find evidence e.g. on
> the Eigen mailing list. In the end, C++ templates are leaky
> abstractions, and the leaks will sooner or later hit the user.
>
> Long story short: keep in mind that there is a cost to "modern C++" in
> exchange for possible performance benefits. For PETSc applications, the
> bottleneck is all too often not the performance of an expression that
> could be 'expression templated', but in areas where the use of C++
> wouldn't make a difference: algorithm selection, network performance, or
> poor process placement, just to name a few.
>
> Nonetheless, don't feel discouraged; you may have better ideas and know
> better ways of dealing with C++ than we do. A lightweight wrapper on top
> of PETSc, similar to what petsc4py does for Python, is something that
> some users may find handy. Whether it's worth the extra coding and
> maintenance effort? I don't know.
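>
> A minimal sketch of what such a wrapper could look like (hypothetical
> class, error codes ignored for brevity; only standard PETSc calls like
> VecCreate/VecSetSizes/VecDestroy are assumed):
>
>    #include <petscvec.h>
>
>    class PetscVector {
>    public:
>        PetscVector(MPI_Comm comm, PetscInt n) {
>            VecCreate(comm, &v_);
>            VecSetSizes(v_, PETSC_DECIDE, n);
>            VecSetFromOptions(v_);
>        }
>        ~PetscVector() { VecDestroy(&v_); }       // RAII: no manual cleanup
>        PetscVector(const PetscVector&) = delete; // keep ownership explicit
>        PetscVector& operator=(const PetscVector&) = delete;
>        Vec raw() const { return v_; }            // escape hatch to plain PETSc
>    private:
>        Vec v_;
>    };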
>
> Best regards,
> Karli
>
>
> On 04/05/2017 07:42 AM, Filippo Leonardi wrote:
> > @jed: Your assembly is what I would've expected. Let me simplify my
> > code and see if I can provide a useful test example. (Also: I assume
> > your assembly is for Xeon, so I should definitely use AVX-512.)
> >
> > Let me get back to you in a few days (work permitting) with something
> > you can use.
> >
> > From your example, I wouldn't expect any benefit from my code compared
> > to just calling PETSc (for those simple kernels).
> >
> > A big plus I hadn't thought of would be that the compiler is really
> > forced to vectorize (avoiding situations like mine, where I might have
> > messed up some config parameter).
> >
> > @barry: I'm definitely too young to comment here (i.e. it's me that
> > changed, not the world). This is definitely not new stuff, and, for
> > instance, Armadillo/Boost/Eigen have been successfully production-ready
> > for many years now. I have the impression that now that C++11 is more
> > mainstream, it is much easier to write readable/maintainable code
> > (still ugly as hell, though). I think we can now take for granted a
> > C++11 compiler on any "supercomputer", and even C++14 and soon C++17...
> > and this makes development and interfaces much nicer.
> >
> > What I would like to see is something like PETSc (where, for instance,
> > I get nice, hidden MPI calls) combined with the niceness of those
> > libraries (where many operations can be written in a, if I may say so,
> > more natural way). (My plan is: you did all the hard work; C++ can put
> > a ribbon on it, and we see what comes out.) Something like the sketch
> > below, say.
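> >
> > Purely hypothetical syntax (assumes a wrapper class with operator
> > overloads that nothing in PETSc provides today):
> >
> >    PetscVector x(PETSC_COMM_WORLD, n);
> >    PetscVector y(PETSC_COMM_WORLD, n);
> >    // would lower to VecAXPY/VecScale calls underneath:
> >    y += 2.0 * x;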
> >
> > On 5 Apr 2017 5:39 am, "Jed Brown" <jed at jedbrown.org> wrote:
> >
> > Matthew Knepley <knepley at gmail.com> writes:
> >
> > > On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown <jed at jedbrown.org> wrote:
> > >
> > >> Matthew Knepley <knepley at gmail.com> writes:
> > >>
> > >> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi
> > >> > <filippo.leon at gmail.com> wrote:
> > >> >
> > >> >> I had weird issues where gcc (which I am using for my tests
> > >> >> right now) wasn't vectorizing properly (even enabling all flags,
> > >> >> from -ftree-vectorize to -mavx). According to my tests, the
> > >> >> Intel compiler was a bit better at that.
> > >> >>
> > >> >
> > >> > We are definitely at the mercy of the compiler for this. Maybe
> > >> > Jed has an idea why it's not vectorizing.
> > >>
> > >> Is this so bad?
> > >>
> > >> 000000000024080e <VecMAXPY_Seq+0x2fe> mov         rax,QWORD PTR [rbp-0xb0]
> > >> 0000000000240815 <VecMAXPY_Seq+0x305> add         ebx,0x1
> > >> 0000000000240818 <VecMAXPY_Seq+0x308> vmulpd      ymm0,ymm7,YMMWORD PTR [rax+r9*1]
> > >> 000000000024081e <VecMAXPY_Seq+0x30e> mov         rax,QWORD PTR [rbp-0xa8]
> > >> 0000000000240825 <VecMAXPY_Seq+0x315> vfmadd231pd ymm0,ymm8,YMMWORD PTR [rax+r9*1]
> > >> 000000000024082b <VecMAXPY_Seq+0x31b> mov         rax,QWORD PTR [rbp-0xb8]
> > >> 0000000000240832 <VecMAXPY_Seq+0x322> vfmadd231pd ymm0,ymm6,YMMWORD PTR [rax+r9*1]
> > >> 0000000000240838 <VecMAXPY_Seq+0x328> vfmadd231pd ymm0,ymm5,YMMWORD PTR [r10+r9*1]
> > >> 000000000024083e <VecMAXPY_Seq+0x32e> vaddpd      ymm0,ymm0,YMMWORD PTR [r11+r9*1]
> > >> 0000000000240844 <VecMAXPY_Seq+0x334> vmovapd     YMMWORD PTR [r11+r9*1],ymm0
> > >> 000000000024084a <VecMAXPY_Seq+0x33a> add         r9,0x20
> > >> 000000000024084e <VecMAXPY_Seq+0x33e> cmp         DWORD PTR [rbp-0xa0],ebx
> > >> 0000000000240854 <VecMAXPY_Seq+0x344> ja          000000000024080e <VecMAXPY_Seq+0x2fe>
> > >>
> > >
> > > I agree that is what we should see. It cannot be what Filippo has if
> > > he is getting ~4x with the template stuff.
> >
> > I'm using gcc. Filippo, can you make an easy-to-run test that we can
> > evaluate on Xeon and KNL?
> >
>