[petsc-users] PETSc with modern C++

Wed Apr 5 00:42:42 CDT 2017

@jed: Your assembly is what I would've expected. Let me simplify my code and
see if I can provide a useful test example. (Also: I assume your assembly
is for Xeon, so I should definitely use avx512.)

Let me get back to you in a few days (work permitting) with something you
can use.

From your example I wouldn't expect any benefit with my code compared to
just calling petsc (for those simple kernels).

A big plus I hadn't thought of would be that the compiler is really forced
to vectorise (like in my case, where I might have messed up some config
parameter).

@barry: I'm definitely too young to comment here (i.e. it's me that
changed, not the world). Definitely this is not new stuff, and, for
instance, Armadillo/Boost/Eigen have been production ready for
many years now. I somehow have the impression that now that C++11 is more
mainstream, it is much easier to write easily readable/maintainable code
(still ugly as hell though). I think we can now take for granted a C++11
compiler on any "supercomputer", and even C++14 and soon C++17... and this
makes development and interfaces much nicer.

What I would like to see is something like PETSc (where I have nice, hidden
MPI calls for instance), combined with the niceness of those libraries
(where many operations can be written in a, if I might say so, more natural
way). (My plan is: you did all the hard work, C++ can put a ribbon on it
and see what comes out.)
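
To make the "more natural way" concrete, here is a minimal expression-template
sketch in C++11. It is an illustration only: Expr, Vec, Sum and Scaled are
hypothetical names, not part of PETSc, Eigen or Armadillo, and it is not the
code being discussed in this thread. It assumes plain serial vectors and
ignores MPI entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration only; none of these types exist in PETSc.
// CRTP base: lets operators accept "any expression" without virtual calls.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluates the whole right-hand side in one loop, with no temporary vectors.
    template <class E>
    Vec& operator+=(const Expr<E>& e) {
        const E& ex = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] += ex[i];
        return *this;
    }
};

// Lazy element-wise sum of two expressions.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Lazy scalar multiple of an expression.
template <class E>
struct Scaled : Expr<Scaled<E>> {
    double a;
    const E& e;
    Scaled(double a_, const E& e_) : a(a_), e(e_) {}
    double operator[](std::size_t i) const { return a * e[i]; }
};

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <class E>
Scaled<E> operator*(double a, const Expr<E>& e) {
    return Scaled<E>(a, e.self());
}
```

With these definitions, `y += 2.0 * x0 + 3.0 * x1;` compiles down to a single
loop over the entries. The expression objects hold references, so they should
only be used within the statement that creates them.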

On 5 Apr 2017 5:39 am, "Jed Brown" <jed at jedbrown.org> wrote:

Matthew Knepley <knepley at gmail.com> writes:

> On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Matthew Knepley <knepley at gmail.com> writes:
>>
>> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi <filippo.leon at gmail.com>
>> > wrote:
>> >
>> >> I had weird issues where gcc (that I am using for my tests right now)
>> >> wasn't vectorising properly (even enabling all flags, from tree-vectorize,
>> >> to mavx). According to my tests, I know the Intel compiler was a bit better
>> >> at that.
>> >>
>> >
>> > We are definitely at the mercy of the compiler for this. Maybe Jed has an
>> > idea why it's not vectorizing.
>>
>> Is this so bad?
>>
>> 000000000024080e <VecMAXPY_Seq+0x2fe> mov    rax,QWORD PTR [rbp-0xb0]
>> 0000000000240815 <VecMAXPY_Seq+0x305> add    ebx,0x1
>> 0000000000240818 <VecMAXPY_Seq+0x308> vmulpd ymm0,ymm7,YMMWORD PTR [rax+r9*1]
>> 000000000024081e <VecMAXPY_Seq+0x30e> mov    rax,QWORD PTR [rbp-0xa8]
>> 0000000000240825 <VecMAXPY_Seq+0x315> vfmadd231pd ymm0,ymm8,YMMWORD PTR [rax+r9*1]
>> 000000000024082b <VecMAXPY_Seq+0x31b> mov    rax,QWORD PTR [rbp-0xb8]
>> 0000000000240832 <VecMAXPY_Seq+0x322> vfmadd231pd ymm0,ymm6,YMMWORD PTR [rax+r9*1]
>> 0000000000240838 <VecMAXPY_Seq+0x328> vfmadd231pd ymm0,ymm5,YMMWORD PTR [r10+r9*1]
>> 000000000024083e <VecMAXPY_Seq+0x32e> vaddpd ymm0,ymm0,YMMWORD PTR [r11+r9*1]
>> 0000000000240844 <VecMAXPY_Seq+0x334> vmovapd YMMWORD PTR [r11+r9*1],ymm0
>> 000000000024084a <VecMAXPY_Seq+0x33a> add    r9,0x20
>> 000000000024084e <VecMAXPY_Seq+0x33e> cmp    DWORD PTR [rbp-0xa0],ebx
>> 0000000000240854 <VecMAXPY_Seq+0x344> ja     000000000024080e <VecMAXPY_Seq+0x2fe>
>>
>
> I agree that is what we should see. It cannot be what Filippo has if he is
> getting ~4x with the template stuff.

I'm using gcc.  Filippo, can you make an easy-to-run test that we can
evaluate on Xeon and KNL?