<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Apr 4, 2017 at 10:25 AM, Filippo Leonardi <span dir="ltr"><<a href="mailto:filippo.leon@gmail.com" target="_blank">filippo.leon@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I really appreciate the feedback. Thanks.</div><div><br></div><div><span style="line-height:15.6px">That of deadlock, when the order of destruction is not preserved, is a point I hadn't thought of. Maybe it can be cleverly addressed.</span></div><div><br></div>PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for a single processor, I obtain:<div><br></div><div>Example for MAXPY, with expression templates:<div><p style="margin:0px">BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805</p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335</p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718</p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954</p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788</p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N </p>
<p style="margin:0px">BM_Vector_petscxx_MAXPY_RMS 7 % 7 % </p><p style="margin:0px">Direct call to MAXPY:</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312</p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N </p>
<p style="margin:0px">BM_Vector_PETSc_MAXPY_RMS 0 % 0 % </p><p style="margin:0px"><br></p><p style="margin:0px">And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am now convinced that this warrants some further investigation/testing.</p></div></div></div></blockquote><div><br></div><div>1) There is no communication here, so 3x for 2 ranks means the result is not believable</div><div><br></div><div>2) I do not understand what the two cases are. Do you mean that expression templates vectorize the</div><div> scalar work? I am surprised that BLAS is so bad, but I guess not completely surprised. What BLAS?</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_quote"><div dir="ltr">On Tue, 4 Apr 2017 at 01:08 Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Matthew Knepley <<a href="mailto:knepley@gmail.com" class="m_8549240514152078980gmail_msg" target="_blank">knepley@gmail.com</a>> writes:<br class="m_8549240514152078980gmail_msg">
<br class="m_8549240514152078980gmail_msg">
>> BLAS. (Here a interesting point opens: I assume an efficient BLAS<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> implementation, but I am not so sure about how the different BLAS do<br class="m_8549240514152078980gmail_msg">
>> things<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> internally. I work from the assumption that we have a very well tuned BLAS<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> implementation at our disposal).<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
><br class="m_8549240514152078980gmail_msg">
> The speed improvement comes from pulling vectors through memory fewer<br class="m_8549240514152078980gmail_msg">
> times by merging operations (kernel fusion).<br class="m_8549240514152078980gmail_msg">
<br class="m_8549240514152078980gmail_msg">
Typical examples are VecMAXPY and VecMDot, but note that these are not<br class="m_8549240514152078980gmail_msg">
xGEMV because the vectors are independent arrays rather than single<br class="m_8549240514152078980gmail_msg">
arrays with a constant leading dimension.<br class="m_8549240514152078980gmail_msg">
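
To make the fusion point concrete, here is a minimal expression-template
sketch in plain C++ over raw arrays (illustrative only, not petscxx or PETSc
code): the whole right-hand side of y = a1*x1 + a2*x2 + y is evaluated in a
single loop, so each vector is streamed through memory once, and the x_i stay
independent arrays rather than columns of a single matrix.

  #include <cstddef>
  #include <utility>
  #include <vector>

  struct Ref {                       // leaf: wraps an existing vector
    const std::vector<double>& v;
    double operator[](std::size_t i) const { return v[i]; }
  };

  template <class L, class R>
  struct Sum {                       // node: elementwise sum of two expressions
    L l; R r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
  };

  template <class E>
  struct Scaled {                    // node: scalar times expression
    double a; E e;
    double operator[](std::size_t i) const { return a * e[i]; }
  };

  // Operators are restricted (via decltype SFINAE) to indexable expression
  // types, so they do not interfere with plain double arithmetic.
  template <class L, class R,
            class = decltype(std::declval<L>()[0]),
            class = decltype(std::declval<R>()[0])>
  Sum<L, R> operator+(L l, R r) { return {l, r}; }

  template <class E, class = decltype(std::declval<E>()[0])>
  Scaled<E> operator*(double a, E e) { return {a, e}; }

  // Assignment is the only loop: the whole right-hand side is fused here,
  // so x1, x2, y are each pulled through memory exactly once.
  template <class E>
  void assign(std::vector<double>& y, E expr) {
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = expr[i];
  }

  int main() {
    std::vector<double> x1(1000, 1.0), x2(1000, 2.0), y(1000, 3.0);
    assign(y, 2.0 * Ref{x1} + 3.0 * Ref{x2} + Ref{y});  // fused MAXPY-like update
  }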
<br class="m_8549240514152078980gmail_msg">
>> call VecGetArray. However I will inevitably foget to return the array to<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> PETSc. I could have my new VecArray returning an object that restores the<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> array<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> when it goes out of scope. I can also flag the function with [[nodiscard]]<br class="m_8549240514152078980gmail_msg">
>> to<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
>> prevent the user to destroy the returned object from the start.<br class="m_8549240514152078980gmail_msg">
>><br class="m_8549240514152078980gmail_msg">
><br class="m_8549240514152078980gmail_msg">
> Jed claims that this pattern is no longer preferred, but I have forgotten<br class="m_8549240514152078980gmail_msg">
> his argument.<br class="m_8549240514152078980gmail_msg">
> Jed?<br class="m_8549240514152078980gmail_msg">
<br class="m_8549240514152078980gmail_msg">
Destruction order matters and needs to be collective. If an error<br class="m_8549240514152078980gmail_msg">
condition causes destruction to occur in a different order on different<br class="m_8549240514152078980gmail_msg">
processes, you can get deadlock. I would much rather have an error<br class="m_8549240514152078980gmail_msg">
leave some resources (for the OS to collect) than escalate into<br class="m_8549240514152078980gmail_msg">
deadlock.<br class="m_8549240514152078980gmail_msg">
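
For reference, a minimal sketch of the guard being discussed (the class and
factory names are hypothetical, not an existing petscxx API; only
VecGetArray/VecRestoreArray are real PETSc calls). The caveat above is exactly
that the destructor runs wherever scoping or stack unwinding puts it, which
after an error need not be the same point, or the same order, on every rank.

  #include <petscvec.h>

  // Hypothetical guard: acquires the array on construction and restores it
  // when the object leaves scope. Error checking is elided; note a destructor
  // has no good way to propagate a PETSc error, and it runs at a point
  // dictated by scoping and unwinding (the collective-destruction concern).
  class [[nodiscard]] ArrayHandle {
  public:
    explicit ArrayHandle(Vec v) : vec_(v) { (void)VecGetArray(vec_, &data_); }
    ~ArrayHandle() { (void)VecRestoreArray(vec_, &data_); }
    ArrayHandle(const ArrayHandle&) = delete;
    ArrayHandle& operator=(const ArrayHandle&) = delete;
    PetscScalar* data() const { return data_; }
  private:
    Vec vec_;
    PetscScalar* data_ = nullptr;
  };

  // Flagging the accessor itself, as suggested; C++17 guaranteed copy elision
  // lets the non-copyable guard be returned by value. Calling localArray(x)
  // and discarding the result triggers a compiler warning.
  [[nodiscard]] inline ArrayHandle localArray(Vec v) { return ArrayHandle(v); }

  // Usage: the array is returned to PETSc automatically at the closing brace.
  //   {
  //     ArrayHandle a = localArray(x);  // wraps VecGetArray(x, ...)
  //     a.data()[0] = 1.0;
  //   }                                 // ~ArrayHandle calls VecRestoreArray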
<br class="m_8549240514152078980gmail_msg">
> We have had this discussion for years on this list. Having separate names<br class="m_8549240514152078980gmail_msg">
> for each type<br class="m_8549240514152078980gmail_msg">
> is really ugly and does not achieve what we want. We want smooth<br class="m_8549240514152078980gmail_msg">
> interoperability between<br class="m_8549240514152078980gmail_msg">
> objects with different backing types, but it is still not clear how to do<br class="m_8549240514152078980gmail_msg">
> this.<br class="m_8549240514152078980gmail_msg">
<br class="m_8549240514152078980gmail_msg">
Hide it internally and implicitly promote. Only the *GetArray functions<br class="m_8549240514152078980gmail_msg">
need to be parametrized on numeric type. But it's a lot of work on the<br class="m_8549240514152078980gmail_msg">
backend.<br class="m_8549240514152078980gmail_msg">
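
One possible reading of that suggestion, as a pure sketch in which every name
is hypothetical (this is not an existing PETSc or petscxx interface): the
backing precision stays an internal detail, and only the accessor pair is
templated on the scalar type the caller asks for, converting on the way out
and back. A real backend would avoid the copies below whenever the requested
type already matches the stored one, which is part of why it is a lot of work.

  #include <cstddef>
  #include <vector>

  // Hypothetical vector type: the backing precision (here double) is hidden;
  // only the accessor pair is parametrized on the numeric type requested.
  class PromotingVector {
  public:
    explicit PromotingVector(std::size_t n) : data_(n, 0.0) {}

    template <class T>
    std::vector<T> getArray() const {            // hand out values in the caller's type
      std::vector<T> out(data_.size());
      for (std::size_t i = 0; i < data_.size(); ++i) out[i] = static_cast<T>(data_[i]);
      return out;
    }

    template <class T>
    void restoreArray(const std::vector<T>& a) { // convert back on restore
      for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = static_cast<double>(a[i]);
    }

  private:
    std::vector<double> data_;                   // internal detail, never exposed
  };

  int main() {
    PromotingVector v(4);
    auto a = v.getArray<float>();  // caller works in single precision
    a[0] = 1.5f;
    v.restoreArray(a);             // implicitly promoted back to double storage
  }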

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener