<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="color:rgb(103,78,167)">Here is the plot of run time in old and new petsc using 1,2,4,8, and 16 CPUs (in logarithmic scale):</div><div class="gmail_default" style="color:rgb(103,78,167)"><br></div><div class="gmail_default" style="color:rgb(103,78,167)"><img src="cid:ii_kmtgir050" alt="Screenshot from 2021-03-28 10-48-56.png" width="566" height="321"><br><br></div><div class="gmail_default" style="color:rgb(103,78,167)"><br></div><div class="gmail_default" style="color:rgb(103,78,167)"><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 25, 2021 at 12:51 PM Mohammad Gohardoust <<a href="mailto:gohardoust@gmail.com">gohardoust@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="color:rgb(103,78,167)">That's right, these loops also take roughly half time as well. If I am not mistaken, petsc (MatSetValue) is called after doing some calculations over each tetrahedral element.</div><div class="gmail_default" style="color:rgb(103,78,167)">Thanks for your suggestion. I will try that and will post the results.</div><div class="gmail_default" style="color:rgb(103,78,167)"><br></div><div class="gmail_default" style="color:rgb(103,78,167)">Mohammad<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 24, 2021 at 3:23 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><br></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 24, 2021 at 2:17 AM Mohammad Gohardoust <<a href="mailto:gohardoust@gmail.com" target="_blank">gohardoust@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="color:rgb(103,78,167)">So the code itself is a finite-element scheme and in stage 1 and 3 there are expensive loops over entire mesh elements which consume a lot of time.</div></div></blockquote><div>So these expensive loops must also take half time with newer petsc? And these loops do not call petsc routines?</div><div>I think you can build two PETSc versions with the same configuration options, then run your code with one MPI rank to see if there is a difference. </div><div>If they give the same performance, then scale to 2, 4, ... 
ranks and see what happens.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="color:rgb(103,78,167)"><br></div><div style="color:rgb(103,78,167)">Mohammad<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 23, 2021 at 6:08 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">In the new log, I saw<div><br><div><pre style="font-family:"Courier New",Courier,monospace,arial,sans-serif;margin-top:0px;margin-bottom:0px;white-space:pre-wrap;font-size:14px"><pre style="font-family:"Courier New",Courier,monospace,arial,sans-serif;margin-top:0px;margin-bottom:0px;white-space:pre-wrap"><font color="#000000">Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
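
For readers following along, the assembly pattern described above (a per-tetrahedron computation followed by MatSetValue/MatSetValues, timed in the Solute_Assembly stage shown below) usually looks roughly like the following minimal sketch. It is not taken from the code in question; AssembleSystem, ComputeLocalMatrix, nelem, and connectivity are hypothetical names, and the real code may insert one entry at a time with MatSetValue instead of an element block.

#include <petscmat.h>

/* Placeholder for the real per-element physics/quadrature; fills a 4x4
   local matrix for tetrahedron e. Purely illustrative values. */
static void ComputeLocalMatrix(PetscInt e, PetscScalar Ke[16])
{
  PetscInt i;
  for (i = 0; i < 16; i++) Ke[i] = (i % 5 == 0) ? 1.0 : -0.25;
  (void)e;
}

/* Sketch of a per-element assembly loop: the loop body itself is plain user
   code (no PETSc events), while MatSetValues and MatAssemblyBegin/End are the
   PETSc calls that appear in the Solute_Assembly stage of -log_view. */
PetscErrorCode AssembleSystem(Mat A, PetscInt nelem, const PetscInt connectivity[])
{
  PetscErrorCode ierr;
  PetscInt       e;

  PetscFunctionBeginUser;
  ierr = MatZeroEntries(A);CHKERRQ(ierr);
  for (e = 0; e < nelem; e++) {
    const PetscInt *nodes = &connectivity[4*e]; /* global indices of the 4 vertices */
    PetscScalar     Ke[16];                     /* 4x4 element matrix, row-major */

    ComputeLocalMatrix(e, Ke);                  /* user computation, not a PETSc event */
    ierr = MatSetValues(A, 4, nodes, 4, nodes, Ke, ADD_VALUES);CHKERRQ(ierr);
  }
  /* Off-process contributions are exchanged here; this is what shows up as
     MatAssemblyBegin/MatAssemblyEnd (and BuildTwoSidedF) in the log below. */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Only the MatSetValues/MatAssembly calls inside such a loop are timed by PETSc's logging; the element computation itself is invisible to -log_view unless it is wrapped in a user-defined event.
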

On Tue, Mar 23, 2021 at 6:08 PM Junchao Zhang <junchao.zhang@gmail.com> wrote:

In the new log, I saw

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 5.4095e+00   2.3%  4.3700e+03   0.0%  4.764e+05   3.0%  3.135e+02        1.0%  2.244e+04  12.6%
 1: Solute_Assembly: 1.3977e+02  59.4%  7.3353e+09   4.6%  3.263e+06  20.7%  1.278e+03       26.9%  1.059e+04   6.0%

But I didn't see any event in this stage with a cost close to 140 s. What happened?

--- Event Stage 1: Solute_Assembly
BuildTwoSided 3531 1.0 2.8025e+0026.3 0.00e+00 0.0 3.6e+05 4.0e+00 3.5e+03 1 0 2 0 2 1 0 11 0 33 0
BuildTwoSidedF 3531 1.0 2.8678e+0013.2 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03 1 0 5 17 2 1 0 22 62 33 0
VecScatterBegin 7062 1.0 7.1911e-02 1.9 0.00e+00 0.0 7.1e+05 3.5e+02 0.0e+00 0 0 5 2 0 0 0 22 6 0 0
VecScatterEnd 7062 1.0 2.1248e-01 3.0 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 73
SFBcastOpBegin 3531 1.0 2.6516e-02 2.4 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00 0 0 2 1 0 0 0 11 3 0 0
SFBcastOpEnd 3531 1.0 9.5041e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 3531 1.0 3.8955e-02 2.1 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00 0 0 2 1 0 0 0 11 3 0 0
SFReduceEnd 3531 1.0 1.3791e-01 3.9 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 112
SFPack 7062 1.0 6.5591e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFUnpack 7062 1.0 7.4186e-03 2.1 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2080
MatAssemblyBegin 3531 1.0 4.7846e+00 1.1 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03 2 0 5 17 2 3 0 22 62 33 0
MatAssemblyEnd 3531 1.0 1.5468e+00 2.7 1.68e+07 2.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 2 0 0 0 104
MatZeroEntries 3531 1.0 3.0998e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0

--Junchao Zhang


On Tue, Mar 23, 2021 at 5:24 PM Mohammad Gohardoust <gohardoust@gmail.com> wrote:

Thanks Dave for your reply.

For sure PETSc is awesome :D

Yes, in both cases petsc was configured with --with-debugging=0, and fortunately I do have the old and new -log_view outputs, which I attached.

Best,
Mohammad


On Tue, Mar 23, 2021 at 1:37 AM Dave May <dave.mayhem23@gmail.com> wrote:

Nice to hear!
The answer is simple, PETSc is awesome :)

Jokes aside, assuming both petsc builds were configured with --with-debugging=0, I don't think there is a definitive answer to your question with the information you provided.

It could be as simple as one specific implementation you use having been improved between petsc releases. Not being an Ubuntu expert, the change might be associated with using a different compiler, and/or a more efficient BLAS implementation (non-threaded vs threaded). However, I doubt this is the origin of your 2x performance increase.

If you really want to understand where the performance improvement originated from, you'd need to send to the email list the result of -log_view from both the old and new versions, running the exact same problem.

From that info, we can see which implementations in PETSc are being used and where the time reduction is occurring. Knowing that, it should be clearer to provide an explanation for it.


Thanks,
Dave
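
One detail worth keeping in mind when reading the Solute_Assembly numbers above: a stage total in -log_view includes everything executed between the stage push and pop, while only PETSc (or user-registered) events get their own lines, so expensive user loops inside a stage can account for time that no listed event explains. Below is a minimal sketch of how such loops can be given their own event; the stage name mirrors the log above, but SetupLogging, SoluteAssemblyStep, and the ElementLoop event are assumed names, not the actual code.

#include <petscsys.h>

/* Sketch: register a log stage plus a user event so that time spent in
   plain user loops appears as its own line under the stage in -log_view. */
static PetscLogStage stage_solute;
static PetscLogEvent evt_element_loop;

PetscErrorCode SetupLogging(void)
{
  PetscErrorCode ierr;
  PetscClassId   classid;

  PetscFunctionBeginUser;
  ierr = PetscLogStageRegister("Solute_Assembly", &stage_solute);CHKERRQ(ierr);
  ierr = PetscClassIdRegister("UserAssembly", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("ElementLoop", classid, &evt_element_loop);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

PetscErrorCode SoluteAssemblyStep(void)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscLogStagePush(stage_solute);CHKERRQ(ierr);

  ierr = PetscLogEventBegin(evt_element_loop, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... expensive loop over mesh elements (pure user code, no PETSc calls) ... */
  ierr = PetscLogEventEnd(evt_element_loop, 0, 0, 0, 0);CHKERRQ(ierr);

  /* ... MatSetValues / MatAssemblyBegin / MatAssemblyEnd, solves, etc. ... */

  ierr = PetscLogStagePop();CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

If the missing time then shows up under ElementLoop, the speedup is in the user-side loops (e.g. from compiler or OS changes) rather than in PETSc calls, which is essentially the distinction the single-rank comparison suggested above is meant to expose.
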

On Tue, 23 Mar 2021 at 06:24, Mohammad Gohardoust <gohardoust@gmail.com> wrote:

Hi,

I am using a code which is based on petsc (and also parmetis). Recently I made the following changes, and now the code is running about two times faster than before:

- Upgraded Ubuntu 18.04 to 20.04
- Upgraded petsc 3.13.4 to 3.14.5
- This time I installed parmetis and metis directly via petsc, using the --download-parmetis and --download-metis flags, instead of installing them separately and using --with-parmetis-include=... and --with-parmetis-lib=... (the version of the previously installed parmetis was 4.0.3)

I was wondering what can possibly explain this speedup? Does anyone have any suggestions?

Thanks,
Mohammad
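
Since the parmetis/metis build is one of the three changes listed above, it may help to note where the partitioner typically enters a PETSc-based code. The code in question may call ParMETIS directly rather than through PETSc, so the following is only a generic sketch of PETSc's MatPartitioning interface with the ParMETIS backend; PartitionMesh and Adj are assumed names, and the adjacency (dual-graph) matrix is taken as given.

#include <petscmat.h>

/* Generic sketch: partition a distributed adjacency (dual) graph with the
   ParMETIS backend via PETSc's MatPartitioning interface. Partitioning is
   usually a one-time setup step, so a different parmetis build mainly
   affects setup time and the quality of the resulting partition. */
PetscErrorCode PartitionMesh(Mat Adj, IS *ranks_of_local_elements)
{
  PetscErrorCode  ierr;
  MatPartitioning part;

  PetscFunctionBeginUser;
  ierr = MatPartitioningCreate(PetscObjectComm((PetscObject)Adj), &part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part, Adj);CHKERRQ(ierr);
  ierr = MatPartitioningSetType(part, MATPARTITIONINGPARMETIS);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); /* allow -mat_partitioning_type overrides */
  ierr = MatPartitioningApply(part, ranks_of_local_elements);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

This is only meant to show where the --download-parmetis change could matter; whether it does here would show up in the Messages and Message Lengths columns of the two -log_view outputs.
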