Hello, Michael,

Sorry for the delay. I am actively running experiments with your example code. I tested it on a cluster with 36 cores per node. To distribute MPI ranks evenly among nodes, I used 216 and 1728 ranks (6 and 48 full nodes) instead of 125 and 1000. So far I have these findings:

1) It is not a strict weak-scaling test, since with 1728 ranks the solver needs more KSP iterations, and therefore more calls to MatSOR and related functions.

2) If I use half the cores per node but twice as many nodes (keeping the number of MPI ranks the same), performance improves by 60-70%. This implies memory bandwidth plays an important role in performance.

3) I see that you define the outermost two layers of grid nodes as boundary. Boundary processes own fewer nonzeros than interior processes, which is a source of load imbalance, and it gets worse on coarser grids. I still need to confirm that this is what caused the poor scaling and the large VecScatter delays in the experiment.

Thanks.

--Junchao Zhang
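PS: To quantify (3), I plan to compare the per-rank nonzero counts of the operator. A minimal sketch of what I mean, assuming the operator is an assembled AIJ matrix (the helper name ReportNonzeroImbalance is mine, not from your code):

    #include <petscmat.h>

    /* Print the min/max of the per-rank nonzero counts of A, to measure
       the boundary-vs-interior load imbalance.  PetscLogDouble is a plain
       double by default, so MPI_DOUBLE is used in the reductions. */
    static PetscErrorCode ReportNonzeroImbalance(Mat A)
    {
      MatInfo        info;
      MPI_Comm       comm;
      PetscLogDouble nz,nzmin,nzmax;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
      ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);
      nz   = info.nz_used;  /* nonzeros owned by this rank */
      ierr = MPI_Allreduce(&nz,&nzmin,1,MPI_DOUBLE,MPI_MIN,comm);CHKERRQ(ierr);
      ierr = MPI_Allreduce(&nz,&nzmax,1,MPI_DOUBLE,MPI_MAX,comm);CHKERRQ(ierr);
      ierr = PetscPrintf(comm,"local nonzeros: min %g  max %g  ratio %g\n",
                         nzmin,nzmax,nzmax/nzmin);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Calling this on the operator of each multigrid level (obtained, e.g., via PCMGGetSmoother() and KSPGetOperators()) should show whether the max/min ratio grows on the coarser levels, which would support the load-imbalance explanation.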
On Tue, Jun 12, 2018 at 12:42 AM, Michael Becker <michael.becker@physik.uni-giessen.de> wrote:
> Hello,
>
> any new insights yet?
>
> Michael
>
> On 04.06.2018 at 21:56, Junchao Zhang wrote:
>> Michael, I can compile and run your test. I am now profiling it. Thanks.
>>
>> --Junchao Zhang