<div dir="ltr">As Matt alluded to, the blocks get smaller and cheaper. That and cache effects could account for all of this superlinear speedup.<div>If convergence rate does not deteriorate with an increased number of subdomains then you want to decouple the number of solver subdomains from your domain decomposition.</div><div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 12, 2022 at 7:08 AM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Tue, Jul 12, 2022 at 1:50 AM Ce Qin <<a href="mailto:qince168@gmail.com" target="_blank">qince168@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks for your quick response.<div><br></div><div>The linear system is complex-valued. We rewrite it into its real form</div><div>and solve it using FGMRES and an optimal block-diagonal preconditioner. </div><div>We use CG and the AMS preconditioner implemented in HYPRE to solve the</div><div>smaller real linear system arised from applying the block preconditioner.</div><div>The iteration number of FGMRES and CG keep almost constant in all the runs.</div></div></blockquote><div><br></div><div>So those blocks decrease in size as you add more processes?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Each node is equipped with a 64-core CPU and 128 GB of memory.</div><div>The matrix-vector production is memory-bandwidth limited. Is this strange behavior</div><div>related to memory bandwidth?</div></div></blockquote><div><br></div><div>I don't see how.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Best,</div><div>Ce</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> 于2022年7月12日周二 04:04写道:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Also, cache effects. 20M DoFs on one core/thread is huge.<div>37x on assembly is probably cache effects.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 11, 2022 at 1:09 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Jul 11, 2022 at 10:34 AM Ce Qin <<a href="mailto:qince168@gmail.com" target="_blank">qince168@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Dear all,<div><br></div><div>I want to analyze the strong scaling of our in-house FEM code.</div><div>The test problem has about 20M DoFs. I ran the problem using</div><div>various settings. 
On Tue, Jul 12, 2022 at 7:08 AM Matthew Knepley <knepley@gmail.com> wrote:
> On Tue, Jul 12, 2022 at 1:50 AM Ce Qin <qince168@gmail.com> wrote:
>> Thanks for your quick response.
>>
>> The linear system is complex-valued. We rewrite it in its real form
>> and solve it using FGMRES and an optimal block-diagonal preconditioner.
>> We use CG and the AMS preconditioner implemented in HYPRE to solve the
>> smaller real linear systems arising from applying the block
>> preconditioner. The iteration counts of FGMRES and CG remain almost
>> constant in all the runs.
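For context, a standard real-equivalent formulation of a complex-valued
system looks as follows; the symbols K and M and the particular
block-diagonal preconditioner mentioned afterwards are illustrative, since
the thread does not spell out which variant the code uses:

    (K + iM)(x + iy) = f + ig
    \quad\Longleftrightarrow\quad
    \begin{pmatrix} K & -M \\ M & K \end{pmatrix}
    \begin{pmatrix} x \\ y \end{pmatrix}
    =
    \begin{pmatrix} f \\ g \end{pmatrix}

One block-diagonal preconditioner studied for this form, when K and M are
symmetric positive semidefinite, is P = diag(K + M, K + M); applying P^{-1}
then reduces to two real solves with K + M, which is consistent with Ce's
description of inner CG solves, preconditioned by AMS, on a smaller real
system.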
> So those blocks decrease in size as you add more processes?
>
>> Each node is equipped with a 64-core CPU and 128 GB of memory.
>> The matrix-vector product is memory-bandwidth limited. Is this
>> strange behavior related to memory bandwidth?
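A back-of-the-envelope model makes the bandwidth question concrete: a
CSR/AIJ matrix-vector product streams roughly 12 bytes per nonzero (an
8-byte value plus a 4-byte column index), ignoring vector traffic, so its
run time is bounded below by total bytes over sustained bandwidth. The
nonzeros-per-row and bandwidth figures in this sketch are assumptions, not
numbers from the thread:

    #include <stdio.h>

    /* Rough lower bound on the time for one CSR matrix-vector product,
       assuming it is purely memory-bandwidth limited. */
    int main(void)
    {
      double rows        = 20.0e6;  /* ~20M DoFs, from the thread      */
      double nnz_per_row = 30.0;    /* assumed; depends on the element */
      double bytes       = 12.0 * rows * nnz_per_row; /* 8 B + 4 B per nonzero */
      double bw_bytes_s  = 200.0e9; /* assumed node STREAM bandwidth   */

      printf("estimated SpMV time: %.1f ms\n", 1.0e3 * bytes / bw_bytes_s);
      return 0;
    }

Note that a single rank usually cannot saturate a node's memory bandwidth,
so strong scaling over more cores and nodes can raise the usable bandwidth
faster than the core count alone suggests; combined with the per-rank
working set eventually fitting in cache, this can push measured speedups
above the process count.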
> I don't see how.
>
>   Thanks,
>
>      Matt
>
>> Best,
>> Ce
>>
>> Mark Adams <mfadams@lbl.gov> wrote on Tue, Jul 12, 2022, at 04:04:
>>> Also, cache effects. 20M DoFs on one core/thread is huge. The 37x on
>>> assembly is probably cache effects.
>>>
>>> On Mon, Jul 11, 2022 at 1:09 PM Matthew Knepley <knepley@gmail.com> wrote:
>>>> On Mon, Jul 11, 2022 at 10:34 AM Ce Qin <qince168@gmail.com> wrote:
>>>>> Dear all,
>>>>>
>>>>> I want to analyze the strong scaling of our in-house FEM code. The
>>>>> test problem has about 20M DoFs. I ran the problem using various
>>>>> settings. The speedups for the assembly and solving procedures are
>>>>> as follows:
>>>>>
>>>>>   NProcessors  NNodes  CoresPerNode  Assembly   Solving
>>>>>   1            1       1             1.0        1.0
>>>>>   2            1       2             1.995246   1.898756
>>>>>                2       1             2.121401   2.436149
>>>>>   4            1       4             4.658187   6.004539
>>>>>                2       2             4.666667   5.942085
>>>>>                4       1             4.65272    6.101214
>>>>>   8            2       4             9.380985   16.581135
>>>>>                4       2             9.308575   17.258891
>>>>>                8       1             9.314449   17.380612
>>>>>   16           2       8             18.575953  34.483058
>>>>>                4       4             18.745129  34.854409
>>>>>                8       2             18.828393  36.45509
>>>>>   32           4       8             37.140626  70.175879
>>>>>                8       4             37.166421  71.533865
>>>>>
>>>>> I don't quite understand this result. Why can we achieve a speedup
>>>>> of about 70+ using 32 processors? Could you please help me explain
>>>>> this?
>>>>
>>>> We need more data. I would start with the number of iterations that
>>>> the solver executes; I suspect this is changing. However, it can be
>>>> more complicated. For example, a block Jacobi preconditioner gets
>>>> cheaper as the number of subdomains increases. Thus we need to know
>>>> exactly what the solver is doing.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
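The quickest way to get that data is at run time: -ksp_monitor prints the
residual at each outer iteration, -ksp_view prints the full solver
configuration, and -log_view breaks the wall time down by operation (these
are standard PETSc options; inner solves take the same flags under the
code's own options prefix, which the thread does not show).
Programmatically, a small helper along these lines, with an illustrative
name, records the count after each solve:

    #include <petscksp.h>

    /* Report how many iterations a solve took, so runs at different
       process counts can be compared directly. */
    static PetscErrorCode ReportIterations(KSP ksp)
    {
      PetscInt           its;
      KSPConvergedReason reason;

      PetscFunctionBeginUser;
      PetscCall(KSPGetIterationNumber(ksp, &its));
      PetscCall(KSPGetConvergedReason(ksp, &reason));
      PetscCall(PetscPrintf(PETSC_COMM_WORLD,
                "outer iterations: %" PetscInt_FMT ", converged reason: %d\n",
                its, (int)reason));
      PetscFunctionReturn(0);
    }

If the counts stay flat while the per-iteration cost drops superlinearly,
the explanation lies in the hardware or in the preconditioner blocks, not
in the iteration count.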
>>>>> Thank you in advance.
>>>>>
>>>>> Best,
>>>>> Ce
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which
>>>> their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/