<div dir="ltr">Also, consider cache effects. 20M DoFs on a single core/thread is huge.<div>The 37x speedup on assembly (with 32 processes) is probably due to cache effects.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jul 11, 2022 at 1:09 PM Matthew Knepley &lt;<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Jul 11, 2022 at 10:34 AM Ce Qin &lt;<a href="mailto:qince168@gmail.com" target="_blank">qince168@gmail.com</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Dear all,<div><br></div><div>I want to analyze the strong scaling of our in-house FEM code.</div><div>The test problem has about 20M DoFs. I ran the problem using</div><div>various settings. The speedups for the assembly and solving</div><div>procedures are as follows:</div><div>
<table style="font-family:monospace">
<tr><th>NProcessors</th><th>NNodes</th><th>CoresPerNode</th><th>Assembly</th><th>Solving</th></tr>
<tr><td>1</td><td>1</td><td>1</td><td>1.0</td><td>1.0</td></tr>
<tr><td>2</td><td>1</td><td>2</td><td>1.995246</td><td>1.898756</td></tr>
<tr><td></td><td>2</td><td>1</td><td>2.121401</td><td>2.436149</td></tr>
<tr><td>4</td><td>1</td><td>4</td><td>4.658187</td><td>6.004539</td></tr>
<tr><td></td><td>2</td><td>2</td><td>4.666667</td><td>5.942085</td></tr>
<tr><td></td><td>4</td><td>1</td><td>4.65272</td><td>6.101214</td></tr>
<tr><td>8</td><td>2</td><td>4</td><td>9.380985</td><td>16.581135</td></tr>
<tr><td></td><td>4</td><td>2</td><td>9.308575</td><td>17.258891</td></tr>
<tr><td></td><td>8</td><td>1</td><td>9.314449</td><td>17.380612</td></tr>
<tr><td>16</td><td>2</td><td>8</td><td>18.575953</td><td>34.483058</td></tr>
<tr><td></td><td>4</td><td>4</td><td>18.745129</td><td>34.854409</td></tr>
<tr><td></td><td>8</td><td>2</td><td>18.828393</td><td>36.45509</td></tr>
<tr><td>32</td><td>4</td><td>8</td><td>37.140626</td><td>70.175879</td></tr>
<tr><td></td><td>8</td><td>4</td><td>37.166421</td><td>71.533865</td></tr>
</table>
</div><div><font face="monospace"><br></font></div><div><font face="arial, sans-serif">I don't quite understand this result. Why can we achieve</font><font face="arial, sans-serif"> a speedup of</font></div><div><font face="arial, sans-serif">more than </font><span style="font-family:arial,sans-serif">70 using 32 processors? Could you please help me understand this?</span></div></div></div></div></blockquote><div><br></div><div>We need more data. I would start with the number of iterations that the solver</div><div>executes. I suspect this is changing with the number of processes. However, it can be more complicated.</div><div>For example, a Block-Jacobi preconditioner gets cheaper as the number of</div><div>subdomains increases. Thus we need to know exactly what the solver is doing.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><font face="arial, sans-serif">Thank you in advance.</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Best,</font></div><div><font face="arial, sans-serif">Ce</font></div><div><font face="monospace"><br></font></div><div><font face="monospace"><br></font></div></div></div></div>
</blockquote></div></div>
</blockquote></div>
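<div>To make the super-linear scaling in the quoted table easier to see, here is a minimal sketch that converts a few of the reported speedups into parallel efficiency (speedup divided by process count). The rows are copied from the table above; the names <code>ROWS</code>, <code>efficiency</code>, and <code>report</code> are illustrative only and not part of the original code. An efficiency above 1.0 is what cache effects or a changing solver (for example, fewer iterations or cheaper Block-Jacobi blocks) could produce; the script does not tell these causes apart.</div>

```python
# Parallel efficiency = speedup / number of processes.
# Values above 1.0 indicate super-linear speedup.

# (nprocs, nnodes, cores_per_node, assembly_speedup, solve_speedup)
# A subset of rows copied from the table in the original message.
ROWS = [
    (1, 1, 1, 1.0, 1.0),
    (2, 1, 2, 1.995246, 1.898756),
    (4, 1, 4, 4.658187, 6.004539),
    (8, 2, 4, 9.380985, 16.581135),
    (16, 2, 8, 18.575953, 34.483058),
    (32, 4, 8, 37.140626, 70.175879),
]


def efficiency(speedup, nprocs):
    """Return parallel efficiency for a measured speedup on nprocs processes."""
    return speedup / nprocs


def report(rows):
    """Build one summary line per row, flagging super-linear cases."""
    lines = []
    for nprocs, _nnodes, _cores, s_asm, s_solve in rows:
        e_asm = efficiency(s_asm, nprocs)
        e_solve = efficiency(s_solve, nprocs)
        flag = "super-linear" if max(e_asm, e_solve) > 1.0 else "at or below ideal"
        lines.append(
            f"{nprocs:3d} procs: assembly {e_asm:.2f}, solve {e_solve:.2f} ({flag})"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(ROWS)))
```

<div>For the 32-process row this gives an efficiency of roughly 1.16 for assembly and 2.19 for the solve, so the solve is the part that most needs explaining. Comparing iteration counts and per-iteration time across runs would be the natural next step.</div>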