<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Thanks, Jed,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 9, 2020 at 3:19 PM Jed Brown &lt;<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Fande Kong &lt;<a href="mailto:fdkong.jd@gmail.com" target="_blank">fdkong.jd@gmail.com</a>&gt; writes:<br>

<br>

> Hi All,<br>

><br>

> I am trying to interpret the results from "make stream" on two compute<br>

> nodes, where each node has 48 cores.<br>

><br>

> If my calculations are memory bandwidth limited, such as AMG, MatVec,<br>

> GMRES, etc..<br>

<br>

There's a lot more to AMG setup than memory bandwidth (architecture<br>

matters a lot, even between different generation CPUs). </blockquote><div><br></div><div>Could you elaborate a bit more on this? From my understanding, one big part of AMG setup is the RAP product, which should be pretty much bandwidth-limited.</div><div><br></div><div>So is it the graph coarsening part that is affected by the architecture? </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> MatMult and<br>

Krylov are almost pure bandwidth.<br>

<br>

> The best speedup I could get is 16.6938 if I start from one core?? The<br>

> speedup for function evaluations and Jacobian evaluations can be better<br>

> than 16.6938?<br>

<br>

Residual and Jacobians can be faster, especially if your code is slow<br>

(poorly vectorized, branchy, or has a lot of arithmetic).<br></blockquote><div><br></div><div>It will be branchy when we handle complicated multiphysics.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

Are you trying to understand perf on current hardware or make decisions<br>

about new hardware?<br></blockquote><div><br></div><div>The nodes are INL supercomputer nodes. I am trying to understand the best speedup I can expect for the linear algebra part when running MOOSE/PETSc on that machine.</div><div><br></div><div><br></div><div>Thanks,</div><div><br></div><div>Fande</div><div> </div></div></div></div></div>