<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br></blockquote><div><br></div></span><div>The why is "We need to run at bandwidth peak on new arches". I do not prescribe the How, just ask for it.</div><div><br></div></div></div></div></blockquote><div><br></div><div>Be careful about specifying an optimization parameter unless it is really what you want. For example, maximizing arithmetic intensity will lead you to Cayley–Hamilton inversion, and minimizing ("avoiding") network communication can lead to some funny algorithms if you are not careful.</div><div><br></div><div>Now, I suspect that maximizing bandwidth might not be gameable, because that is where Sam Williams ends up -- with large memory -- with HPGMG-FV, which is a matrix-free, 4th-order accurate, finite-volume multigrid solver for the 3D Laplacian (non-constant coefficient). (While I hesitate to use one person's experience as an implied "proof", Sam is very thorough and honest.) And keeping the memory bus saturated over the next 10 years may not be achievable even for Sam.</div><div><br></div><div>But in the space of equation solvers that are friendly to emerging architectures, you are latency constrained in practical -- not large-memory -- regimes. Large memory is a good place to start, though: it is a baseline, and it is easier to think about and achieve. Still, I wince when I see a goal that only implies good performance.<br><br></div></div></div></div>