<div dir="ltr">Justin, this is interesting data. It would be nice to take the normalization out, because GAMG could just be running very slowly on one processor, which makes scaling look better. hypre's algorithm is pretty good on scalar Poisson, low-order discretization (and more). hypre and GAMG should be within, say, 50% of each other on one processor, and hypre should be faster, if both are optimized well (which might not be practical here, but just my 2c).<div><br></div><div>Taking the normalization out of the plot is annoying because then you have to have multiple "perfect" lines. Another way to do this is to do what we (Jed) have been calling "dynamic range" instead of "strong scaling", where you decrease the problem size instead of increasing the amount of parallelism. So all data uses the same number of processors. This is just a different way to look at "strong scaling", in the general sense, and is neither better nor worse. But an advantage of this is that you can plot "equations solved"/second vs. solve time. Now perfect scaling is a flat line, so there is no need to draw (multiple) diagonal lines for perfect scaling. The height of the lines (dof/sec) is raw performance. It takes a second to get your head wrapped around these dynamic-range plots, but I like them.</div><div><br></div><div>I've attached a talk that goes through these plots and explains how to think about them, etc. (see the plot slides in the middle of the deck)</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 30, 2016 at 9:40 AM, Justin Chang <span dir="ltr"><<a href="mailto:jychang48@gmail.com" target="_blank">jychang48@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">That guy's results actually make sense to me.<div><br></div><div>I also get poor strong scaling for the FEM version of the Poisson equation (via Firedrake) using HYPRE's BoomerAMG. 
The studies were done on Intel E5-2670 machines and had proper OpenMPI bindings. No HYPRE configure options were set via the command line, so I just used whatever the default settings were.</div><div><br></div><div>If he used ML, GAMG, or even ILU he would likely get much better scaling, as I have.</div><div><br></div><div>Attached is a speedup plot of a much smaller problem I did (225k dofs), but you can still see a similar progression of how HYPRE deteriorates.</div><div><br></div><div>Compared to the other preconditioners, I noticed that HYPRE has a much lower flop-to-byte ratio, which suggests to me that, with the current solver configurations, HYPRE is more memory-bandwidth bound and will suffer from the lack of available bandwidth as more cores are used.</div><div><br></div><div>Not sure how to properly configure any of these multigrid preconditioners, but figured I'd offer my two cents.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Justin</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 30, 2016 at 5:18 AM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>
> On Jun 29, 2016, at 10:06 PM, Jeff Hammond <<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a>> wrote:<br>
><br>
><br>
><br>
> On Wednesday, June 29, 2016, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>> wrote:<br>
><br>
> Who are these people and why do they have this webpage?<br>
><br>
><br>
> Pop up 2-3 directories and you'll see this is a grad student who appears to be trying to learn applied math. Is this really your enemy? Don't you guys have some DOE bigwigs to bash?<br>
><br>
> Almost for sure they are doing no process binding and no proper assignment of processes to memory domains.<br>
><br>
><br>
> MVAPICH2 sets affinity by default. Details are not given, but "infiniband enabled" means it might have been used. I don't know what OpenMPI does by default, but affinity alone doesn't explain this.<br>
<br>
</span> By affinity you mean that the process just remains on the same core, right? You could be right; I think the main effect is a bad assignment of processes to cores/memory domains.<br>
<span><br>
><br>
> In addition they are likely filling up all the cores on the first node before adding processes to the second node, etc.<br>
><br>
><br>
> That's how I would show scaling. Are you suggesting using all the nodes and doing breadth first placement?<br>
<br>
</span> I would fill up one process per memory domain, moving across the nodes; then go back and start a second process on each memory domain, etc. You can also just go across nodes, as you suggest, and then across memory domains.<br>
<br>
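A sketch of this kind of spread placement using Open MPI's mpirun (flag names and NUMA terminology vary across MPI implementations and versions, and `./app` is just a placeholder, so treat this as a starting point and check your launcher's man page):

```shell
# Open MPI: assign successive ranks to successive NUMA (memory)
# domains rather than packing one domain full of ranks; bind each
# rank to a core and print the bindings so placement can be checked.
mpirun -n 8 --map-by numa --bind-to core --report-bindings ./app

# Breadth-first across nodes instead (one rank per node, then wrap):
mpirun -n 8 --map-by node --bind-to core --report-bindings ./app
```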
If you fill up an entire node of cores and then go to the next node, you get this effect where the performance goes way down as you fill up the last of the cores (because no more memory bandwidth is available) and then performance goes up again as you jump to the next node and suddenly have a big chunk of additional bandwidth. You also get weird load-balancing problems because the first 16 processes go slowly, since they share some bandwidth, while the 17th runs much faster since it can hog more bandwidth.<br>
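A toy model of that effect (the 60 GB/s per node and 16 cores per node are made-up numbers, purely for illustration):

```python
# Toy bandwidth-sharing model, illustrative only: assume each node's
# memory bandwidth is split evenly among the ranks placed on it, and
# that ranks fill node 1 completely before spilling onto node 2.
NODE_BW_GBS = 60.0     # assumed aggregate memory bandwidth per node
CORES_PER_NODE = 16    # assumed cores per node

def bw_per_rank(nranks):
    """Return (slowest_rank_bw, fastest_rank_bw) in GB/s under
    depth-first placement: node 1 fills up before node 2 is used."""
    on_first = min(nranks, CORES_PER_NODE)   # ranks packed on node 1
    on_second = nranks - on_first            # ranks spilled to node 2
    slow = NODE_BW_GBS / on_first
    fast = NODE_BW_GBS / on_second if on_second else slow
    return slow, fast

# Per-rank share collapses as node 1 fills (60 -> 3.75 GB/s), then
# the 17th rank, alone on node 2, hogs a whole node's bandwidth.
for n in (1, 2, 4, 8, 16, 17):
    print(n, bw_per_rank(n))
```

Spreading ranks one per memory domain across nodes first, as suggested above, keeps the shares even and avoids both the cliff and the imbalance.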
<div><div><br>
><br>
> Jeff<br>
><br>
> If the studies had been done properly there should be very little fall-off in the strong scaling in going from 1 to 2 to 4 processes and even beyond. Similarly the huge fall-off in going from 4 to 8 to 16 would not occur for weak scaling.<br>
><br>
> Barry<br>
><br>
><br>
> > On Jun 29, 2016, at 7:47 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> ><br>
> ><br>
> ><br>
> > <a href="http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html" rel="noreferrer" target="_blank">http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html</a><br>
> ><br>
> > Can we rerun this on something at ANL? I think this cannot be true.<br>
> ><br>
> > Matt<br>
> ><br>
> > --<br>
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
> > -- Norbert Wiener<br>
><br>
><br>
><br>
> --<br>
> Jeff Hammond<br>
> <a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
> <a href="http://jeffhammond.github.io/" rel="noreferrer" target="_blank">http://jeffhammond.github.io/</a><br>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>