<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 29, 2016 at 8:18 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class=""><br>

> On Jun 29, 2016, at 10:06 PM, Jeff Hammond <<a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a>> wrote:<br>

><br>

><br>

><br>

> On Wednesday, June 29, 2016, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>

><br>

>    Who are these people and why do they have this webpage?<br>

><br>

><br>

> Pop up 2-3 directories and you'll see this is a grad student who appears to be trying to learn applied math. Is this really your enemy? Don't you guys have some DOE bigwigs to bash?<br>

><br>

>     Almost certainly they are doing no process binding and no proper assignment of processes to memory domains.<br>

><br>

><br>

> MVAPICH2 sets affinity by default. Details not given but "infiniband enabled" means it might have been used. I don't know what OpenMPI does by default but affinity alone doesn't explain this.<br>

<br>

</span>  By affinity you mean that the process just remains on the same core, right? You could be right; I think the main effect is a bad assignment of processes to cores/memory domains.<br>

<span class=""><br></span></blockquote><div><br></div><div>Yes, affinity to cores.</div><div><br></div><div>I checked and:</div><div>- Open-MPI does no binding by default (<a href="https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4">https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4</a>).</div><div>- MVAPICH2 sets affinity by default except when MPI_THREAD_MULTIPLE is used (<a href="http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.0-userguide.pdf">http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.0-userguide.pdf</a>).</div><div>- I am not certain what Intel MPI does in every case, but at least on Xeon Phi it defaults to compact placement (<a href="https://software.intel.com/en-us/articles/mpi-and-process-pinning-on-xeon-phi">https://software.intel.com/en-us/articles/mpi-and-process-pinning-on-xeon-phi</a>), which is almost certainly wrong for bandwidth-limited apps (where scatter makes more sense).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class="">

><br>

>  In addition, they are likely filling up all the cores on the first node before adding processes to the second node, etc.<br>

><br>

><br>

> That's how I would show scaling. Are you suggesting using all the nodes and doing breadth first placement?<br>

<br>

</span>   I would place one process per memory domain, moving across the nodes, then go back and start a second process on each memory domain, and so on. You can also just go across nodes as you suggest and then across memory domains.<br>

<br></blockquote><div><br></div><div>That's reasonable.  I just don't bother showing scaling except in the unit of charge, which in most cases is nodes (exception: Blue Gene).  There is no way to decompose node resources in a completely reliable way, so one should always use the full node as effectively as possible for every node count.</div><div><br></div><div>The other exception is the cloud, where hypervisors are presumably doing a halfway decent job of dividing up resources (and adding enough overhead that performance is irrelevant anyway :-) ), so one can plot scaling in the number of (virtual) cores.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

   If you fill up the entire node of cores and then go to the next node, you get this effect that the performance goes way down as you fill up the last of the cores (because no more memory bandwidth is available), and then performance goes up again as you jump to the next node and suddenly have a big chunk of additional bandwidth. You also have a weird load-balancing problem because the first 16 processes are going slow because they share some bandwidth, while the 17th runs much faster since it can hog more bandwidth.<br>

<div class=""><div class="h5"><br></div></div></blockquote><div><br></div><div>Indeed, 17 on 2 should be distributed as 9 and 8, not 16 and 1, although using nproc%nnode!=0 is silly.  I thought you meant scaling up to 20 with 1 ppn on 20 nodes, then going to 40 with 2 ppn, etc.</div><div><br></div><div>Jeff</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">

><br>

> Jeff<br>

><br>

> If the studies had been done properly, there should be very little fall-off in the strong scaling in going from 1 to 2 to 4 processes and even beyond. Similarly, the huge fall-off in going from 4 to 8 to 16 would not occur for weak scaling.<br>

><br>

>    Barry<br>

><br>

><br>

> > On Jun 29, 2016, at 7:47 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

> ><br>

> ><br>

> ><br>

> >   <a href="http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html" rel="noreferrer" target="_blank">http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html</a><br>

> ><br>

> > Can we rerun this on something at ANL since I think this cannot be true.<br>

> ><br>

> >    Matt<br>

> ><br>

> > --<br>

> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

> > -- Norbert Wiener<br>

><br>

><br>

><br>

> --<br>

> Jeff Hammond<br>

> <a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a><br>

> <a href="http://jeffhammond.github.io/" rel="noreferrer" target="_blank">http://jeffhammond.github.io/</a><br>

<br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>

</div></div>