[petsc-dev] WTF

Jeff Hammond jeff.science at gmail.com
Thu Jun 30 08:30:45 CDT 2016


On Wed, Jun 29, 2016 at 8:18 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Jun 29, 2016, at 10:06 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
> >
> >
> >
> > On Wednesday, June 29, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    Who are these people and why do they have this webpage?
> >
> >
> > Pop up 2-3 directories and you'll see this is a grad student who appears
> to be trying to learn applied math. Is this really your enemy? Don't you
> guys have some DOE bigwigs to bash?
> >
> >     Almost certainly they are doing no process binding and no proper
> assignment of processes to memory domains.
> >
> >
> > MVAPICH2 sets affinity by default. Details are not given, but "infiniband
> enabled" suggests it might have been used. I don't know what Open MPI does
> by default, but affinity alone doesn't explain this.
>
> >   By affinity you mean that the process just remains on the same core,
> right? You could be right. I think the main effect is a bad assignment of
> processes to cores/memory domains.
>
>
Yes, affinity to cores.

I checked and:
- Open MPI does no binding by default
  (https://www.open-mpi.org/faq/?category=tuning#using-paffinity-v1.4).
- MVAPICH2 sets affinity by default except when MPI_THREAD_MULTIPLE is used
  (http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.0-userguide.pdf).
- I am not certain what Intel MPI does in every case, but at least on Xeon
  Phi it defaults to compact placement
  (https://software.intel.com/en-us/articles/mpi-and-process-pinning-on-xeon-phi),
  which is almost certainly wrong for bandwidth-limited apps (where scatter
  makes more sense).
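
For what it's worth, the quickest way to see which default you actually got
is to have every rank report where it is running.  A minimal sketch of my
own (not from any of the codes above), assuming Linux/glibc for
sched_getcpu():

    #define _GNU_SOURCE        /* for sched_getcpu() */
    #include <sched.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, len;
        char host[MPI_MAX_PROCESSOR_NAME];
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(host, &len);

        /* With binding enabled the reported cpu stays put across runs;
           with no binding it tends to wander between cores/sockets. */
        printf("rank %d on %s is on cpu %d\n", rank, host, sched_getcpu());

        MPI_Finalize();
        return 0;
    }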


> >
> >  In addition they are likely filling up all the cores on the first node
> before adding processes to the second node, etc.
> >
> >
> > That's how I would show scaling. Are you suggesting using all the nodes
> and doing breadth first placement?
>
>    I would fill up one process per memory domain moving across the nodes;
> then go back and start a second process on each memory domain, etc. You
> can also just go across nodes as you suggest and then across memory
> domains.
>
>
That's reasonable.  I just don't bother showing scaling except in the unit
of charge, which in most cases is nodes (exception: Blue Gene).  There is
no way to decompose node resources in a completely reliable way, so one
should always use the full node as effectively as possible for every node
count.

The other exception is the cloud, where hypervisors are presumably doing a
halfway decent job of dividing up resources (and adding enough overhead
that performance is irrelevant anyway :-) ), and one can plot scaling in
the number of (virtual) cores.
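
As a toy illustration of the placement order Barry describes above (one
rank per memory domain across the nodes first, versus filling a node before
moving on), here is a sketch with made-up sizes; a real launcher expresses
this through its own mapping/binding options, not anything like this code:

    #include <stdio.h>

    int main(void)
    {
        /* assumed example sizes: 8 ranks, 2 nodes, 2 memory domains/node */
        const int P = 8, N = 2, D = 2;
        const int per_node   = P / N;          /* assumes P divisible by N */
        const int per_domain = per_node / D;   /* compact fill within a node */

        for (int r = 0; r < P; r++) {
            /* depth-first: fill node 0 completely, then node 1, ... */
            int dnode = r / per_node;
            int ddom  = (r % per_node) / per_domain;

            /* breadth-first: spread across nodes, then across domains */
            int bnode = r % N;
            int bdom  = (r / N) % D;

            printf("rank %d: depth node %d/dom %d, breadth node %d/dom %d\n",
                   r, dnode, ddom, bnode, bdom);
        }
        return 0;
    }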


>    If you fill up all the cores on one node and then go to the next node
> you get the effect that performance goes way down as you fill up the last
> of the cores (because no more memory bandwidth is available) and then
> performance goes up again as you jump to the next node and suddenly have a
> big chunk of additional bandwidth. You also get a weird load balancing
> problem because the first 16 processes run slowly, since they share some
> bandwidth, while the 17th runs much faster since it can hog more
> bandwidth.
>
>
Indeed, 17 processes on 2 nodes should be distributed as 9 and 8, not 16 and
1, although using nproc%nnode!=0 is silly.  I thought you meant scaling up to
20 with 1 ppn on 20 nodes, then going to 40 with 2 ppn, etc.
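
To make that arithmetic concrete, a toy sketch (assumed 16 cores per node,
nothing PETSc-specific) of the balanced split versus filling the first node:

    #include <stdio.h>

    int main(void)
    {
        const int nproc = 17, nnode = 2, cores_per_node = 16;

        for (int node = 0; node < nnode; node++) {
            /* balanced: hand out the remainder one extra rank per node */
            int balanced = nproc / nnode + (node < nproc % nnode ? 1 : 0);

            /* greedy: fill each node to capacity before moving on */
            int greedy = nproc - node * cores_per_node;
            if (greedy < 0) greedy = 0;
            if (greedy > cores_per_node) greedy = cores_per_node;

            /* prints 9/8 for the balanced split, 16/1 for the greedy one */
            printf("node %d: balanced %d ranks, greedy %d ranks\n",
                   node, balanced, greedy);
        }
        return 0;
    }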

Jeff


> >
> > Jeff
> >
> > If the studies had been done properly there should be very little
> fall-off in the strong scaling in going from 1 to 2 to 4 processes and
> even beyond. Similarly the huge fall-off in going from 4 to 8 to 16 would
> not occur for weak scaling.
> >
> >    Barry
> >
> >
> > > On Jun 29, 2016, at 7:47 PM, Matthew Knepley <knepley at gmail.com>
> wrote:
> > >
> > >
> > >
> > >   http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html
> > >
> > > Can we rerun this on something at ANL? I think this cannot be true.
> > >
> > >    Matt
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > > -- Norbert Wiener
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
>
>


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

