[petsc-dev] WTF

Matthew Knepley knepley at gmail.com
Thu Jun 30 10:26:46 CDT 2016


On Thu, Jun 30, 2016 at 4:55 PM, Yang, Ulrike Meier <yang11 at llnl.gov> wrote:

> Do you know which settings were used for BoomerAMG?
>
> The default settings often don’t work well for 3D problems.
>
>
Ulrike, the default settings were not coarsening aggressively enough. The
Firedrake guys
used these options, which seem to work well:

  -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg
-pc_hypre_boomeramg_strong_threshold 0.75 -pc_hypre_boomeramg_agg_nl 2
-ksp_rtol 1e-6

I sent that group these options to try.
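
For anyone who wants to reproduce the comparison, something along these lines
should show the difference between the default and the tuned BoomerAMG
settings (a sketch using PETSc's KSP tutorial ex45, a 3D Laplacian, as a
stand-in for the original problem; grid sizes and process counts are only
illustrative):

  # default BoomerAMG settings
  mpiexec -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 \
    -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
    -ksp_rtol 1e-6 -log_view

  # more aggressive coarsening, as above
  mpiexec -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 \
    -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
    -pc_hypre_boomeramg_strong_threshold 0.75 -pc_hypre_boomeramg_agg_nl 2 \
    -ksp_rtol 1e-6 -log_view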

  Thanks,

     Matt


> Ulrike
>
>
>
> *From:* Justin Chang [mailto:jychang48 at gmail.com]
> *Sent:* Thursday, June 30, 2016 12:40 AM
> *To:* Barry Smith <bsmith at mcs.anl.gov>
> *Cc:* Jeff Hammond <jeff.science at gmail.com>; Yang, Ulrike Meier <
> yang11 at llnl.gov>; PETSc <petsc-dev at mcs.anl.gov>; Falgout Rob (
> rfalgout at llnl.gov) <rfalgout at llnl.gov>
> *Subject:* Re: [petsc-dev] WTF
>
>
>
> That guy's results actually make sense to me.
>
>
>
> I also get poor strong scaling for the FEM version of the Poisson equation
> (via Firedrake) using HYPRE's BoomerAMG. The studies were done on Intel
> E5-2670 machines and had proper Open MPI bindings. No HYPRE options were
> set on the command line, so I just used whatever the default settings were.
>
>
>
> If he used ML, GAMG, or even ILU he would likely get much better scaling
> as I have.
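>
> For reference, switching among those is just a matter of changing the
> solver options; a sketch, not the exact command lines used for the plot:
>
>   -pc_type gamg                        # PETSc's native algebraic multigrid
>   -pc_type ml                          # Trilinos ML (PETSc built with --download-ml)
>   -pc_type bjacobi -sub_pc_type ilu    # block Jacobi with ILU subdomain solves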
>
>
>
> Attached is a speedup plot of a much smaller problem I did (225k dofs),
> but you can still see a similar progression in how HYPRE deteriorates.
>
>
>
> Compared to the other preconditioners, I noticed that HYPRE has a much
> lower flop-to-byte ratio, which suggests to me that, with the current
> solver configurations, HYPRE is likely to be more memory-bandwidth bound
> and to suffer from lack of available bandwidth as more cores are used.
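>
> One way to check that is to measure how the achievable memory bandwidth
> scales with the core count, e.g. with the STREAMS benchmark that ships
> with PETSc (a sketch, assuming a PETSc source tree; the core count is
> illustrative):
>
>   cd $PETSC_DIR
>   make streams NPMAX=16
>
> Once the measured bandwidth stops growing with added cores, a
> bandwidth-bound solve will stop speeding up too.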
>
>
>
> Not sure how to properly configure any of these multigrid preconditioners,
> but figured I'd offer my two cents.
>
>
>
> Justin
>
>
>
> On Thu, Jun 30, 2016 at 5:18 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>
> > On Jun 29, 2016, at 10:06 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
> >
> >
> >
> > On Wednesday, June 29, 2016, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >    Who are these people and why do they have this webpage?
> >
> >
> > Pop up 2-3 directories and you'll see this is a grad student who appears
> to be trying to learn applied math. Is this really your enemy? Don't you
> guys have some DOE bigwigs to bash?
> >
> >     Almost certainly they are doing no process binding and no proper
> assignment of processes to memory domains.
> >
> >
> > MVAPICH2 sets affinity by default. Details are not given, but "InfiniBand
> enabled" means it might have been used. I don't know what Open MPI does by
> default, but affinity alone doesn't explain this.
>
>   By affinity you mean that the process just remains on the same core,
> right? You could be right; I think the main effect is a bad assignment of
> processes to cores/memory domains.
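>
>    For what it's worth, binding and placement can also be forced explicitly
> on the mpiexec line; the exact flags depend on the MPI implementation, but
> something along these lines is typical (./app stands in for whatever
> executable was actually run):
>
>   # Open MPI: spread ranks across sockets, pin each rank to a core
>   mpiexec -n 8 --map-by socket --bind-to core --report-bindings ./app
>
>   # MPICH / MVAPICH2 (hydra)
>   mpiexec -n 8 -bind-to socket -map-by socket ./app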
>
> >
> >  In addition they are likely filling up all the cores on the first node
> before adding processes to the second node, etc.
> >
> >
> > That's how I would show scaling. Are you suggesting using all the nodes
> and doing breadth-first placement?
>
>    I would place one process per memory domain, moving across the nodes;
> then go back and start a second process on each memory domain, etc. You can
> also just go across nodes, as you suggest, and then across memory domains.
>
>    If you fill up all the cores on a node and then go to the next node,
> you get the effect that performance goes way down as you fill up the
> last of the cores (because no more memory bandwidth is available) and then
> goes up again as you jump to the next node and suddenly have a big chunk
> of additional bandwidth. You also get a weird load-balancing problem:
> the first 16 processes run slowly because they share some bandwidth,
> while the 17th runs much faster since it can hog more bandwidth.
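>
>    In practice that placement can be requested directly from mpiexec (a
> sketch with Open MPI syntax; the per-socket counts are just an example):
>
>   # one rank per socket (memory domain) on every allocated node
>   mpiexec --map-by ppr:1:socket --bind-to core ./app
>
>   # then two ranks per socket for the next point in the study, and so on
>   mpiexec --map-by ppr:2:socket --bind-to core ./app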
>
>
> >
> > Jeff
> >
> > If the studies had been done properly there should be very little fall-off
> in the strong scaling in going from 1 to 2 to 4 processes and even
> beyond. Similarly, the huge fall-off in going from 4 to 8 to 16 would not
> occur for weak scaling.
> >
> >    Barry
> >
> >
> > > On Jun 29, 2016, at 7:47 PM, Matthew Knepley <knepley at gmail.com>
> wrote:
> > >
> > >
> > >
> > >   http://guest.ams.sunysb.edu/~zgao/work/airfoil/scaling.html
> > >
> > > Can we rerun this on something at ANL, since I think this cannot be
> true?
> > >
> > >    Matt
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > > -- Norbert Wiener
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
>
>
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener