[petsc-users] Obtaining bytes per second

Justin Chang jychang48 at gmail.com
Wed May 6 11:41:27 CDT 2015


I suppose I have two objectives that I think are achievable with the means
PETSc provides:

1) How much wall-clock time can be reduced as you increase the number of
processors. I have strong-scaling and parallel efficiency metrics that
convey this.

2) What the "optimal" problem size for these two methods/solvers is. What I
mean by this is, at what point do I achieve the maximum FLOPS/s. If I
start off with a really small problem, then this metric should increase
with problem size. My hypothesis is that as problem size increases, the
ratio of wall-clock time spent idle (e.g., waiting for cache to free up,
accessing main memory, etc.) to time spent performing work also increases,
and the reported FLOPS/s should start decreasing at some point.
"Efficiency" in this context simply means the highest possible FLOPS/s.
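
To make the two metrics above concrete, here is a minimal Python sketch. It
is not part of PETSc: the function names and every number in the usage lines
are hypothetical placeholders, and in practice the times and flop counts
would be taken from something like the -log_summary output.

```python
def strong_scaling(times):
    """Return {procs: (speedup, efficiency)} relative to the smallest run.

    times maps processor count -> wall-clock seconds for a fixed problem size.
    """
    base_p = min(times)
    base_t = times[base_p]
    result = {}
    for p in sorted(times):
        speedup = base_t / times[p]
        # Parallel efficiency: speedup divided by the increase in processors.
        efficiency = speedup * base_p / p
        result[p] = (speedup, efficiency)
    return result


def peak_flop_rate(runs):
    """Return (problem_size, flop_rate) for the run with the highest flop rate.

    runs is a list of (problem_size, flop_count, seconds) tuples.
    """
    rates = [(size, flops / seconds) for size, flops, seconds in runs]
    return max(rates, key=lambda r: r[1])


# Hypothetical placeholder measurements, for illustration only.
times = {1: 100.0, 2: 50.0, 4: 40.0}
print(strong_scaling(times))

runs = [(1000, 2.0e6, 0.004), (10000, 2.0e7, 0.02), (100000, 2.0e8, 0.4)]
print(peak_flop_rate(runs))
```

If the hypothesis in (2) holds, the flop rate computed this way would rise
with problem size, peak, and then fall off.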

Does that make sense, and/or is it "interesting" enough?

Thanks,
Justin

On Wed, May 6, 2015 at 11:28 AM, Jed Brown <jed at jedbrown.org> wrote:

> Justin Chang <jychang48 at gmail.com> writes:
> > I already have speedup/strong scaling results that essentially depict the
> > difference between the KSPSolve() and TaoSolve(). However, I have been told
> > by someone that strong-scaling isn't enough - that I should somehow include
> > something to show the "efficiency" of these two methodologies.
>
> "Efficiency" is irrelevant if one is wrong.  Can you set up a problem
> where both get the right answer and vary a parameter to get to the case
> where one fails?  Then you can look at efficiency for a given accuracy
> (and you might have to refine the grid differently) as you vary the
> parameter.
>
> It's really hard to demonstrate that an implicit solver is optimal in
> terms of mathematical convergence rate.  Improvements there can dwarf
> any differences in implementation efficiency.
>
> > That is, how much of the wall-clock time reported by these two very
> > different solvers is spent doing useful work.
> >
> > Is such an "efficiency" metric necessary to report in addition to
> > strong-scaling results? The overall computational framework is the same for
> > both problems, the only difference being one uses a linear solver and the
> > other uses an optimization solver. My first thought was to use PAPI to
> > include hardware counters, but these are notoriously inaccurate. Then I
> > thought about simply reporting the manual FLOPS and FLOPS/s via PETSc, but
> > these metrics ignore memory bandwidth. And so here I am looking at the idea
> > of implementing the Roofline model, but now I am wondering if any of this
> > is worth the trouble.
>
>
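
For reference, the Roofline bound mentioned in the quoted message is simple
to evaluate: the attainable flop rate is the smaller of the machine's peak
flop rate and the arithmetic intensity times the memory bandwidth. A minimal
Python sketch follows; the function name and the machine numbers are
hypothetical placeholders, not measurements from this thread.

```python
def roofline(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Upper bound on the flop rate (flop/s) under the Roofline model.

    arithmetic_intensity: flops performed per byte moved from memory
    peak_flops: peak floating-point rate of the machine (flop/s)
    peak_bandwidth: peak memory bandwidth (bytes/s)
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


# Hypothetical machine: 100 Gflop/s peak, 20 GB/s memory bandwidth.
peak = 100e9
bound = roofline(0.25, peak, 20e9)
kind = "memory-bound" if bound < peak else "compute-bound"
print(bound, "flop/s,", kind)
```

Comparing a measured flop rate against this bound gives one way to say how
much of the achievable performance a solver reaches, with memory bandwidth
taken into account.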