[petsc-users] Obtaining bytes per second

Matthew Knepley knepley at gmail.com
Wed May 6 11:48:03 CDT 2015


On Wed, May 6, 2015 at 11:41 AM, Justin Chang <jychang48 at gmail.com> wrote:

> I suppose I have two objectives that I think are achievable within
> PETSc's means:
>
> 1) How much wall-clock time can be reduced as the number of processors
> increases. I have strong-scaling and parallel efficiency metrics that
> convey this.
>
> 2) What the "optimal" problem size for these two methods/solvers is.
> What I mean by this is: at what problem size do I achieve the maximum
> FLOPS/s? Starting from a really small problem, this metric should
> increase with problem size. My hypothesis is that as the problem size
> increases, the ratio of wall-clock time spent idle (e.g., waiting for
> cache to free up, accessing main memory, etc.) to time spent performing
> work also increases, and the reported FLOPS/s should start decreasing at
> some point. "Efficiency" in this context simply means the highest
> possible FLOPS/s.
>
> Does that make sense, and/or is it "interesting" enough?
>

I think 2) is not really that interesting because

  a) It is so easily gamed: just stick in high-flop-count operations, like
DGEMM.

  b) Time really matters to people who run the code, but flops never do.

  c) Floating-point performance is not your limiting factor for run time.

I think it would be much more interesting, and no more work, to

  a) Model the flop/byte ratio \beta simply

  b) Report how close you get to the max performance attainable given
\beta on your machine (a rough sketch of that bound follows)
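
For concreteness, here is a minimal sketch of that bound in the spirit of
the roofline model: attainable flop/s = min(peak flop/s, \beta x streaming
memory bandwidth). Every number below (peak rate, STREAM bandwidth, \beta,
and the achieved rate taken from -log_summary) is a placeholder to be
replaced with values measured for your kernel and your machine:

  #include <stdio.h>

  int main(void)
  {
    /* Placeholder numbers -- substitute measured values for your machine. */
    double peak_flops = 83.2e9; /* peak flop rate of the node (flop/s)       */
    double stream_bw  = 25.0e9; /* measured STREAM bandwidth (bytes/s)       */
    double beta       = 0.17;   /* flop/byte ratio of the kernel, e.g. SpMV  */
    double achieved   = 3.1e9;  /* flop rate reported by -log_summary        */
    double bound      = beta * stream_bw;       /* bandwidth-limited bound   */

    if (bound > peak_flops) bound = peak_flops; /* roofline: min(peak, beta*BW) */
    printf("attainable %.2e flop/s, achieved %.2e flop/s (%.1f%% of bound)\n",
           bound, achieved, 100.0 * achieved / bound);
    return 0;
  }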

  Thanks,

     Matt


> Thanks,
> Justin
>
> On Wed, May 6, 2015 at 11:28 AM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Justin Chang <jychang48 at gmail.com> writes:
>> > I already have speedup/strong-scaling results that essentially depict
>> > the difference between KSPSolve() and TaoSolve(). However, I have been
>> > told by someone that strong scaling isn't enough, and that I should
>> > somehow include something to show the "efficiency" of these two
>> > methodologies.
>>
>> "Efficiency" is irrelevant if one is wrong.  Can you set up a problem
>> where both get the right answer and vary a parameter to get to the case
>> where one fails?  Then you can look at efficiency for a given accuracy
>> (and you might have to refine the grid differently) as you vary the
>> parameter.
>>
>> It's really hard to demonstrate that an implicit solver is optimal in
>> terms of mathematical convergence rate.  Improvements there can dwarf
>> any differences in implementation efficiency.
>>
>> > That is, how much of the wall-clock time reported by these two very
>> > different solvers is spent doing useful work.
>> >
>> > Is such an "efficiency" metric necessary to report in addition to
>> > strong-scaling results? The overall computational framework is the
>> > same for both problems, the only difference being one uses a linear
>> > solver and the other uses an optimization solver. My first thought
>> > was to use PAPI to include hardware counters, but these are
>> > notoriously inaccurate. Then I thought about simply reporting the
>> > manual FLOPS and FLOPS/s via PETSc, but these metrics ignore memory
>> > bandwidth. And so here I am looking at the idea of implementing the
>> > Roofline model, but now I am wondering if any of this is worth the
>> > trouble.
>>
>>
>
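
Regarding the flops PETSc reports: every PETSc operation logs its flop
count, user code can add its own with PetscLogFlops(), and running with
-log_summary prints the totals, rates, and per-event timings. A minimal
sketch (the vector size and the manually logged flop count are arbitrary
placeholders):

  static char help[] = "Sketch: obtain flop counts and rates via -log_summary.\n";

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    Vec            x, y;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
    ierr = VecCreateSeq(PETSC_COMM_SELF, 1000000, &x);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = VecSet(y, 0.0);CHKERRQ(ierr);
    ierr = VecAXPY(y, 2.0, x);CHKERRQ(ierr);   /* flops logged by PETSc itself     */
    ierr = PetscLogFlops(1.0e6);CHKERRQ(ierr); /* credit flops done in user code   */
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr);
    ierr = PetscFinalize();                    /* -log_summary output printed here */
    return 0;
  }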


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener