<div dir="ltr"><div><div><div>I suppose I have two objectives that I think are achievable within PETSc means: <br><br>1)

 How much wall-clock time can be reduced as you increase the number of 

processors. I have strong-scaling and parallel efficiency metrics that 

convey this. <br><br>2) What the "optimal" problem size is for these two 

methods/solvers. By this I mean: at what point do I achieve 

the maximum FLOPS/s? If I start with a really small problem, then 

this metric should increase with problem size. My hypothesis is that as problem size increases, the ratio of wall-clock time spent idle (e.g., waiting 

for cache to free up, accessing main memory, etc.) to time spent performing work also increases, so the reported FLOPS/s should start decreasing at some point. "Efficiency" in this context simply means the highest possible FLOPS/s.<br><br></div>Does that make sense, and is it "interesting" enough?<br><br></div>Thanks,<br></div>Justin<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 6, 2015 at 11:28 AM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Justin Chang <<a href="mailto:jychang48@gmail.com">jychang48@gmail.com</a>> writes:<br>

> I already have speedup/strong scaling results that essentially depict the<br>

> difference between the KSPSolve() and TaoSolve(). However, I have been told<br>

> by someone that strong-scaling isn't enough - that I should somehow include<br>

> something to show the "efficiency" of these two methodologies.<br>

<br>

</span>"Efficiency" is irrelevant if one is wrong.  Can you set up a problem<br>

where both get the right answer and vary a parameter to get to the case<br>

where one fails?  Then you can look at efficiency for a given accuracy<br>

(and you might have to refine the grid differently) as you vary the<br>

parameter.<br>

<br>

It's really hard to demonstrate that an implicit solver is optimal in<br>

terms of mathematical convergence rate.  Improvements there can dwarf<br>

any differences in implementation efficiency.<br>

<div class="HOEnZb"><div class="h5"><br>

> That is, how much of the wall-clock time reported by these two very<br>

> different solvers is spent doing useful work.<br>

><br>

> Is such an "efficiency" metric necessary to report in addition to<br>

> strong-scaling results? The overall computational framework is the same for<br>

> both problems, the only difference being one uses a linear solver and the<br>

> other uses an optimization solver. My first thought was to use PAPI to<br>

> include hardware counters, but these are notoriously inaccurate. Then I<br>

> thought about simply reporting the manual FLOPS and FLOPS/s via PETSc, but<br>

> these metrics ignore memory bandwidth. And so here I am looking at the idea<br>

> of implementing the Roofline model, but now I am wondering if any of this<br>

> is worth the trouble.<br>

<br>

</div></div></blockquote></div><br></div>