[petsc-users] PetscLogFlop for a sqrt()

Matthew Knepley knepley at gmail.com
Tue Apr 21 12:29:15 CDT 2015


On Tue, Apr 21, 2015 at 12:23 PM, Justin Chang <jychang48 at gmail.com> wrote:

> Last question
>
> I would like to report the efficiency of my code, that is, flops/s over
> the theoretical peak performance (TPP) on n cores, where TPP = clock *
> FLOPS/cycle * n. My current machine is an Intel® Core™ i7-4790 CPU @ 3.60GHz
> and I am assuming that the FLOPS/cycle is 4.
>
> One of my serial test runs has achieved a FLOPS/s of 2.01e+09, which
> translates to an efficiency of almost 14%. I know these are crude
> measurements, but would these manual flop counts be appropriate for this
> kind of measurement? Or would hardware counts from PAPI be better?
>
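
The arithmetic in the question above can be checked with a short sketch. The clock rate, the assumed 4 FLOPS/cycle, and the measured 2.01e+09 flops/s are taken from the message; the function names are made up for illustration and are not part of PETSc.

```python
# Sketch of the efficiency estimate described in the question.
# Function names are hypothetical; the numbers come from the message.

def theoretical_peak(clock_hz, flops_per_cycle, n_cores):
    """TPP = clock * FLOPS/cycle * n, in flops/s."""
    return clock_hz * flops_per_cycle * n_cores

def efficiency(measured_flops_per_s, peak_flops_per_s):
    """Fraction of the peak that was actually achieved."""
    return measured_flops_per_s / peak_flops_per_s

# i7-4790 @ 3.60 GHz, assumed 4 FLOPS/cycle, one core (serial run)
tpp = theoretical_peak(3.60e9, 4, 1)
eff = efficiency(2.01e9, tpp)
print(f"TPP = {tpp:.3e} flops/s, efficiency = {100 * eff:.1f}%")
```

This reproduces the "almost 14%" figure, but only relative to TPP, which is the comparison the reply below advises against.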

1) For this, I think the manual counts are good enough for the estimate.

2) You really should not compare to TPP, which ignores memory bandwidth
constraints:

  http://www.mcs.anl.gov/petsc/documentation/faq.html#computers

You should run the STREAM benchmark on your machine as the link says, and
then use a "roofline" model to estimate the peak performance based on
bandwidth. Here is a talk on that model:

  http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf

and here is a paper which does exactly this for sparse MatVec (Krylov
methods):

  http://www.cs.odu.edu/~keyes/papers/pcfd99_gkks.pdf

Basically, you multiply the bandwidth by the arithmetic intensity of your
kernel (flops per byte moved). For bandwidth-limited kernels that product is
smaller than TPP, and it, not TPP, is the real performance upper bound.
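
A minimal sketch of that bound, assuming the usual roofline form min(TPP, bandwidth * arithmetic intensity). The TPP and measured rate come from the question; the bandwidth and arithmetic intensity below are placeholders only, to be replaced with your own STREAM result and a flops-per-byte count for your kernel. The function name is hypothetical.

```python
# Roofline-style upper bound: performance is capped either by the
# floating point peak or by how fast memory can feed the kernel.

def roofline_bound(peak_flops_per_s, bandwidth_bytes_per_s, flops_per_byte):
    """Attainable flops/s = min(peak, bandwidth * arithmetic intensity)."""
    return min(peak_flops_per_s, bandwidth_bytes_per_s * flops_per_byte)

peak = 3.60e9 * 4          # TPP from the question: clock * FLOPS/cycle * 1 core
measured = 2.01e9          # measured flops/s from the question

# PLACEHOLDERS, not measurements: substitute your STREAM bandwidth and the
# arithmetic intensity you count for your own kernel.
stream_bandwidth = 20.0e9  # bytes/s (assumed value)
intensity = 0.25           # flops per byte (assumed value)

bound = roofline_bound(peak, stream_bandwidth, intensity)
print(f"roofline bound = {bound:.3e} flops/s")
print(f"fraction of roofline bound = {100 * measured / bound:.1f}%")
print(f"fraction of TPP            = {100 * measured / peak:.1f}%")
```

With a low arithmetic intensity the bound sits well below TPP, so the same measured rate is a much larger fraction of what the machine can actually deliver for that kernel.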

  Thanks,

     Matt


> Thanks,
> Justin
>
> On Tue, Apr 21, 2015 at 11:16 AM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Matthew Knepley <knepley at gmail.com> writes:
>> > Flop is Floating Point Operation. The index calculation is an Integer
>> > Operation. I agree that we could probably start counting
>> > those as well since in some sorts of applications it's important, but
>> > right now we don't.
>>
>> Index calculations often satisfy recurrences that the compiler folds
>> into pointer increments and the like.  Also, some architectures, like
>> PowerPC, have floating point instructions that include mutating index
>> operations in the true spirit of RISC. ;-)
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener