[petsc-users] Obtaining bytes per second

Mon May 4 11:01:22 CDT 2015

On Mon, May 4, 2015 at 7:07 AM, Justin Chang <jychang48 at gmail.com> wrote:

> Hi Jed,
>
> Thanks for the reply. Not too long ago one of you guys (Matt I think) had
> mentioned the Roofline model and I was hoping to emulate something like it
> for my application. If I understand the presentation slides (and the paper
> implementing it) correctly, the upper bound on FLOP/s is calculated by
> multiplying the stream BW by the ratio of flops to DRAM bytes (aka arithmetic
> intensity). The workload (i.e., flops) can be counted via PetscLogFlops()
> and in the paper, the sparse matvec total bytes transferred for fmadd was
> manually counted. Since my program involves more than just matvec I am
> curious if there's a way to obtain the bytes for all operations and
> functions invoked.
>
> Or if I really should go with what you had suggested, could you elaborate
> a little more on it, or point me to some papers/links/slides that talk
> about it?
>

The best we can do here is an estimate (because of all the caveats that Jed
points out). I suggest just counting by hand how many bytes come down from
memory, just as we do for flops.
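
As a rough illustration of counting bytes by hand, here is a minimal sketch
for a CSR sparse matvec y = A*x. The function names are hypothetical (they are
not PETSc API), and the byte count assumes 8-byte scalars, 4-byte indices, and
that each matrix entry and each vector entry crosses the memory bus exactly
once, which is optimistic. The Roofline bound also caps the result at the
machine's peak flop rate, which you have to supply yourself along with the
measured stream bandwidth.

```python
def spmv_csr_bytes(nrows, ncols, nnz, scalar_bytes=8, index_bytes=4):
    """Estimated bytes moved for one CSR matvec y = A*x.

    Assumption: values, column indices, row pointers, x, and y are each
    transferred once (perfect cache reuse of x, no write-allocate on y).
    """
    matrix = nnz * (scalar_bytes + index_bytes) + (nrows + 1) * index_bytes
    vectors = ncols * scalar_bytes + nrows * scalar_bytes
    return matrix + vectors


def spmv_csr_flops(nnz):
    """One multiply and one add per stored nonzero."""
    return 2 * nnz


def roofline_bound(flops, bytes_moved, stream_bw, peak_flops):
    """Upper bound on flop/s: min(peak, bandwidth * arithmetic intensity)."""
    intensity = flops / bytes_moved  # flops per byte
    return min(peak_flops, stream_bw * intensity)


if __name__ == "__main__":
    # Hypothetical problem size and machine numbers, for illustration only.
    n, nnz = 1_000_000, 7_000_000
    b = spmv_csr_bytes(n, n, nnz)
    f = spmv_csr_flops(nnz)
    bound = roofline_bound(f, b, stream_bw=20e9, peak_flops=200e9)
    print(f"bytes={b} flops={f} bound={bound:.3e} flop/s")
```

For a full application you would repeat the same bookkeeping for each kernel
that matters (vector updates, dot products, residual evaluation) and add the
counts up, in the same spirit as the manual flop counts passed to
PetscLogFlops().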

  Thanks,

    Matt

> Thanks,
> Justin
>
>
> On Monday, May 4, 2015, Jed Brown <jed at jedbrown.org> wrote:
>
>> Justin Chang <jychang48 at gmail.com> writes:
>>
>> > Hello everyone,
>> >
>> > If I wanted to obtain the bytes/second for my PETSc program, is there a
>> > generic way of doing this? My initial thought would be to first run the
>> > program with valgrind to obtain the total memory usage, and then run it
>> > without valgrind to get the wall clock time. These two metrics then give
>> > you the bytes/second.
>>
>> Not really, because usually we're interested in useful bandwidth
>> sustained from some level of cache.  You can use hardware performance
>> counters to measure the number of cache lines transferred, but this is
>> usually an overestimate of the amount of useful data.  You really need a
>> performance model for your application and a cache model for the machine
>> to say what bandwidth is useful.
>>
>> > Or can PETSc manually count the load/stores the way it's done for
>> > flops?
>>
>> No, this information is not available in source code and would be nearly
>> meaningless even if it were.
>>
>> > I was looking at the PetscMemXXX() functions but wasn't sure if this
>> > is what I was looking for.
>> >
>> > Thanks,
>> > Justin
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener