[petsc-users] Obtaining bytes per second

Mon May 4 11:07:32 CDT 2015

Justin Chang <jychang48 at gmail.com> writes:

> Hi Jed,
>
> Thanks for the reply. Not too long ago one of you guys (Matt I think) had
> mentioned the Roofline model and I was hoping to emulate something like it
> for my application. If I understand the presentation slides (and the paper
> implementing it) correctly, the upper bound on FLOP/s is calculated by
> multiplying the STREAM bandwidth by the ratio of flops to DRAM bytes (aka
> arithmetic intensity). The workload (i.e., flops) can be counted via
> PetscLogFlops(), and in the paper the total bytes transferred by the sparse
> matvec (fmadd) were counted by hand. Since my program involves more than just
> matvec, I am curious whether there's a way to obtain the bytes for all
> operations and functions invoked.

Counting "useful" data motion subject to some cache granularity is not
automatic.  You can look at performance analysis of stencil operations
for an example of what this can look like.  I go through examples in my
class, but I do it interactively with experiments rather than off of
slides.
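Below is a minimal sketch of the bound discussed above. It assumes the usual Roofline form, attainable FLOP/s = min(peak FLOP/s, arithmetic intensity × STREAM bandwidth), and builds a hand-counted byte model for a CSR sparse matvec. The model assumes 8-byte values, 4-byte indices, and that x is read from DRAM only once (perfect cache reuse). The helper names (csr_spmv_bytes, csr_spmv_flops, roofline_bound), the problem size, and the machine numbers are all hypothetical placeholders. They are not PETSc functions or measurements.

```python
# Hypothetical helpers sketching the Roofline model; not PETSc functions.

def csr_spmv_bytes(nrows: int, nnz: int, scalar_bytes: int = 8,
                   index_bytes: int = 4) -> int:
    """Idealized DRAM traffic for y = A*x, square A stored in CSR.

    Assumes every matrix value, column index and row pointer is read once,
    x is read once (perfect cache reuse) and y is written once, ignoring
    write-allocate traffic.
    """
    matrix = nnz * (scalar_bytes + index_bytes)  # values + column indices
    row_ptr = (nrows + 1) * index_bytes          # row offsets
    vectors = 2 * nrows * scalar_bytes           # read x, write y
    return matrix + row_ptr + vectors


def csr_spmv_flops(nnz: int) -> int:
    """One multiply and one add (an fmadd) per stored nonzero."""
    return 2 * nnz


def roofline_bound(intensity: float, stream_bw: float, peak_flops: float) -> float:
    """Attainable FLOP/s = min(peak FLOP/s, arithmetic intensity * bandwidth)."""
    return min(peak_flops, intensity * stream_bw)


# Hypothetical problem: 1M rows, ~7 nonzeros per row (like a 7-point stencil).
nrows, nnz = 1_000_000, 7_000_000
ai = csr_spmv_flops(nnz) / csr_spmv_bytes(nrows, nnz)
# Hypothetical machine: 20 GB/s STREAM bandwidth, 100 GFLOP/s peak.
bound = roofline_bound(ai, stream_bw=20e9, peak_flops=100e9)
print(f"AI = {ai:.3f} flop/byte, bound = {bound / 1e9:.2f} GFLOP/s")
```

The byte count is only as good as its assumptions. Poor reuse of x, or write-allocate traffic for y, both change the denominator, and every other kernel in an application needs its own model of this kind.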

> Or if I really should go with what you had suggested, could you elaborate a
> little more on it, or point me to some papers/links/slides that talk about
> it?
>
> Thanks,
> Justin
>
> On Monday, May 4, 2015, Jed Brown <jed at jedbrown.org> wrote:
>
>> Justin Chang <jychang48 at gmail.com> writes:
>>
>> > Hello everyone,
>> >
>> > If I wanted to obtain the bytes/second for my PETSc program, is there a
>> > generic way of doing this? My initial thought would be to first run the
>> > program with valgrind to obtain the total memory usage, and then run it
>> > without valgrind to get the wall clock time. These two metrics then give
>> > you the bytes/second.
>>
>> Not really, because usually we're interested in useful bandwidth
>> sustained from some level of cache.  You can use hardware performance
>> counters to measure the number of cache lines transferred, but this is
>> usually an overestimate of the amount of useful data.  You really need a
>> performance model for your application and a cache model for the machine
>> to say what bandwidth is useful.
>>
>> > Or can PETSc manually count the load/stores the way it's done for
>> > flops?
>>
>> No, this information is not available in source code and would be nearly
>> meaningless even if it were.
>>
>> > I was looking at the PetscMemXXX() functions but wasn't sure if this
>> > is what I was looking for.
>> >
>> > Thanks,
>> > Justin
>>