[petsc-users] Obtaining bytes per second
Justin Chang
jychang48 at gmail.com
Wed May 6 02:30:27 CDT 2015
Thank you guys for your responses. If I want to estimate the number of
bytes that come down, would -memory_info give me that information?
And with this information plus the total number of logged flops, I can get
the ratio of flops to bytes and hence a (crude) upper bound on FLOPS/s
based on the reported stream BW?
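
For concreteness, the estimate described above could be sketched as follows. This is only an illustration of the arithmetic, not something PETSc provides: the function name, its parameters, and all of the numbers are hypothetical, and the byte count is assumed to come from a separate performance model or measurement.

```python
def roofline_bound(total_flops, total_bytes, stream_bw, peak_flops=None):
    """Return (arithmetic intensity, crude upper bound on flop/s).

    total_flops : flops logged for the run (hypothetical input)
    total_bytes : estimated bytes moved from DRAM (hypothetical input)
    stream_bw   : sustained STREAM bandwidth in bytes/s
    peak_flops  : optional compute ceiling in flop/s
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    intensity = total_flops / total_bytes   # flops per byte
    bound = intensity * stream_bw           # bandwidth-limited flop/s
    if peak_flops is not None:
        bound = min(bound, peak_flops)      # cannot exceed the compute peak
    return intensity, bound


# Made-up numbers: 2e9 flops, 1.6e10 bytes, 20 GB/s STREAM bandwidth.
intensity, bound = roofline_bound(2.0e9, 1.6e10, 2.0e10)
print(f"arithmetic intensity = {intensity:.3f} flop/byte")
print(f"upper bound          = {bound:.3e} flop/s")
```

The result is only as good as the byte estimate that goes into it, which is the hard part of the exercise.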
Thanks,
Justin
On Mon, May 4, 2015 at 11:07 AM, Jed Brown <jed at jedbrown.org> wrote:
> Justin Chang <jychang48 at gmail.com> writes:
>
> > Hi Jed,
> >
> > Thanks for the reply. Not too long ago one of you guys (Matt I think) had
> > mentioned the Roofline model and I was hoping to emulate something like it
> > for my application. If I understand the presentation slides (and the paper
> > implementing it) correctly, the upper bound FLOPS/s is calculated by
> > multiplying the stream BW by the ratio of DRAM flop to byte (aka arithmetic
> > intensity). The workload (i.e., flops) can be counted via PetscLogFlops()
> > and in the paper, the sparse matvec total bytes transferred for fmadd was
> > manually counted. Since my program involves more than just matvec I am
> > curious if there's a way to obtain the bytes for all operations and
> > functions invoked.
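
As an illustration of the kind of manual byte counting mentioned above for a sparse matvec, here is a minimal sketch. It is not the paper's accounting: it assumes a CSR matrix with 8-byte values and 4-byte indices, assumes the input vector is read from memory exactly once (perfect cache reuse), and ignores write-allocate traffic on the output vector. The function name and the sample sizes are hypothetical.

```python
def csr_matvec_traffic(n_rows, n_cols, nnz, value_bytes=8, index_bytes=4):
    """Idealized flop and byte counts for y = A*x with A stored in CSR."""
    # One multiply and one add per stored nonzero.
    flops = 2 * nnz
    # Matrix: one value and one column index per nonzero, plus row pointers.
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    # Vectors: x read once, y written once (optimistic assumption).
    vector_bytes = (n_cols + n_rows) * value_bytes
    return flops, matrix_bytes + vector_bytes


# Hypothetical 1000 x 1000 matrix with 5 nonzeros per row.
flops, total_bytes = csr_matvec_traffic(n_rows=1000, n_cols=1000, nnz=5000)
print(f"flops = {flops}, bytes = {total_bytes}")
print(f"arithmetic intensity = {flops / total_bytes:.3f} flop/byte")
```

Other kernels would each need their own count of this kind, under their own caching assumptions.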
>
> Counting "useful" data motion subject to some cache granularity is not
> automatic. You can look at performance analysis of stencil operations
> for an example of what this can look like. I go through examples in my
> class, but I do it interactively with experiments rather than off of
> slides.
>
> > Or if I really should go with what you had suggested, could you elaborate a
> > little more on it, or point me to some papers/links/slides that talk about
> > it?
> >
> > Thanks,
> > Justin
> >
> > On Monday, May 4, 2015, Jed Brown <jed at jedbrown.org> wrote:
> >
> >> Justin Chang <jychang48 at gmail.com> writes:
> >>
> >> > Hello everyone,
> >> >
> >> > If I wanted to obtain the bytes/second for my PETSc program, is there a
> >> > generic way of doing this? My initial thought would be to first run the
> >> > program with valgrind to obtain the total memory usage, and then run it
> >> > without valgrind to get the wall clock time. These two metrics then give
> >> > you the bytes/second.
> >>
> >> Not really, because usually we're interested in useful bandwidth
> >> sustained from some level of cache. You can use hardware performance
> >> counters to measure the number of cache lines transferred, but this is
> >> usually an overestimate of the amount of useful data. You really need a
> >> performance model for your application and a cache model for the machine
> >> to say what bandwidth is useful.
> >>
> >> > Or can PETSc manually count the load/stores the way it's done for
> >> > flops?
> >>
> >> No, this information is not available in source code and would be nearly
> >> meaningless even if it was.
> >>
> >> > I was looking at the PetscMemXXX() functions but wasn't sure if this
> >> > is what I was looking for.
> >> >
> >> > Thanks,
> >> > Justin
> >>
>