[petsc-dev] Hardware counter logging in PETSc (was Re: Where next with PETSc and KNL?)

Richard Mills richardtmills at gmail.com
Wed Sep 28 19:02:49 CDT 2016


Hi Barry,

Thanks for starting a thread about this on petsc-dev; I was planning to do
so but still hadn't gotten to it.

We can certainly get the performance data we need from various performance
analysis tools, and for some kinds of data, those are the best way to get
them.  The main reasons I'd like to add some PETSc logging support for
collecting hardware data are:

1) Many of the tools are rather "heavyweight" or otherwise cumbersome to
use.  I've always loved how lightweight the PETSc logging framework is, and
have always preferred to use that for performance tuning work until I get
down to a level that requires the use of independent tools.  I also like
working with the text reports that I get from PETSc.  Some performance
tools do a decent job generating text reports, but many require me to fire
up an annoying GUI to do even trivial tasks.  This is especially annoying
when the data I want to work with are on some supercomputer to which I have
a slow Internet connection.

2) External performance analysis tools know nothing of things like PETSc
logging stages or events.  If I am using a tool like VTune to analyze
something like a flow and reactive transport problem in PFLOTRAN, VTune
doesn't know that I want to consider calls to SNESSolve() and children in
the flow stage separately from those made in the transport stage.  Many
tools provide ways to identify things like this, but it generally requires
instrumenting the code by hand using a proprietary API.  Furthermore, most
of these APIs don't have a sort of push/pop mechanism like we have for
PETSc stages.  I really don't want to have to instrument my code for each
tool that I might want to use, especially since I've already gone to the
trouble of defining various stages/events with PETSc -- I'd like to just
use those!

Both of the above are important motivations, but I think (2) is my primary
driver.  I'd be happier with many of the tools if they were aware of PETSc
stages and events.
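
To make the push/pop point concrete, here is a minimal sketch in C of how
counter readings could be attributed to nested stages.  This is not PETSc
code and not how PetscLogStagePush()/PetscLogStagePop() are implemented;
every name in it (stage_push, stage_pop, read_counter, the fake counter) is
hypothetical, and the "hardware counter" is just an integer that the demo
advances by hand so that the example is deterministic.  A real version would
read a PMU counter in read_counter().

```c
#include <assert.h>
#include <stdio.h>

#define MAX_STAGES 8
#define MAX_DEPTH  16

/* Hypothetical stand-in for a hardware counter (e.g. cache misses).
   A real implementation would read a PMU counter here instead. */
static long long fake_counter = 0;
static long long read_counter(void) { return fake_counter; }

static long long stage_total[MAX_STAGES]; /* counts attributed to each stage */
static int       stage_stack[MAX_DEPTH];  /* stack of active stage ids */
static int       stack_depth = 0;
static long long last_read   = 0;         /* counter value at last push/pop */

/* Charge everything counted since the last push/pop to the stage on top. */
static void charge_current_stage(void)
{
  long long now = read_counter();
  if (stack_depth > 0) {
    stage_total[stage_stack[stack_depth - 1]] += now - last_read;
  }
  last_read = now;
}

static void stage_push(int stage)
{
  assert(stage >= 0 && stage < MAX_STAGES && stack_depth < MAX_DEPTH);
  charge_current_stage();
  stage_stack[stack_depth++] = stage;
}

static void stage_pop(void)
{
  assert(stack_depth > 0);
  charge_current_stage();
  stack_depth--;
}

/* Stand-in for real work such as a solve; advances the fake counter. */
static void do_work(long long events) { fake_counter += events; }

enum { STAGE_FLOW = 0, STAGE_TRANSPORT = 1 };

/* Nested stages: work done while "transport" is on top is charged to
   transport only; the rest is charged to "flow". */
void run_demo(void)
{
  stage_push(STAGE_FLOW);
  do_work(100);
  stage_push(STAGE_TRANSPORT);
  do_work(40);
  stage_pop();
  do_work(10);
  stage_pop();
  printf("flow: %lld  transport: %lld\n",
         stage_total[STAGE_FLOW], stage_total[STAGE_TRANSPORT]);
}
```

The design choice here is exclusive attribution: counts go only to the stage
on top of the stack, so per-stage totals sum to the overall total.  Whether
that is the right accounting for hardware counters is an assumption of the
sketch, not something settled in this thread.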

--Richard

On Wed, Sep 28, 2016 at 2:33 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    Moving to petsc-dev so everyone can see this discussion.
>
>     To get more detailed "performance" information on runs we have two
> (not necessarily orthogonal) choices.
>
>   1) use an integrated system that is independent of PETSc. These
> sometimes require compiling with additional options and then running a
> post-processor after the run. These systems then display the results in
> some kind of GUI. Intel has such a thing, as does Apple. Do they allow
> logging/display of things we care about such as cache misses, ....? Depends
> on each system, and some of the systems are improving over time.
>
>   2) add additional logging of values into the PETSc logging and then have
> PetscLogView() process the raw logged values into useful information.
>
>
>    Both approaches have advantages and disadvantages but we do take on a
> large development and maintenance burden if we try to incorporate more
> logging directly into PETSc. So what does incorporating into PETSc buy us
> that is worth the extra hassle? That is, can we do something with the "in
> PETSc" approach that we could not achieve otherwise? (I don't think arguments
> about it being more portable and not requiring you to buy VTune etc. from
> Intel are enough reason to do the work internally.)
>
>    In other words, if I am interested in finding out why my MatMult() is
> slower than I think it should be, is it such a terrible thing to have to
> crank up VTune (or a similar beast) to get details about the computational
> phase I am interested in?
>
>    Barry
>
> You should be able to guess that I am leaning towards 1) and want to know
> why that is a fatal mistake, if it is?
>
>
>
> > On Sep 24, 2016, at 12:00 PM, Richard Tran Mills <richard.t.mills at intel.com> wrote:
> >
> > Hi Folks,
> >
> > I'm breaking up replies to my long email message into smaller chunks to
> > make it easier to keep track of the discussion.  I'll just address the
> > perf counter issue here.
> >
> > On 9/24/16 6:54 AM, Jed Brown wrote:
> >>> 7) I still think we should add some support for collecting hardware
> >>> counter information in the PETSc logging framework.  I see that the
> >>> latest PAPI release adds some KNL support, though I don't know if it
> >>> supports the uncore counters.  Anyhow, I should start a thread on
> >>> petsc-dev about this...
> >> There was some PAPI support once upon a time (before my time), but I
> >> think Barry stripped it out because it's crappy software.  I haven't
> >> seriously looked at using the linux performance counter interface
> >> directly, but it would be less to install and not streaked with Dongarra
> >> poo.
> > An alternative that I came across is something written by some Intel
> > folks, with the terribly generic name of "Intel Performance Counter
> > Monitor".  The webpage for it is at
> >
> > https://software.intel.com/en-us/articles/intel-performance-counter-monitor
> >
> > It provides a simple C++ API (I wish there were a C one; we'd need to
> > wrap things to keep from polluting the PETSc code with C++ stuff) that
> > lets you capture essentially any of the PMU events.  This looks a lot
> > nicer than PAPI in several ways, but has the downside of being
> > Intel-specific.  I also don't see any KNL-specific counter support yet.
> >
> > --Richard
>
>