[petsc-users] CPU speed or DRAM speed bottlenecks ?

Thu Dec 3 20:58:30 CST 2020

You can try HPCToolkit (http://hpctoolkit.org/), TAU (
https://www.cs.uoregon.edu/research/tau/home.php), or Intel VTune. Each has
a learning curve, though, so expect to read its manual before using it.

--Junchao Zhang

On Thu, Dec 3, 2020 at 5:29 PM C B <cebau.mail at gmail.com> wrote:

> Barry,
>
> Thank you so much for your quick reply and insight.
>
> Are there any tools or simple ways to determine how much time is lost to
> cache misses and similar stalls? Please direct me to any resources to
> learn about this.
>
> Thanks again!
>
>
> On Thu, Dec 3, 2020 at 4:09 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>
>> On Dec 3, 2020, at 2:25 PM, C B <cebau.mail at gmail.com> wrote:
>>
>> Resorting to your expertise in software performance:
>>
>> Subject: Looking for a crude assessment of CPU speed or DRAM speed
>> bottlenecks in shared memory multi-core PCs
>>
>> On a typical PC with one Xeon CPU (8 cores),  a serial code runs a case
>> in say 10 hours of Wall time, and on the same computer 4 instances of the
>> same code running simultaneously (the same case) take essentially the same
>> Wall time, 10 hrs or a marginal increase such as 10hrs 30 mins.   There is
>> no I/O, lots of free physical RAM, each core running an instance shows ~
>> 100% utilization.
>>
>> Q1: What could we conclude about this hardware-software-case combination
>> in terms of being CPU bound, memory bandwidth bound, etc ?
>>
>>    It does not appear to be memory bandwidth bound.  Presumably each of
>> the 4 cases uses the same memory bandwidth as a single case. Since all 4
>> run at once with essentially no slowdown, I think one can conclude that a
>> single case is using at most about 25 percent of the memory bandwidth.
>>
>>
>> Q2: Can we say that this hardware-software-case combination is not DRAM
>> bound, and that it “may be amenable” to a good speedup running multiple
>> threads in the same shared memory environment ?
>>
>>    I think this is a good way to say it: "since it is not DRAM bound, it
>> may be amenable to good speedup running multiple threads". It may also be
>> amenable to MPI parallelism. Other factors besides memory bandwidth affect
>> parallel performance; without more information, these are unknown.
>>
>>   Barry
>>
>>
>>
>> I did look into the shared memory benchmark
>> http://www.cs.virginia.edu/stream  but I could not draw any conclusions.
>>
>> If this is a trivial question, please point me to a good resource to
>> learn.
>>
>> Thanks!
>>
>>
>>
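
For reference, the arithmetic behind the "at most 25 percent" estimate in
the thread above can be written out as a short Python sketch. This is only
an illustration, not something posted in the thread. The function name is
hypothetical, and it assumes that (a) each run moves the same number of
bytes whether it runs alone or next to other instances, and (b) the
concurrent instances together cannot exceed the peak memory bandwidth.
Under those assumptions, N instances each finishing in time t_N move N
times the data in t_N, so one run taking t_1 alone uses at most
t_N / (N * t_1) of the peak.

```python
def single_run_bandwidth_bound(n_instances, t_single, t_concurrent):
    """Upper bound on the fraction of peak memory bandwidth used by one run.

    Assumptions (not from the thread): every run moves the same number of
    bytes, and n_instances concurrent runs cannot exceed peak bandwidth.
    Times may be in any unit as long as both use the same one.
    """
    if n_instances < 1 or t_single <= 0 or t_concurrent <= 0:
        raise ValueError("need n_instances >= 1 and positive times")
    # n_instances * bytes / t_concurrent <= peak, and one run alone draws
    # bytes / t_single, so its share of peak is bounded by this ratio.
    bound = t_concurrent / (n_instances * t_single)
    # A fraction above 1 carries no information, so cap it.
    return min(1.0, bound)


if __name__ == "__main__":
    # Numbers from the original question: 4 instances, 10 h alone,
    # 10 h (or 10 h 30 min) when run side by side.
    for t_concurrent in (10.0, 10.5):
        share = single_run_bandwidth_bound(4, 10.0, t_concurrent)
        print(f"{t_concurrent} h concurrent: at most {share:.2%} of peak")
```

With no slowdown the bound is one quarter of peak; with the 10 h 30 min
figure it loosens only slightly. The bound says nothing about cache misses
or latency stalls, which is where the profilers mentioned at the top of
this message come in.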