[petsc-dev] MatMult on Summit

Smith, Barry F. bsmith at mcs.anl.gov
Sun Sep 22 09:46:25 CDT 2019


   I'm guessing it would be very difficult to connect this particular performance bug with a decrease in performance for an actual full application, since models don't capture this level of detail well (and since you cannot run the application without the bug to see the better performance). IBM/Nvidia are not going to care about it if it is just an abstract oddity, as opposed to something clearly demonstrated to be a problem for the use of the machine, especially if the machine is an orphan.

> On Sep 22, 2019, at 8:35 AM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
> 
> Karl Rupp <rupp at iue.tuwien.ac.at> writes:
> 
>>> I wonder if the single-node latency bugs on AC922 are related to these
>>> weird performance results.
>>> 
>>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
>>> 
>> 
>> Thanks for these numbers!
>> Intra-node latency exceeding inter-node latency is indeed weird. I
>> haven't observed such an inversion before.
> 
> As far as I know, it's been there since the machines were deployed
> despite obviously being a bug.  I know people at LLNL regard it as a
> bug, but it has not been their top priority (presumably at least in part
> because applications have not clearly expressed the impact of latency
> regressions on their science).


