[petsc-dev] MatMult on Summit

Smith, Barry F. bsmith at mcs.anl.gov
Sat Sep 21 23:51:35 CDT 2019



> On Sep 21, 2019, at 11:43 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
> 
>>  Jed,
>> 
>>  What does latency as a function of message size mean? It is in the plots.
> 
> It's just the wall-clock time to ping-pong a message of that size.  All
> the small sizes take the same amount of time (i.e., the latency), then
> transition to being network bandwidth limited for large sizes.

   Thanks, this is fine for the small sizes. But he has the graph up to size 1000000, and the plotted values change for larger sizes; surely for 1000000 the time is a combination of latency and bandwidth? Isn't calling it latency a misnomer, or do people use this inconsistent terminology when doing ping-pongs?
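
   For reference, a minimal ping-pong sketch (hypothetical code, not the benchmark behind the plots) that measures exactly this quantity; the reported one-way time behaves roughly like alpha + n/beta, so for 1000000-byte messages the n/beta bandwidth term dominates rather than the latency alpha:

/* Minimal MPI ping-pong sketch (hypothetical, for illustration only).
 * Rank 0 sends n bytes to rank 1 and waits for the echo; half the
 * averaged round-trip time is reported for each message size.
 * Run with at least two ranks, e.g.  mpiexec -n 2 ./pingpong  */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, reps = 100;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (int n = 1; n <= 1000000; n *= 10) {
    char  *buf = malloc(n);
    double t0;
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
      if (rank == 0) {
        MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
      }
    }
    /* round-trip time / 2 = the one-way time plotted for this size */
    if (rank == 0) printf("%8d bytes: %g s\n", n, (MPI_Wtime() - t0) / (2.0 * reps));
    free(buf);
  }
  MPI_Finalize();
  return 0;
}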


> 
>> 
>>> On Sep 21, 2019, at 11:15 PM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>>> 
>>> Karl Rupp via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>>> 
>>>> Hi Junchao,
>>>> 
>>>> thanks, these numbers are interesting.
>>>> 
>>>> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
>>>> a non-CUDA-aware MPI that still keeps the benefits of your 
>>>> packing/unpacking routines?
>>>> 
>>>> I'd like to get a feeling of where the performance gains come from. Is 
>>>> it due to the reduced PCI-Express transfer 
>>> 
>>> It's NVLink, not PCI-express.
>>> 
>>> I wonder if the single-node latency bugs on AC922 are related to these
>>> weird performance results.
>>> 
>>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
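
A minimal sketch of the two paths Karl asks about (hypothetical code, not the actual PETSc packing/unpacking routines; d_buf and h_staging are assumed names): with a CUDA-aware MPI the packed buffer in GPU memory is handed to MPI directly, while with a plain MPI it must first be staged through host memory, paying an extra NVLink (or PCI-Express) transfer:

/* Hypothetical illustration of sending an already-packed buffer with a
 * CUDA-aware vs. a non-CUDA-aware MPI. */
#include <mpi.h>
#include <cuda_runtime.h>

void send_packed(void *d_buf,     /* packed send buffer in device memory  */
                 void *h_staging, /* preallocated host buffer, >= n bytes */
                 int n, int dest, int cuda_aware)
{
  if (cuda_aware) {
    /* CUDA-aware MPI: pass the device pointer; the library moves the data
       (e.g. via GPUDirect/NVLink) without an explicit host copy. */
    MPI_Send(d_buf, n, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
  } else {
    /* Plain MPI: explicit device-to-host copy, then send from host memory. */
    cudaMemcpy(h_staging, d_buf, (size_t)n, cudaMemcpyDeviceToHost);
    MPI_Send(h_staging, n, MPI_BYTE, dest, 0, MPI_COMM_WORLD);
  }
}

Timing the same MatMult both ways, with the packing kernels unchanged, would isolate how much of the gain comes from avoiding that extra staging transfer.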


