[petsc-dev] MatMult on Summit

Sat Sep 21 23:43:53 CDT 2019

"Smith, Barry F." <bsmith at mcs.anl.gov> writes:

>   Jed,
>
>   What does latency as a function of message size mean?   It is in the plots

It's just the wall-clock time to ping-pong a message of that size.  All
the small sizes take about the same amount of time (i.e., the latency);
larger sizes transition to being limited by network bandwidth.
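
A rough way to picture that curve is the usual latency/bandwidth
("alpha-beta") model, T(n) = alpha + n / beta.  The sketch below is only
an illustration: the function names and the default values for latency
and bandwidth are made up for the example and are not measurements from
Summit or numbers taken from the plots.

```python
def message_time(nbytes, latency_s=1e-6, bandwidth_Bps=10e9):
    """Modeled time for one message of nbytes: T(n) = alpha + n / beta.

    latency_s (alpha) and bandwidth_Bps (beta) are hypothetical
    placeholder values, not measured ones.
    """
    return latency_s + nbytes / bandwidth_Bps


def crossover_bytes(latency_s=1e-6, bandwidth_Bps=10e9):
    """Message size at which the bandwidth term equals the latency term."""
    return latency_s * bandwidth_Bps


if __name__ == "__main__":
    # Small sizes sit near the flat latency floor; large sizes grow
    # roughly linearly with the message size.
    for n in (8, 1024, 1 << 20, 1 << 26):
        print(f"{n:>10d} bytes  {message_time(n):.3e} s")
    print("crossover near", crossover_bytes(), "bytes")
```

Below the crossover size the modeled time is dominated by alpha, which
is the flat part of the plot; well above it the n / beta term dominates.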

>
>> On Sep 21, 2019, at 11:15 PM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>> 
>> Karl Rupp via petsc-dev <petsc-dev at mcs.anl.gov> writes:
>> 
>>> Hi Junchao,
>>> 
>>> thanks, these numbers are interesting.
>>> 
>>> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
>>> a non-CUDA-aware MPI that still keeps the benefits of your 
>>> packing/unpacking routines?
>>> 
>>> I'd like to get a feeling of where the performance gains come from. Is 
>>> it due to the reduced PCI-Express transfer 
>> 
>> It's NVLink, not PCI-Express.
>> 
>> I wonder if the single-node latency bugs on AC922 are related to these
>> weird performance results.
>> 
>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0