[petsc-dev] MatMult on Summit

Karl Rupp rupp at iue.tuwien.ac.at
Sun Sep 22 00:01:18 CDT 2019



On 9/22/19 6:15 AM, Jed Brown wrote:
> Karl Rupp via petsc-dev <petsc-dev at mcs.anl.gov> writes:
> 
>> Hi Junchao,
>>
>> thanks, these numbers are interesting.
>>
>> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs.
>> a non-CUDA-aware MPI that still keeps the benefits of your
>> packing/unpacking routines?
>>
>> I'd like to get a feeling for where the performance gains come from. Is
>> it due to the reduced PCI-Express transfer
> 
> It's NVLink, not PCI-express.

Indeed.
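
To make the comparison I have in mind concrete, here is a rough sketch 
(not PETSc's actual scatter code): with a CUDA-aware MPI the packed 
device buffer is handed to MPI directly, whereas without it the data 
has to be staged through a host buffer first. The buffer name d_buf 
and the helper functions below are purely illustrative.

/* Assumes a device buffer d_buf of n doubles has already been packed
 * on the GPU (e.g. by the pack kernel). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* CUDA-aware MPI: the device pointer goes straight to MPI, so the
 * library can move the data via GPUDirect/NVLink without touching
 * host memory. */
static void send_cuda_aware(const double *d_buf, int n, int dest, MPI_Comm comm)
{
  MPI_Send(d_buf, n, MPI_DOUBLE, dest, 0, comm);
}

/* Non-CUDA-aware MPI: stage through a host buffer first; the extra
 * cudaMemcpy is the cost one would like to quantify. */
static void send_host_staged(const double *d_buf, int n, int dest, MPI_Comm comm)
{
  double *h_buf = (double *)malloc((size_t)n * sizeof(double));
  cudaMemcpy(h_buf, d_buf, (size_t)n * sizeof(double), cudaMemcpyDeviceToHost);
  MPI_Send(h_buf, n, MPI_DOUBLE, dest, 0, comm);
  free(h_buf);
}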



> I wonder if the single-node latency bugs on AC922 are related to these
> weird performance results.
> 
> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
> 

Thanks for these numbers!
Intra-node latency exceeding inter-node latency is indeed weird; I 
haven't observed such an inversion before.

Best regards,
Karli

