[petsc-dev] MatMult on Summit

Sat Sep 21 23:15:35 CDT 2019

Karl Rupp via petsc-dev <petsc-dev at mcs.anl.gov> writes:

> Hi Junchao,
>
> thanks, these numbers are interesting.
>
> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
> a non-CUDA-aware MPI that still keeps the benefits of your 
> packing/unpacking routines?
>
> I'd like to get a feeling for where the performance gains come from. Is
> it due to the reduced PCI-Express transfer 

On Summit the CPU-GPU link is NVLink, not PCI-Express.

I wonder if the single-node latency bugs on AC922 are related to these
weird performance results.

https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0