[petsc-users] CUDA-Aware MPI & PETSc

Mon Oct 7 17:09:18 CDT 2019

Hello, David,
   It took longer than I expected to add the CUDA-aware MPI feature to PETSc. It is now in PETSc 3.12, released last week. I pushed a small fix after the release, so you had better use petsc master. Use the PETSc option -use_gpu_aware_mpi to enable it. On Summit, you also need jsrun --smpiargs="-gpu" to enable IBM Spectrum MPI's CUDA support. If you run with multiple MPI ranks per GPU, you also need #BSUB -alloc_flags gpumps in your job script.
  My results on Summit (using a simple test doing repeated MatMult) are mixed. With one MPI rank per GPU, I saw a very good performance improvement (up to 25%). But with multiple ranks per GPU, I did not see any improvement. That sounds absurd, since it should be easier for MPI ranks to communicate data when they share the same GPU. I'm investigating this issue.
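   As a rough sketch of how these pieces could fit together in an LSF job script on Summit: the project ID, wall-clock limit, node count, resource-set layout (6 GPUs and 7 cores per resource set on one node) and the executable name ./ex_matmult below are all placeholders and assumptions, not taken from my actual test. Only -use_gpu_aware_mpi, --smpiargs="-gpu" and -alloc_flags gpumps come from the description above.

```shell
#!/bin/bash
#BSUB -P ABC123             # placeholder project ID (assumption)
#BSUB -W 0:30               # placeholder wall-clock limit (assumption)
#BSUB -nnodes 1             # placeholder node count (assumption)
#BSUB -alloc_flags gpumps   # only needed when several MPI ranks share one GPU

# Placeholder executable and ranks-per-GPU; override from the environment.
APP=${APP:-./ex_matmult}
RANKS_PER_GPU=${RANKS_PER_GPU:-1}

# Assumed layout: one resource set per GPU (-n 6, -g 1), 7 cores each (-c 7),
# and RANKS_PER_GPU MPI tasks per resource set (-a).
# --smpiargs="-gpu" turns on CUDA support in IBM Spectrum MPI;
# -use_gpu_aware_mpi tells PETSc to pass device buffers to MPI.
set -- jsrun --smpiargs="-gpu" -n 6 -a "$RANKS_PER_GPU" -c 7 -g 1 \
       "$APP" -use_gpu_aware_mpi

# Always print the launch line; only run it where jsrun actually exists,
# so the script can be dry-run on a workstation.
echo "$*"
if command -v jsrun >/dev/null 2>&1; then
  "$@"
fi
```

   With RANKS_PER_GPU=1 the gpumps line is not required; set RANKS_PER_GPU to a larger value (and keep the gpumps line) to try several ranks per GPU.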
  If you can also evaluate this feature with your production code, that would be helpful.
  Thanks.
--Junchao Zhang

On Thu, Aug 22, 2019 at 11:34 AM David Gutzwiller <david.gutzwiller at gmail.com> wrote:
Hello Junchao,

Spectacular news!

I have our production code running on Summit (Power9 + Nvidia V100) and on local x86 workstations, and I can definitely provide comparative benchmark data with this feature once it is ready.  Just let me know when it is available for testing and I'll be happy to contribute.

Thanks,

-David


On Thu, Aug 22, 2019 at 7:22 AM Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
This feature is under active development. I hope I can make it usable in a couple of weeks. Thanks.
--Junchao Zhang

On Wed, Aug 21, 2019 at 3:21 PM David Gutzwiller via petsc-users <petsc-users at mcs.anl.gov> wrote:
Hello,

I'm currently using PETSc for GPU acceleration of a simple Krylov solver (GMRES, without preconditioning). This is within the framework of our in-house multigrid solver. I am getting a good GPU speedup on the finest grid level but progressively worse performance on each coarser level. This is not surprising, but I still hope to squeeze out some more performance, ideally making it worthwhile to run some or all of the coarse grids on the GPU.

I started investigating with nvprof / nsight and essentially came to the same conclusion that Xiangdong reported in a recent thread (July 16, "MemCpy (HtoD and DtoH) in Krylov solver").  My question is a follow-up to that thread:

The MPI communication is staged from the host, which results in some H<->D transfers for every mat-vec operation.   A CUDA-aware MPI implementation might avoid these transfers for communication between ranks that are assigned to the same accelerator.   Has this been implemented or tested?

In our solver we typically run with multiple MPI ranks all assigned to a single device. Running with a single rank is not really feasible, as we still have a sizable amount of work for the CPU to chew through. Thus, I think quite a lot of the H<->D transfers could be avoided if I can skip the MPI staging on the host. I am quite new to PETSc, so I wanted to ask around before blindly digging into this.

Thanks for your help,

David
