[petsc-dev] Exiting with error when using GPUs and non GPU-aware MPI

Richard Tran Mills rtmills at anl.gov
Mon Mar 23 18:21:29 CDT 2020


Colleagues,

I did not notice this, but Junchao's MR, "Directly pass root/leafdata to 
MPI in SF when possible"

   https://gitlab.com/petsc/petsc/-/merge_requests/2506

that was merged into master over the weekend causes PETSc to error out 
if PETSc has been configured with GPU support but the MPI implementation 
is not "GPU-aware", unless the user has specified "-use_gpu_aware_mpi 0":

 > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.
 > [0]PETSC ERROR: For IBM Spectrum MPI on OLCF Summit, you may need jsrun --smpiargs=-gpu.
 > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (https://www.open-mpi.org/faq/?category=buildcuda)
 > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (http://mvapich.cse.ohio-state.edu/userguide/gdr/)
 > [0]PETSC ERROR: For Cray-MPICH, you need to set MPICH_RDMA_ENABLED_CUDA=1 (https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
 > [0]PETSC ERROR: If you do not care, use option -use_gpu_aware_mpi 0, then PETSc will copy data from GPU to CPU for communication.
 > application called MPI_Abort(MPI_COMM_WORLD, 90693076) - process 0

I like that we are warning users about a potential performance problem, 
but this seems like something that should print a warning, rather than 
exiting with an error. So I am wondering

1) Do people agree that this should be a warning instead of an error?

and

2) Shouldn't we add a standard mechanism for reporting these sorts of 
warnings at runtime?
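
As a rough illustration of what "warn instead of error" could look like, 
here is a minimal sketch. It is written in Python for brevity, not in 
PETSc's C, and it is not taken from the MR. The names choose_comm_path 
and GpuAwareMpiWarning are hypothetical, and the two boolean inputs stand 
in for whatever configure-time and runtime checks PETSc actually performs. 
The only behavior borrowed from the error message above is that 
"-use_gpu_aware_mpi 0" means staging data through the CPU.

```python
import warnings


class GpuAwareMpiWarning(UserWarning):
    """Hypothetical warning category for a non-GPU-aware MPI."""


def choose_comm_path(petsc_has_gpu, mpi_is_gpu_aware, use_gpu_aware_mpi=True):
    """Return "device" to hand GPU buffers straight to MPI, else "host".

    "host" means data is communicated from CPU memory, which for GPU data
    implies a GPU-to-CPU copy first.
    """
    if not petsc_has_gpu:
        # No GPU support configured: data already lives on the host.
        return "host"
    if not use_gpu_aware_mpi:
        # User passed the equivalent of -use_gpu_aware_mpi 0: stay silent.
        return "host"
    if mpi_is_gpu_aware:
        return "device"
    # Proposed behavior: warn about the performance cost, then fall back
    # to host staging instead of aborting the run.
    warnings.warn(
        "PETSc is configured with GPU support, but your MPI is not "
        "GPU-aware; copying data from GPU to CPU for communication.",
        GpuAwareMpiWarning,
        stacklevel=2,
    )
    return "host"
```

Using a dedicated warning category is one possible answer to question 2: 
it gives users a single handle for silencing, logging, or escalating this 
class of message, rather than each call site inventing its own output.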

--Richard


