[petsc-dev] Exiting with error when using GPUs and non GPU-aware MPI
Richard Tran Mills
rtmills at anl.gov
Mon Mar 23 18:21:29 CDT 2020
Colleagues,
I did not notice this, but Junchao's MR, "Directly pass root/leafdata to
MPI in SF when possible"
https://gitlab.com/petsc/petsc/-/merge_requests/2506
that was merged into master over the weekend causes PETSc to error out
if PETSc has been configured with GPU support but the MPI implementation
is not "GPU-aware", unless the user has specified "-use_gpu_aware_mpi 0":
> [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is
not GPU-aware. For better performance, please use a GPU-aware MPI.
> [0]PETSC ERROR: For IBM Spectrum MPI on OLCF Summit, you may need
jsrun --smpiargs=-gpu.
> [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda
(https://www.open-mpi.org/faq/?category=buildcuda)
> [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1
(http://mvapich.cse.ohio-state.edu/userguide/gdr/)
> [0]PETSC ERROR: For Cray-MPICH, you need to set
MPICH_RDMA_ENABLED_CUDA=1
(https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
> [0]PETSC ERROR: If you do not care, use option -use_gpu_aware_mpi 0,
then PETSc will copy data from GPU to CPU for communication.
> application called MPI_Abort(MPI_COMM_WORLD, 90693076) - process 0
I like that we are warning users about a potential performance problem,
but this seems like something that should print a warning, rather than
exiting with an error. So I am wondering:
1) Do people agree that this should be a warning instead of an error?
and
2) Shouldn't we add a standard mechanism for reporting these sorts of
warnings at runtime?
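
To make the two questions concrete, here is a minimal sketch in plain C of
what "warn and fall back" could look like. It is not PETSc code: every name
in it (CommPath, warn_once, choose_comm_path, warnings_printed) is
hypothetical, and it assumes that the fallback is the GPU-to-CPU copy that
the error message above describes for "-use_gpu_aware_mpi 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; none of this is PETSc API. */
typedef enum {
  COMM_GPU_DIRECT,         /* pass device buffers straight to MPI */
  COMM_STAGE_THROUGH_HOST  /* copy GPU -> CPU, then communicate   */
} CommPath;

static bool already_warned   = false;
static int  warnings_printed = 0;

/* Print a given warning at most once per process instead of aborting. */
static void warn_once(const char *msg)
{
  assert(msg != NULL);
  if (already_warned) return;
  already_warned = true;
  warnings_printed++;
  fprintf(stderr, "WARNING: %s\n", msg);
}

/* Choose the communication path.
   mpi_is_gpu_aware:  what the MPI implementation supports.
   use_gpu_aware_mpi: what the user asked for (1 by default, 0 to opt out). */
static CommPath choose_comm_path(bool mpi_is_gpu_aware, bool use_gpu_aware_mpi)
{
  if (use_gpu_aware_mpi && mpi_is_gpu_aware) return COMM_GPU_DIRECT;
  if (use_gpu_aware_mpi && !mpi_is_gpu_aware) {
    /* The user did not opt out, so say why the slower path is taken. */
    warn_once("MPI is not GPU-aware; copying data from GPU to CPU for communication.");
  }
  return COMM_STAGE_THROUGH_HOST;
}
```

In this sketch a user who has explicitly passed the opt-out sees no message
at all, and everyone else sees the message once and keeps running.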
--Richard