[petsc-dev] CUDA + OpenMP on Summit with Hypre

Jacob Faibussowitsch jacob.fai at gmail.com
Mon Nov 15 08:26:49 CST 2021


> > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.
> > [0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To not see the message again, add the option to your .petscrc, OR add it to the env var PETSC_OPTIONS.
> > [0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you may need jsrun --smpiargs=-gpu.
> > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (https://www.open-mpi.org/faq/?category=buildcuda)
> > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (http://mvapich.cse.ohio-state.edu/userguide/gdr/)
> > [0]PETSC ERROR: For Cray-MPICH, you need to set MPICH_RDMA_ENABLED_CUDA=1 (https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)

You also seem to be tripping the GPU-aware MPI checker. IIRC we discussed removing this at some point? I think Stefano mentioned we now do this check at configure time? (A configure sketch for the OpenMP question is at the end of this message.)
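
For reference, either of these should get past that check on Summit (the jsrun resource-set flags and the ex19 options below are illustrative, not taken from the run above):

  # hand CUDA buffers to Spectrum MPI directly (GPU-aware path):
  jsrun -n 2 -a 1 -c 1 -g 1 --smpiargs="-gpu" ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse

  # or, if staging through host memory is acceptable, silence the check:
  export PETSC_OPTIONS="-use_gpu_aware_mpi 0"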

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Nov 13, 2021, at 22:57, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> 
> 
> 
> 
> On Sat, Nov 13, 2021 at 2:24 PM Mark Adams <mfadams at lbl.gov> wrote:
> I have a user who wants CUDA + Hypre on Summit, and they want to use OpenMP in their code. I configured with OpenMP but without thread safety and got this error.
> 
> Maybe there is no need for us to do anything with omp in our configuration. Not sure.
> 
> 15:08 main= summit:/gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3 PETSC_ARCH="" check
> Running check examples to verify correct installation
> Using PETSC_DIR=/gpfs/alpine/world-shared/geo127/petsc/arch-opt-gcc9.1.0-omp-cuda11.0.3 and PETSC_ARCH=
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> [1] (280696) Warning: Could not find key lid0:0:2 in cache <=========================
> [1] (280696) Warning: Could not find key qpn0:0:0:2 in cache <=========================
> Unable to connect queue-pairs
> [h37n08:280696] Error: common_pami.c:1094 - ompi_common_pami_init() 1: Unable to create 1 PAMI communication context(s) rc=1
> I don't know what PETSc's thread safety does, but this error seems to be in the environment. You can report it to OLCF help.
>  
> --------------------------------------------------------------------------
> No components were able to be opened in the pml framework.
> 
> This typically means that either no components of this type were
> installed, or none of the installed components can be loaded.
> Sometimes this means that shared libraries required by these
> components are unable to be found/loaded.
> 
>   Host:      h37n08
>   Framework: pml
> --------------------------------------------------------------------------
> [h37n08:280696] PML pami cannot be selected
> 1,5c1,16
> < lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
> <   0 SNES Function norm 0.0406612
> <   1 SNES Function norm 4.12227e-06
> <   2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > [1] (280721) Warning: Could not find key lid0:0:2 in cache <=========================
> > [1] (280721) Warning: Could not find key qpn0:0:0:2 in cache <=========================
> > Unable to connect queue-pairs
> > [h37n08:280721] Error: common_pami.c:1094 - ompi_common_pami_init() 1: Unable to create 1 PAMI communication context(s) rc=1
> > --------------------------------------------------------------------------
> > No components were able to be opened in the pml framework.
> >
> > This typically means that either no components of this type were
> > installed, or none of the installed components can be loaded.
> > Sometimes this means that shared libraries required by these
> > components are unable to be found/loaded.
> >
> >   Host:      h37n08
> >   Framework: pml
> > --------------------------------------------------------------------------
> > [h37n08:280721] PML pami cannot be selected
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =========================================
> 2,15c2,15
> <   0 SNES Function norm 2.391552133017e-01
> <     0 KSP Residual norm 2.325621076120e-01
> <     1 KSP Residual norm 1.654206318674e-02
> <     2 KSP Residual norm 7.202836119880e-04
> <     3 KSP Residual norm 1.796861424199e-05
> <     4 KSP Residual norm 2.461332992052e-07
> <   1 SNES Function norm 6.826585648929e-05
> <     0 KSP Residual norm 2.347339172985e-05
> <     1 KSP Residual norm 8.356798075993e-07
> <     2 KSP Residual norm 1.844045309619e-08
> <     3 KSP Residual norm 5.336386977405e-10
> <     4 KSP Residual norm 2.662608472862e-11
> <   2 SNES Function norm 6.549682264799e-11
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.
> > [0]PETSC ERROR: If you do not care, add option -use_gpu_aware_mpi 0. To not see the message again, add the option to your .petscrc, OR add it to the env var PETSC_OPTIONS.
> > [0]PETSC ERROR: If you do care, for IBM Spectrum MPI on OLCF Summit, you may need jsrun --smpiargs=-gpu.
> > [0]PETSC ERROR: For OpenMPI, you need to configure it --with-cuda (https://www.open-mpi.org/faq/?category=buildcuda)
> > [0]PETSC ERROR: For MVAPICH2-GDR, you need to set MV2_USE_CUDA=1 (http://mvapich.cse.ohio-state.edu/userguide/gdr/)
> > [0]PETSC ERROR: For Cray-MPICH, you need to set MPICH_RDMA_ENABLED_CUDA=1 (https://www.olcf.ornl.gov/tutorials/gpudirect-mpich-enabled-cuda/)
> > --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> > with errorcode 76.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > --------------------------------------------------------------------------
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials
> Possible problem with ex19 running with cuda, diffs above
> =========================================
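
On the OpenMP/thread-safety question in Mark's message: a configure sketch for CUDA + OpenMP with PETSc's thread-safety option turned on would look roughly like

  ./configure --with-cuda --with-openmp --with-threadsafety --download-hypre

plus whatever compiler/MPI/CUDA settings the Summit modules require (omitted here since they depend on the environment). Though, as Mark says, it may be that nothing beyond --with-openmp is actually needed on our side.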
