[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Rob Kudyba rk3199 at columbia.edu
Fri Oct 7 12:40:44 CDT 2022


We are on RHEL 8 and use environment modules to load/unload various versions of
packages/libraries. I have CUDA-aware OpenMPI 4.1.1 loaded along
with GDAL 3.3.0, GCC 10.2.0, and CMake 3.22.1.

make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
fails with the errors below:
Running check examples to verify correct installation

Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
See https://petsc.org/release/faq/
--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
location of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   g117
  Local device: mlx5_0
--------------------------------------------------------------------------
lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See https://petsc.org/release/faq/

The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
location of libcuda.so.1 to get passed this issue.

WARNING: There was an error initializing an OpenFabrics device.

  Local host:   xxx
  Local device: mlx5_0

lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
Number of SNES iterations = 2
[g117:4162783] 1 more process has sent help message help-mpi-common-cuda.txt / dlopen failed
[g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[g117:4162783] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
Completed test examples
Error while running make check
gmake[1]: *** [makefile:149: check] Error 1
make: *** [GNUmakefile:17: check] Error 2
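The CUDA warning's own advice is to point LD_LIBRARY_PATH at libcuda.so.1 if CUDA-aware support is wanted. Before choosing between that and suppressing the message, a minimal POSIX shell sketch can report whether the loader can see that library on the current node. The function name find_libcuda is hypothetical (not part of PETSc or OpenMPI); it checks each LD_LIBRARY_PATH entry and then the ldconfig cache.

```shell
#!/bin/sh
# Hypothetical helper: print the path of libcuda.so.1 if the dynamic loader
# could find it, otherwise return 1.
find_libcuda() {
  # 1) Look in each colon-separated LD_LIBRARY_PATH entry, stopping at the first hit.
  hit=$(printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | while IFS= read -r dir; do
    if [ -n "$dir" ] && [ -e "$dir/libcuda.so.1" ]; then
      printf '%s\n' "$dir/libcuda.so.1"
      break
    fi
  done)
  # 2) Fall back to the system loader cache (lines look like
  #    "libcuda.so.1 (libc6,x86-64) => /path/libcuda.so.1").
  if [ -z "$hit" ] && command -v ldconfig >/dev/null 2>&1; then
    hit=$(ldconfig -p 2>/dev/null | awk '/libcuda\.so\.1 /{print $NF; exit}')
  fi
  [ -n "$hit" ] || return 1
  printf '%s\n' "$hit"
}

if path=$(find_libcuda); then
  echo "libcuda.so.1 found at: $path"
else
  echo "libcuda.so.1 not found on this node"
fi
```

If nothing is found, the driver library is not visible on that node, and the warning is only reporting that CUDA-aware support is disabled there.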

Where is $MPI_RUN set? I'd like to be able to pass options such as
--mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0
--mca pml ucx --mca btl '^openib', which would help me troubleshoot and
hide unneeded warnings.
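One way to get the same effect without locating where the make rules build the MPI launch line: OpenMPI reads any MCA parameter from an environment variable named OMPI_MCA_<parameter_name>. A minimal sketch, assuming the tests are launched by this OpenMPI 4.1.1 from the same shell; the values are exactly the options listed above.

```shell
#!/bin/sh
# Each OMPI_MCA_<name>=<value> acts like passing "--mca <name> <value>" to mpirun.
export OMPI_MCA_orte_base_help_aggregate=0     # show every help/error message
export OMPI_MCA_opal_warn_on_missing_libcuda=0 # silence the libcuda.so.1 warning
export OMPI_MCA_pml=ucx                        # use the UCX point-to-point messaging layer
export OMPI_MCA_btl='^openib'                  # exclude the openib BTL (mlx5_0 warning)

# Show the MCA settings that programs started from this shell will inherit.
env | grep '^OMPI_MCA_' | sort
```

Running make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check from that shell should then apply these settings to every MPI launch, whatever variable the makefiles use. For a persistent setting, OpenMPI also reads name = value lines (for example opal_warn_on_missing_libcuda = 0) from $HOME/.openmpi/mca-params.conf.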

Thanks,
Rob