[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Rob Kudyba rk3199 at columbia.edu
Fri Oct 7 22:18:25 CDT 2022


> Thanks for the quick reply. I added these options to make, but make check
>> still produced the warnings, so I used the command like this:
>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>>  MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
>> processes
>> Completed test examples
>>
>> Could be useful for the FAQ.
>>
> You mentioned you had "OpenMPI 4.1.1 with CUDA aware",  so I think a
> workable mpicc should automatically find cuda libraries.  Maybe you
> unloaded cuda libraries?
>
Oh, let me clarify: OpenMPI is CUDA-aware, but this code and the node
where PETSc is compiled do not have a GPU, so CUDA is not needed, and
using the MPIEXEC option worked during 'make check' to suppress the warning.
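As an aside, overriding MPIEXEC on every make invocation can be avoided, since Open MPI also reads MCA parameters from environment variables prefixed with OMPI_MCA_. A minimal sketch, using the two parameter names from the warnings above:

```shell
# Open MPI reads any MCA parameter from an OMPI_MCA_<param> environment
# variable, so the warning suppression can be made persistent for a
# shell session instead of being passed on the mpiexec command line:
export OMPI_MCA_opal_warn_on_missing_libcuda=0
export OMPI_MCA_orte_base_help_aggregate=0
# 'make check' then picks these up without an MPIEXEC override
```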

I'm now trying to use PETSc to compile, and linking appears to go awry:
>> [ 58%] Building CXX object
>> CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
>> [ 62%] Linking CXX static library libwtm.a
>> [ 62%] Built target wtm
>> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
>> [ 70%] Linking CXX executable wtm.x
>> /usr/bin/ld: cannot find -lpetsc
>> collect2: error: ld returned 1 exit status
>> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
>> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
>> make: *** [Makefile:136: all] Error 2
>>
> It seems cmake could not find petsc.   Look
> at $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your
> CMakeLists.txt.
>

There is an explicit reference to the path in CMakeLists.txt:
# NOTE: You may need to update this path to identify PETSc's location
set(ENV{PKG_CONFIG_PATH}
"$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")
pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
message(STATUS "Found PETSc ${PETSC_VERSION}")
add_subdirectory(common/richdem EXCLUDE_FROM_ALL)
add_subdirectory(common/fmt EXCLUDE_FROM_ALL)
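For reference, when pkg_check_modules() is called with IMPORTED_TARGET as above, CMake generates an imported target named PkgConfig::PETSC, and linking against that target (rather than a bare -lpetsc) carries the -L path from PETSc.pc through to the link line. A minimal sketch, where the executable name wtm.x is taken from the build output above and the source file name is assumed:

```cmake
# pkg_check_modules(PETSC ... IMPORTED_TARGET ...) generates an imported
# target PkgConfig::PETSC carrying the include and link flags from
# PETSc.pc, so no manual -lpetsc or -L path is needed:
add_executable(wtm.x src/WTM.cpp)
target_link_libraries(wtm.x PRIVATE PkgConfig::PETSC)
```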

And that exists:
ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/
petsc.pc  PETSc.pc
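A quick way to sanity-check, independent of CMake, is to ask pkg-config directly whether it resolves PETSc through that path. The sketch below is self-contained for illustration, using a throwaway .pc file; against a real install, only the last two lines are needed, with PKG_CONFIG_PATH pointed at $PETSC_DIR/$PETSC_ARCH/lib/pkgconfig:

```shell
# Throwaway .pc file standing in for the real PETSc.pc, so the
# PKG_CONFIG_PATH mechanics can be demonstrated self-contained:
demo=$(mktemp -d)
cat > "$demo/PETSc.pc" <<'EOF'
Name: PETSc
Description: stand-in for the real PETSc.pc
Version: 3.18.0
Libs: -lpetsc
EOF
# In practice, point this at /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig
export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$demo"
pkg-config --modversion PETSc   # prints the version pkg-config resolved
```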

> Is there an environment variable I'm missing? I've seen the suggestion
> <https://www.mail-archive.com/search?l=petsc-users@mcs.anl.gov&q=subject:%22%5C%5Bpetsc%5C-users%5C%5D+CMake+error+in+PETSc%22&o=newest&f=1>
> to add it to LD_LIBRARY_PATH which I did with export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib and that
> points to:
>
>> ls -l /path/to/petsc/arch-linux-c-debug/lib
>> total 83732
>> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so ->
>> libpetsc.so.3.18.0
>> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so.3.18 ->
>> libpetsc.so.3.18.0
>> -rwxr-xr-x 1 rk3199 user 85719200 Oct  7 13:56 libpetsc.so.3.18.0
>> drwxr-xr-x 3 rk3199 user     4096 Oct  6 10:22 petsc
>> drwxr-xr-x 2 rk3199 user     4096 Oct  6 10:23 pkgconfig
>>
>> Anything else to check?
>>
> If modifying CMakeLists.txt does not work, you can try export
> LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
> LD_LIBRARY_PATH is for run time, but the error happened at link time.
>

Yes that's what I already had. Any other debug that I can provide?
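For what it's worth, the distinction the advice above rests on is that the compiler driver (gcc/g++) searches LIBRARY_PATH when resolving -l flags at link time, while the dynamic loader searches LD_LIBRARY_PATH when the binary runs; a "cannot find -lpetsc" error points at the former. A sketch of setting both:

```shell
# LIBRARY_PATH is searched by gcc/g++ when resolving -lpetsc at link
# time; LD_LIBRARY_PATH is searched by the dynamic loader at run time.
# The "cannot find -lpetsc" error is a link-time failure, so it is
# LIBRARY_PATH (or an explicit -L flag) that matters here.
export LIBRARY_PATH="$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib"
```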



> On Fri, Oct 7, 2022 at 1:53 PM Satish Balay <balay at mcs.anl.gov> wrote:
>>
>>> you can try
>>>
>>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>>> MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'"
>>>
>>> Wrt configure - it can be set with --with-mpiexec option - its saved in
>>> PETSC_ARCH/lib/petsc/conf/petscvariables
>>>
>>> Satish
>>>
>>> On Fri, 7 Oct 2022, Rob Kudyba wrote:
>>>
>>> > We are on RHEL 8, using modules that let us load/unload various
>>> > versions of packages/libraries, and I have CUDA-aware OpenMPI 4.1.1
>>> > loaded along with GDAL 3.3.0, GCC 10.2.0, and CMake 3.22.1.
>>> >
>>> > make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
>>> > fails with the below errors,
>>> > Running check examples to verify correct installation
>>> >
>>> > Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>>> > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
>>> > See https://petsc.org/release/faq/
>>> >
>>> > --------------------------------------------------------------------------
>>> > The library attempted to open the following supporting CUDA libraries,
>>> > but each of them failed.  CUDA-aware support is disabled.
>>> > libcuda.so.1: cannot open shared object file: No such file or directory
>>> > libcuda.dylib: cannot open shared object file: No such file or directory
>>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
>>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
>>> > If you are not interested in CUDA-aware support, then run with
>>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
>>> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
>>> > location of libcuda.so.1 to get passed this issue.
>>> > --------------------------------------------------------------------------
>>> >
>>> > --------------------------------------------------------------------------
>>> > WARNING: There was an error initializing an OpenFabrics device.
>>> >
>>> >   Local host:   g117
>>> >   Local device: mlx5_0
>>> > --------------------------------------------------------------------------
>>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>> > Number of SNES iterations = 2
>>> > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
>>> > See https://petsc.org/release/faq/
>>> >
>>> > The library attempted to open the following supporting CUDA libraries,
>>> > but each of them failed.  CUDA-aware support is disabled.
>>> > libcuda.so.1: cannot open shared object file: No such file or directory
>>> > libcuda.dylib: cannot open shared object file: No such file or directory
>>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
>>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
>>> > If you are not interested in CUDA-aware support, then run with
>>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
>>> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
>>> > location of libcuda.so.1 to get passed this issue.
>>> >
>>> > WARNING: There was an error initializing an OpenFabrics device.
>>> >
>>> >   Local host:   xxx
>>> >   Local device: mlx5_0
>>> >
>>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>> > Number of SNES iterations = 2
>>> > [g117:4162783] 1 more process has sent help message
>>> > help-mpi-common-cuda.txt / dlopen failed
>>> > [g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>>> > help / error messages
>>> > [g117:4162783] 1 more process has sent help message help-mpi-btl-openib.txt
>>> > / error in device init
>>> > Completed test examples
>>> > Error while running make check
>>> > gmake[1]: *** [makefile:149: check] Error 1
>>> > make: *** [GNUmakefile:17: check] Error 2
>>> >
>>> > Where is $MPI_RUN set? I'd like to be able to pass options such as --mca
>>> > orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml
>>> > ucx --mca btl '^openib', which will help me troubleshoot and hide unneeded
>>> > warnings.
>>> >
>>> > Thanks,
>>> > Rob
>>> >
>>>
>>>

