[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Rob Kudyba rk3199 at columbia.edu
Fri Oct 7 22:45:10 CDT 2022


The error has now changed and occurs earlier in the build, at 66% instead of 70%:
make LDFLAGS="-Wl,--copy-dt-needed-entries"
Consolidate compiler generated dependencies of target fmt
[ 12%] Built target fmt
Consolidate compiler generated dependencies of target richdem
[ 37%] Built target richdem
Consolidate compiler generated dependencies of target wtm
[ 62%] Built target wtm
Consolidate compiler generated dependencies of target wtm.x
[ 66%] Linking CXX executable wtm.x
/usr/bin/ld: libwtm.a(transient_groundwater.cpp.o): undefined reference to
symbol 'MPI_Abort'
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error
adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

So perhaps PETSc is now being found. Any other suggestions?
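
"DSO missing from command line" generally means the linker found the symbol (here MPI_Abort, in libmpi.so.40) only through another shared library's dependencies, while libmpi itself was not named on the link line. One way to check this, sketched below, is to look at the flags pkg-config hands to CMake and see whether an MPI library is listed. The helper name has_mpi_lib is made up for illustration, and the usage comment assumes PETSc.pc lives under $PETSC_DIR/$PETSC_ARCH/lib/pkgconfig.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a linker flag string names an MPI library
# explicitly (-lmpi, -lmpi_cxx, ...) or a libmpi shared object by path.
has_mpi_lib() {
    for flag in $1; do
        case "$flag" in
            -lmpi|-lmpi_*|*/libmpi.so*) return 0 ;;
        esac
    done
    return 1
}

# Usage (assumes pkg-config can see PETSc.pc; adjust the path as needed):
#   export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$PETSC_DIR/$PETSC_ARCH/lib/pkgconfig"
#   if has_mpi_lib "$(pkg-config --libs PETSc)"; then
#       echo "MPI is on the PETSc link line"
#   else
#       echo "MPI is not listed; it has to be linked explicitly"
#   fi
```

If MPI is not listed, two things that may be worth trying are building with the MPI compiler wrappers (for example CXX=mpicxx) or linking MPI to wtm.x explicitly in CMakeLists.txt. Which one fits depends on how the project is set up.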

On Fri, Oct 7, 2022 at 11:18 PM Rob Kudyba <rk3199 at columbia.edu> wrote:

>
> Thanks for the quick reply. I added these options to make, but make check
>>> still produced the warnings, so I ran the command like this:
>>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>>>  MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check
>>> Running check examples to verify correct installation
>>> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
>>> processes
>>> Completed test examples
>>>
>>> Could be useful for the FAQ.
>>>
>> You mentioned you had "OpenMPI 4.1.1 with CUDA aware",  so I think a
>> workable mpicc should automatically find cuda libraries.  Maybe you
>> unloaded cuda libraries?
>>
> Oh, let me clarify: OpenMPI is CUDA-aware, but neither this code nor the
> node where PETSc is compiling has a GPU, so CUDA support is not needed.
> Using the MPIEXEC option during the 'check' suppressed the warning.
>
> I'm now trying to use PETSc to compile, and linking appears to go awry:
>>> [ 58%] Building CXX object
>>> CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
>>> [ 62%] Linking CXX static library libwtm.a
>>> [ 62%] Built target wtm
>>> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
>>> [ 70%] Linking CXX executable wtm.x
>>> /usr/bin/ld: cannot find -lpetsc
>>> collect2: error: ld returned 1 exit status
>>> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
>>> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
>>> make: *** [Makefile:136: all] Error 2
>>>
>> It seems cmake could not find petsc.   Look
>> at $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your
>> CMakeLists.txt.
>>
>
> There is an explicit reference to the path in CMakeLists.txt:
> # NOTE: You may need to update this path to identify PETSc's location
> set(ENV{PKG_CONFIG_PATH}
> "$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")
> pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
> message(STATUS "Found PETSc ${PETSC_VERSION}")
> add_subdirectory(common/richdem EXCLUDE_FROM_ALL)
> add_subdirectory(common/fmt EXCLUDE_FROM_ALL)
>
> And that exists:
> ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/
> petsc.pc  PETSc.pc
>
>  Is there an environment variable I'm missing? I've seen the suggestion
>> <https://www.mail-archive.com/search?l=petsc-users@mcs.anl.gov&q=subject:%22%5C%5Bpetsc%5C-users%5C%5D+CMake+error+in+PETSc%22&o=newest&f=1>
>> to add it to LD_LIBRARY_PATH which I did with export
>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib and that
>> points to:
>>
>>> ls -l /path/to/petsc/arch-linux-c-debug/lib
>>> total 83732
>>> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so ->
>>> libpetsc.so.3.18.0
>>> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so.3.18 ->
>>> libpetsc.so.3.18.0
>>> -rwxr-xr-x 1 rk3199 user 85719200 Oct  7 13:56 libpetsc.so.3.18.0
>>> drwxr-xr-x 3 rk3199 user     4096 Oct  6 10:22 petsc
>>> drwxr-xr-x 2 rk3199 user     4096 Oct  6 10:23 pkgconfig
>>>
>>> Anything else to check?
>>>
>> If modifying  CMakeLists.txt does not work, you can try export
>> LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
>> LD_LIBRARY_PATH is for run time, but the error happened at link time.
>>
>
> Yes that's what I already had. Any other debug that I can provide?
>
>
>
>> On Fri, Oct 7, 2022 at 1:53 PM Satish Balay <balay at mcs.anl.gov> wrote:
>>>
>>>> you can try
>>>>
>>>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>>>> MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>>>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'"
>>>>
>>>> Wrt configure - it can be set with the --with-mpiexec option - it's saved in
>>>> PETSC_ARCH/lib/petsc/conf/petscvariables
>>>>
>>>> Satish
>>>>
>>>> On Fri, 7 Oct 2022, Rob Kudyba wrote:
>>>>
>>>> > We are on RHEL 8, using modules that let us load/unload various
>>>> versions of
>>>> > packages/libraries, and I have OpenMPI 4.1.1 with CUDA aware loaded
>>>> along
>>>> > with GDAL 3.3.0, GCC 10.2.0, and cmake 3.22.1
>>>> >
>>>> > make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
>>>> > fails with the below errors,
>>>> > Running check examples to verify correct installation
>>>> >
>>>> > Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>>>> > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI
>>>> process
>>>> > See https://petsc.org/release/faq/
>>>> >
>>>> --------------------------------------------------------------------------
>>>> > The library attempted to open the following supporting CUDA libraries,
>>>> > but each of them failed.  CUDA-aware support is disabled.
>>>> > libcuda.so.1: cannot open shared object file: No such file or
>>>> directory
>>>> > libcuda.dylib: cannot open shared object file: No such file or
>>>> directory
>>>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file
>>>> or
>>>> > directory
>>>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such
>>>> file or
>>>> > directory
>>>> > If you are not interested in CUDA-aware support, then run with
>>>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If
>>>> you are
>>>> > interested
>>>> > in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
>>>> location
>>>> > of libcuda.so.1 to get passed this issue.
>>>> >
>>>> --------------------------------------------------------------------------
>>>> >
>>>> --------------------------------------------------------------------------
>>>> > WARNING: There was an error initializing an OpenFabrics device.
>>>> >
>>>> >   Local host:   g117
>>>> >   Local device: mlx5_0
>>>> >
>>>> --------------------------------------------------------------------------
>>>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>>> > Number of SNES iterations = 2
>>>> > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI
>>>> processes
>>>> > See https://petsc.org/release/faq/
>>>> >
>>>> > The library attempted to open the following supporting CUDA libraries,
>>>> > but each of them failed.  CUDA-aware support is disabled.
>>>> > libcuda.so.1: cannot open shared object file: No such file or
>>>> directory
>>>> > libcuda.dylib: cannot open shared object file: No such file or
>>>> directory
>>>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file
>>>> or
>>>> > directory
>>>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such
>>>> file or
>>>> > directory
>>>> > If you are not interested in CUDA-aware support, then run with
>>>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If
>>>> you are
>>>> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to
>>>> the
>>>> > location of libcuda.so.1 to get passed this issue.
>>>> >
>>>> > WARNING: There was an error initializing an OpenFabrics device.
>>>> >
>>>> >   Local host:   xxx
>>>> >   Local device: mlx5_0
>>>> >
>>>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>>> > Number of SNES iterations = 2
>>>> > [g117:4162783] 1 more process has sent help message
>>>> > help-mpi-common-cuda.txt / dlopen failed
>>>> > [g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>> see all
>>>> > help / error messages
>>>> > [g117:4162783] 1 more process has sent help message
>>>> help-mpi-btl-openib.txt
>>>> > / error in device init
>>>> > Completed test examples
>>>> > Error while running make check
>>>> > gmake[1]: *** [makefile:149: check] Error 1
>>>> > make: *** [GNUmakefile:17: check] Error 2
>>>> >
>>>> > Where is $MPI_RUN set? I'd like to be able to pass options such as
>>>> --mca
>>>> > orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca
>>>> pml
>>>> > ucx --mca btl '^openib' which will help me troubleshoot and hide
>>>> unneeded
>>>> > warnings.
>>>> >
>>>> > Thanks,
>>>> > Rob
>>>> >
>>>>
>>>>
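
Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the options passed through MPIEXEC in this thread can be set once in the shell instead. A minimal sketch, assuming the same four parameters used above; the function name set_ompi_mca_quiet is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: export the MCA parameters discussed in this thread
# as OMPI_MCA_* environment variables, which Open MPI reads at startup.
set_ompi_mca_quiet() {
    export OMPI_MCA_orte_base_help_aggregate=0      # show every help/error message
    export OMPI_MCA_opal_warn_on_missing_libcuda=0  # silence the missing-libcuda warning
    export OMPI_MCA_pml=ucx                         # select the UCX point-to-point layer
    export OMPI_MCA_btl='^openib'                   # exclude the openib BTL
}

set_ompi_mca_quiet

# Afterwards the check can be run without a custom MPIEXEC, for example:
#   make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
```

Environment variables apply to every mpiexec started from that shell, which can be convenient in a module-based setup. The MPIEXEC form keeps the settings scoped to a single make invocation.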