[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Barry Smith bsmith at petsc.dev
Sat Oct 8 14:19:48 CDT 2022


   I hate these kinds of make rules that hide what the compiler is doing (in the name of producing less output, I guess); they make it difficult to figure out what is going wrong.

   Anyway, either some of the MPI libraries are missing from the link line, or they appear in the wrong order and the linker cannot resolve symbols from them. There is a collection of discussions on why that error message can appear at https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line
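Since the project builds with CMake, one common fix for an undefined MPI symbol is to make the MPI dependency explicit so libmpi.so lands on the link line in the right position. A minimal sketch, assuming the executable target is named wtm.x as in the build output quoted below:

```cmake
# Sketch: request the MPI C++ component and link it explicitly.
# The MPI::MPI_CXX imported target carries the include flags and
# libraries in the correct link order.
find_package(MPI REQUIRED COMPONENTS CXX)
target_link_libraries(wtm.x PRIVATE MPI::MPI_CXX)
```

This avoids relying on LDFLAGS workarounds such as --copy-dt-needed-entries, because CMake places the MPI libraries after the objects that reference them.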


  Barry


> On Oct 7, 2022, at 11:45 PM, Rob Kudyba <rk3199 at columbia.edu> wrote:
> 
> The error changes now, and occurs at an earlier place: 66% instead of 70%:
> make LDFLAGS="-Wl,--copy-dt-needed-entries"
> Consolidate compiler generated dependencies of target fmt
> [ 12%] Built target fmt
> Consolidate compiler generated dependencies of target richdem
> [ 37%] Built target richdem
> Consolidate compiler generated dependencies of target wtm
> [ 62%] Built target wtm
> Consolidate compiler generated dependencies of target wtm.x
> [ 66%] Linking CXX executable wtm.x
> /usr/bin/ld: libwtm.a(transient_groundwater.cpp.o): undefined reference to symbol 'MPI_Abort'
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error adding symbols: DSO missing from command line
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2
> 
> So perhaps PETSc is now being found. Any other suggestions?
> 
> On Fri, Oct 7, 2022 at 11:18 PM Rob Kudyba <rk3199 at columbia.edu> wrote:
> 
> Thanks for the quick reply. I added these options to make, but make check still produced the warnings, so I used the command like this:
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug  MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check
> Running check examples to verify correct installation
> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> Completed test examples
> 
> Could be useful for the FAQ.
> You mentioned you had "OpenMPI 4.1.1 with CUDA aware", so I think a working mpicc should automatically find the CUDA libraries. Maybe you unloaded the CUDA libraries?
> Oh, let me clarify: OpenMPI is CUDA-aware, but this code, and the node where PETSc is compiling, does not have a GPU, so CUDA is not needed; using the MPIEXEC option worked during 'make check' to suppress the warning.
> 
> I'm now trying to use PETSc to compile, and linking appears to go awry:
> [ 58%] Building CXX object CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
> [ 62%] Linking CXX static library libwtm.a
> [ 62%] Built target wtm
> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
> [ 70%] Linking CXX executable wtm.x
> /usr/bin/ld: cannot find -lpetsc
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2
> It seems CMake could not find PETSc. Look at $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your CMakeLists.txt.
> 
> There is an explicit reference to the path in CMakeLists.txt:
> # NOTE: You may need to update this path to identify PETSc's location
> set(ENV{PKG_CONFIG_PATH} "$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")
> pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
> message(STATUS "Found PETSc ${PETSC_VERSION}")
> add_subdirectory(common/richdem EXCLUDE_FROM_ALL)
> add_subdirectory(common/fmt EXCLUDE_FROM_ALL)
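Since pkg_check_modules is called with IMPORTED_TARGET, the usual way to get -lpetsc onto the link line is to link the generated pkg-config target rather than relying on LD_LIBRARY_PATH. A sketch, assuming the executable target is wtm.x as in the build output:

```cmake
# Sketch: with IMPORTED_TARGET, FindPkgConfig exposes the results as
# the target PkgConfig::PETSC. Linking it adds PETSc's include
# directories, -L path, and -lpetsc to the link line automatically.
pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
target_link_libraries(wtm.x PRIVATE PkgConfig::PETSC)
```

If the CMakeLists.txt only calls pkg_check_modules without linking PkgConfig::PETSC (or using the PETSC_LINK_LIBRARIES variables), the linker never sees where libpetsc.so lives, which matches the "cannot find -lpetsc" error.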
>  
> And that exists:
> ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/
> petsc.pc  PETSc.pc
> 
>  Is there an environment variable I'm missing? I've seen the suggestion <https://www.mail-archive.com/search?l=petsc-users@mcs.anl.gov&q=subject:%22%5C%5Bpetsc%5C-users%5C%5D+CMake+error+in+PETSc%22&o=newest&f=1> to add it to LD_LIBRARY_PATH which I did with export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib and that points to:
> ls -l /path/to/petsc/arch-linux-c-debug/lib
> total 83732
> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so -> libpetsc.so.3.18.0
> lrwxrwxrwx 1 rk3199 user       18 Oct  7 13:56 libpetsc.so.3.18 -> libpetsc.so.3.18.0
> -rwxr-xr-x 1 rk3199 user 85719200 Oct  7 13:56 libpetsc.so.3.18.0
> drwxr-xr-x 3 rk3199 user     4096 Oct  6 10:22 petsc
> drwxr-xr-x 2 rk3199 user     4096 Oct  6 10:23 pkgconfig
> 
> Anything else to check?
> If modifying CMakeLists.txt does not work, you can try export LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
> LD_LIBRARY_PATH is for run time, but the error happened at link time.
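The distinction can be summarized like this (a sketch; PETSC_DIR and PETSC_ARCH are assumed to be set as earlier in the thread):

```shell
# LIBRARY_PATH is read by the compiler driver (gcc/ld) when resolving
# -lpetsc at link time; LD_LIBRARY_PATH is read by the dynamic loader
# when the built binary actually runs.
export LIBRARY_PATH="$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib"        # link time
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib"  # run time
```

So setting only LD_LIBRARY_PATH helps a binary find libpetsc.so when it starts, but does nothing for the "cannot find -lpetsc" failure at link time.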
> 
> Yes that's what I already had. Any other debug that I can provide?
> 
>  
> On Fri, Oct 7, 2022 at 1:53 PM Satish Balay <balay at mcs.anl.gov> wrote:
> you can try
> 
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'"
> 
> Wrt configure - it can be set with the --with-mpiexec option - it's saved in PETSC_ARCH/lib/petsc/conf/petscvariables
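To make the setting permanent rather than passing MPIEXEC on every make invocation, the configure-time route would look roughly like this (a sketch; the exact MCA flags are the ones used earlier in the thread):

```shell
# Sketch: bake the mpiexec command, including MCA flags, into the PETSc
# configuration, then verify where it was recorded.
./configure --with-mpiexec="mpiexec --mca opal_warn_on_missing_libcuda 0"
grep MPIEXEC $PETSC_ARCH/lib/petsc/conf/petscvariables
```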
> 
> Satish
> 
> On Fri, 7 Oct 2022, Rob Kudyba wrote:
> 
> > We are on RHEL 8, using modules that we can load/unload various version of
> > packages/libraries, and I have OpenMPI 4.1.1 with CUDA aware loaded along
> > with GDAL 3.3.0, GCC 10.2.0, and cmake 3.22.1
> > 
> > make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
> > fails with the below errors,
> > Running check examples to verify correct installation
> > 
> > Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
> > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
> > See https://petsc.org/release/faq/
> > --------------------------------------------------------------------------
> > The library attempted to open the following supporting CUDA libraries,
> > but each of them failed.  CUDA-aware support is disabled.
> > libcuda.so.1: cannot open shared object file: No such file or directory
> > libcuda.dylib: cannot open shared object file: No such file or directory
> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or
> > directory
> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or
> > directory
> > If you are not interested in CUDA-aware support, then run with
> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
> > interested
> > in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
> > of libcuda.so.1 to get passed this issue.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > WARNING: There was an error initializing an OpenFabrics device.
> > 
> >   Local host:   g117
> >   Local device: mlx5_0
> > --------------------------------------------------------------------------
> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
> > Number of SNES iterations = 2
> > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> > See https://petsc.org/release/faq/
> > 
> > The library attempted to open the following supporting CUDA libraries,
> > but each of them failed.  CUDA-aware support is disabled.
> > libcuda.so.1: cannot open shared object file: No such file or directory
> > libcuda.dylib: cannot open shared object file: No such file or directory
> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or
> > directory
> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or
> > directory
> > If you are not interested in CUDA-aware support, then run with
> > --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are
> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
> > location of libcuda.so.1 to get passed this issue.
> > 
> > WARNING: There was an error initializing an OpenFabrics device.
> > 
> >   Local host:   xxx
> >   Local device: mlx5_0
> > 
> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
> > Number of SNES iterations = 2
> > [g117:4162783] 1 more process has sent help message
> > help-mpi-common-cuda.txt / dlopen failed
> > [g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
> > help / error messages
> > [g117:4162783] 1 more process has sent help message help-mpi-btl-openib.txt
> > / error in device init
> > Completed test examples
> > Error while running make check
> > gmake[1]: *** [makefile:149: check] Error 1
> > make: *** [GNUmakefile:17: check] Error 2
> > 
> > Where is $MPI_RUN set? I'd like to be able to pass options such as --mca
> > orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml
> > ucx --mca btl '^openib' which will help me troubleshoot and hide unneeded
> > warnings.
> > 
> > Thanks,
> > Rob
> > 
> 
