<div dir="ltr">Perhaps we can back up one step:<div>Use your mpicc to build a "hello world" MPI test, then run it on a compute node (with GPU) to see if it works.</div><div>If not, then your MPI environment has a problem;</div><div>If so, then use it to build PETSc (turn on PETSc's GPU support with --with-cuda --with-cudac=nvcc), and then your code.</div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 7, 2022 at 10:45 PM Rob Kudyba <<a href="mailto:rk3199@columbia.edu">rk3199@columbia.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">The error has now changed and occurs earlier, at 66% instead of 70%:<div><font face="monospace">make LDFLAGS="-Wl,--copy-dt-needed-entries"<br>Consolidate compiler generated dependencies of target fmt<br>[ 12%] Built target fmt<br>Consolidate compiler generated dependencies of target richdem<br>[ 37%] Built target richdem<br>Consolidate compiler generated dependencies of target wtm<br>[ 62%] Built target wtm<br>Consolidate compiler generated dependencies of target wtm.x<br>[ 66%] Linking CXX executable wtm.x<br>/usr/bin/ld: libwtm.a(transient_groundwater.cpp.o): undefined reference to symbol 'MPI_Abort'<br>/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error adding symbols: DSO missing from command line<br>collect2: error: ld returned 1 exit status<br>make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1<br>make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2<br>make: *** [Makefile:136: all] Error 2</font><br></div><div><br></div></div>So perhaps PETSc is now being found. 
Any other suggestions?<div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 7, 2022 at 11:18 PM Rob Kudyba <<a href="mailto:rk3199@columbia.edu" target="_blank">rk3199@columbia.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Thanks for the quick reply. I added these options to make, but make check still produced the warnings, so I ran the command like this:<div><font face="monospace">make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check<br>Running check examples to verify correct installation<br>Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug<br>C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br>C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br>Completed test examples</font><br></div><div><br></div><div>Could be useful for the FAQ.</div></div></div></blockquote><div>You mentioned you had "OpenMPI 4.1.1 with CUDA aware", so I think a working mpicc should automatically find the CUDA libraries. Maybe you unloaded the CUDA libraries?</div></div></div></blockquote><div>Oh, let me clarify: OpenMPI is CUDA-aware, but the node where PETSc is being compiled does not have a GPU and this code does not use one, so CUDA support is not needed. Using the MPIEXEC option during the 'check' worked to suppress the warning. 
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>I'm now trying to use PETSc to compile, and linking appears to go awry:</div><div><font face="monospace">[ 58%] Building CXX object CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o<br>[ 62%] Linking CXX static library libwtm.a<br>[ 62%] Built target wtm<br>[ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o<br>[ 70%] Linking CXX executable wtm.x<br>/usr/bin/ld: cannot find -lpetsc<br>collect2: error: ld returned 1 exit status<br>make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1<br>make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2<br>make: *** [Makefile:136: all] Error 2</font></div></div></div></blockquote><div>It seems CMake could not find PETSc. 
Look at $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your CMakeLists.txt.</div></div></div></blockquote><div><br></div><div>There is an explicit reference to the path in CMakeLists.txt:</div><div><font face="monospace"># NOTE: You may need to update this path to identify PETSc's location<br>set(ENV{PKG_CONFIG_PATH} "$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")<br>pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)<br>message(STATUS "Found PETSc ${PETSC_VERSION}")</font></div><div><font face="monospace">add_subdirectory(common/richdem EXCLUDE_FROM_ALL)<br>add_subdirectory(common/fmt EXCLUDE_FROM_ALL)<br></font></div><div> </div><div>And that exists:</div><div><font face="monospace">ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/<br>petsc.pc PETSc.pc</font><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> Is there an environment variable I'm missing? 
I've <a href="https://www.mail-archive.com/search?l=petsc-users@mcs.anl.gov&q=subject:%22%5C%5Bpetsc%5C-users%5C%5D+CMake+error+in+PETSc%22&o=newest&f=1" target="_blank">seen the suggestion</a> to add it to <font face="monospace">LD_LIBRARY_PATH</font>, which I did with export <font face="monospace">LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib</font>, and that points to:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><font face="monospace">ls -l /path/to/petsc/arch-linux-c-debug/lib<br>total 83732<br>lrwxrwxrwx 1 rk3199 user 18 Oct 7 13:56 libpetsc.so -> libpetsc.so.3.18.0<br>lrwxrwxrwx 1 rk3199 user 18 Oct 7 13:56 libpetsc.so.3.18 -> libpetsc.so.3.18.0<br>-rwxr-xr-x 1 rk3199 user 85719200 Oct 7 13:56 libpetsc.so.3.18.0<br>drwxr-xr-x 3 rk3199 user 4096 Oct 6 10:22 petsc<br>drwxr-xr-x 2 rk3199 user 4096 Oct 6 10:23 pkgconfig</font><br></div></div><div><br></div>Anything else to check?</div></blockquote><div>If modifying CMakeLists.txt does not work, you can try export <font face="monospace">LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib</font></div><div><span style="font-family:monospace">LD_LIBRARY_PATH</span> is for run time, but the error happened at link time.<br></div></div></div></blockquote><div><br></div><div>Yes, that's what I already had. 
Any other debug that I can provide?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Oct 7, 2022 at 1:53 PM Satish Balay <<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">you can try<br>
<br>
make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'"<br>
<br>
Wrt configure - it can be set with the --with-mpiexec option - it's saved in PETSC_ARCH/lib/petsc/conf/petscvariables<br>
<br>
Satish<br>
<br>
On Fri, 7 Oct 2022, Rob Kudyba wrote:<br>
<br>
> We are on RHEL 8, using modules to load/unload various versions of<br>
> packages/libraries, and I have OpenMPI 4.1.1 with CUDA aware loaded along<br>
> with GDAL 3.3.0, GCC 10.2.0, and cmake 3.22.1<br>
> <br>
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check<br>
> fails with the below errors,<br>
> Running check examples to verify correct installation<br>
> <br>
> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug<br>
> Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process<br>
> See <a href="https://petsc.org/release/faq/" rel="noreferrer" target="_blank">https://petsc.org/release/faq/</a><br>
> --------------------------------------------------------------------------<br>
> The library attempted to open the following supporting CUDA libraries,<br>
> but each of them failed. CUDA-aware support is disabled.<br>
> libcuda.so.1: cannot open shared object file: No such file or directory<br>
> libcuda.dylib: cannot open shared object file: No such file or directory<br>
> /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or<br>
> directory<br>
> /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or<br>
> directory<br>
> If you are not interested in CUDA-aware support, then run with<br>
> --mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are<br>
> interested<br>
> in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location<br>
> of libcuda.so.1 to get passed this issue.<br>
> --------------------------------------------------------------------------<br>
> --------------------------------------------------------------------------<br>
> WARNING: There was an error initializing an OpenFabrics device.<br>
> <br>
> Local host: g117<br>
> Local device: mlx5_0<br>
> --------------------------------------------------------------------------<br>
> lid velocity = 0.0016, prandtl # = 1., grashof # = 1.<br>
> Number of SNES iterations = 2<br>
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes<br>
> See <a href="https://petsc.org/release/faq/" rel="noreferrer" target="_blank">https://petsc.org/release/faq/</a><br>
> <br>
> The library attempted to open the following supporting CUDA libraries,<br>
> but each of them failed. CUDA-aware support is disabled.<br>
> libcuda.so.1: cannot open shared object file: No such file or directory<br>
> libcuda.dylib: cannot open shared object file: No such file or directory<br>
> /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or<br>
> directory<br>
> /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or<br>
> directory<br>
> If you are not interested in CUDA-aware support, then run with<br>
> --mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are<br>
> interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the<br>
> location of libcuda.so.1 to get passed this issue.<br>
> <br>
> WARNING: There was an error initializing an OpenFabrics device.<br>
> <br>
> Local host: xxx<br>
> Local device: mlx5_0<br>
> <br>
> lid velocity = 0.0016, prandtl # = 1., grashof # = 1.<br>
> Number of SNES iterations = 2<br>
> [g117:4162783] 1 more process has sent help message<br>
> help-mpi-common-cuda.txt / dlopen failed<br>
> [g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see all<br>
> help / error messages<br>
> [g117:4162783] 1 more process has sent help message help-mpi-btl-openib.txt<br>
> / error in device init<br>
> Completed test examples<br>
> Error while running make check<br>
> gmake[1]: *** [makefile:149: check] Error 1<br>
> make: *** [GNUmakefile:17: check] Error 2<br>
> <br>
> Where is $MPI_RUN set? I'd like to be able to pass options such as --mca<br>
> orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml<br>
> ucx --mca btl '^openib' which will help me troubleshoot and hide unneeded<br>
> warnings.<br>
> <br>
> Thanks,<br>
> Rob<br>
> <br>
<br>
</blockquote></div></div></div>
</blockquote></div></div>
</blockquote></div></div>
</blockquote></div></div></div>
</blockquote></div>
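One possible way to address the "DSO missing from command line" error reported above, sketched in CMake under stated assumptions: the linker says that libwtm.a references MPI_Abort while libmpi is not named on the link line, so linking MPI explicitly (rather than passing --copy-dt-needed-entries) may resolve it. The target names wtm and wtm.x are taken from the build log and are assumptions about the project's CMakeLists.txt. PkgConfig::PETSC is the imported target that pkg_check_modules(PETSC ... IMPORTED_TARGET) creates, and MPI::MPI_CXX comes from CMake's FindMPI module.

```cmake
# Sketch only: the target names (wtm, wtm.x) are assumed from the build log.
# Locate the MPI installation provided by the loaded OpenMPI module.
find_package(MPI REQUIRED COMPONENTS CXX)

# libwtm.a references MPI_Abort, so attach both PETSc and MPI to the library.
# PUBLIC propagates these link dependencies to anything that links wtm.
target_link_libraries(wtm PUBLIC PkgConfig::PETSC MPI::MPI_CXX)

# The executable then inherits the PETSc and MPI link flags through wtm.
target_link_libraries(wtm.x PRIVATE wtm)
```

If the project's existing target_link_libraries calls omit the PUBLIC/PRIVATE keywords, drop them here as well, since CMake does not allow mixing the two signatures on one target.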