<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Perhaps we can go back one step:<br>Use your mpicc to build a "hello world" mpi test, then run it on a compute node (with GPU) to see if it works.<br>If no, then your MPI environment has problems;<br>If yes, then use it to build petsc (turn on petsc's gpu support, --with-cuda --with-cudac=nvcc), and then your code.<br>--Junchao Zhang</blockquote><div>OK, I tried this just to rule out the CUDA-capable OpenMPI as a factor:</div><div><font face="monospace">./configure --with-debugging=0 --with-cmake=true --with-mpi=true --with-mpi-dir=/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support --with-fc=0 --with-cuda=1<br></font></div><div><font face="monospace">[...]</font></div><div><font face="monospace">cuda:<br> Version: 11.7<br> Includes: -I/path/to/cuda11.7/toolkit/11.7.1/include<br> Libraries: -Wl,-rpath,/path/to/cuda11.7/toolkit/11.7.1/lib64 -L/cm/shared/apps/cuda11.7/toolkit/11.7.1/lib64 -L/path/to/cuda11.7/toolkit/11.7.1/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda<br> CUDA SM 75<br> CUDA underlying compiler: CUDA_CXX="/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin"/mpicxx<br> CUDA underlying compiler flags: CUDA_CXXFLAGS=<br> CUDA underlying linker libraries: CUDA_CXXLIBS=<br></font></div><div><font face="monospace">[...]</font></div><div><font face="monospace"> Configure stage complete. 
Now build PETSc libraries with:<br> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-opt all<br><br>C++ compiler version: g++ (GCC) 10.2.0<br>Using C++ compiler to compile PETSc<br>-----------------------------------------<br>Using C/C++ linker: /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin/mpicxx<br>Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g -O0<br>-----------------------------------------<br>Using system modules: shared:slurm/20.02.6:DefaultModules:openmpi/gcc/64/4.1.1_cuda_11.0.3_aware:gdal/3.3.0:cmake/3.22.1:cuda11.7/toolkit/11.7.1:openblas/dynamic/0.3.7:gcc/10.2.0<br>Using mpi.h: # 1 "/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include/mpi.h" 1<br>-----------------------------------------<br>Using libraries: -Wl,-rpath,/path/to/petsc/arch-linux-cxx-debug/lib -L/path/to/petsc/arch-linux-cxx-debug/lib -lpetsc -lopenblas -lm -lX11 -lquadmath -lstdc++ -ldl<br>------------------------------------------<br>Using mpiexec: mpiexec -mca orte_base_help_aggregate 0 -mca pml ucx --mca btl '^openib'<br>------------------------------------------<br>Using MAKE: /path/to/petsc/arch-linux-cxx-debug/bin/make<br>Using MAKEFLAGS: -j24 -l48.0 --no-print-directory -- MPIEXEC=mpiexec\ -mca\ orte_base_help_aggregate\ 0\ \ -mca\ pml\ ucx\ --mca\ btl\ '^openib' PETSC_ARCH=arch-linux-cxx-debug PETSC_DIR=/path/to/petsc<br>==========================================<br>make[3]: Nothing to be done for 'libs'.<br>=========================================<br>Now to check if the libraries are working do:<br>make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug check<br>=========================================<br>[me@xxx petsc]$ make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 -mca pml ucx --mca btl '^openib'" check<br>Running check examples to verify correct installation<br>Using PETSC_DIR=/path/to/petsc and 
PETSC_ARCH=arch-linux-cxx-debug<br>C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br>C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes</font><br></div><div><br></div><div><font face="monospace">./bandwidthTest <br>[CUDA Bandwidth Test] - Starting...<br>Running on...<br><br> Device 0: Quadro RTX 8000<br> Quick Mode<br><br> Host to Device Bandwidth, 1 Device(s)<br> PINNED Memory Transfers<br> Transfer Size (Bytes) Bandwidth(GB/s)<br> 32000000 12.3<br><br> Device to Host Bandwidth, 1 Device(s)<br> PINNED Memory Transfers<br> Transfer Size (Bytes) Bandwidth(GB/s)<br> 32000000 13.2<br><br> Device to Device Bandwidth, 1 Device(s)<br> PINNED Memory Transfers<br> Transfer Size (Bytes) Bandwidth(GB/s)<br> 32000000 466.2<br><br>Result = PASS</font></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Oct 8, 2022 at 7:56 PM Barry Smith <<a href="mailto:bsmith@petsc.dev">bsmith@petsc.dev</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
True, but when users send reports back to us they will never have used the VERBOSE=1 option, so it requires one more round trip of email to get this additional information. <br>
<br>
> On Oct 8, 2022, at 6:48 PM, Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br>
> <br>
> Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> writes:<br>
> <br>
>> I hate these kinds of make rules that hide what the compiler is doing (in the name of having less output, I guess); it makes it difficult to figure out what is going wrong.<br>
> <br>
> You can make VERBOSE=1 with CMake-generated makefiles.<br></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div>Anyways, either some of the MPI libraries are missing from the link line or they are in the wrong order and thus it is not able to search them properly. Here is a bunch of discussions on why that error message can appear <a href="https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line" target="_blank">https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line</a></div></blockquote><div> </div><div><br></div><div>Still the same error, just with more output. I have been using the suggested <font face="monospace">LDFLAGS="-Wl,--copy-dt-needed-entries"</font> along with <font face="monospace">make</font>:</div><font face="monospace">make[2]: Entering directory '/path/to/WTM/build'<br>cd /path/to/WTM/build && /path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake -E cmake_depends "Unix Makefiles" /path/to/WTM /path/to/WTM /path/to/WTM/build /path/to/WTM/build /path/to/WTM/build/CMakeFiles/wtm.x.dir/DependInfo.cmake --color=<br>make[2]: Leaving directory '/path/to/WTM/build'<br>make -f CMakeFiles/wtm.x.dir/build.make CMakeFiles/wtm.x.dir/build<br>make[2]: Entering directory '/path/to/WTM/build'<br>[ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o<br>/cm/local/apps/gcc/10.2.0/bin/c++ -I/path/to/WTM/common/richdem/include -I/path/to/gdal-3.3.0/include -I/path/to/WTM/common/fmt/include -isystem /path/to/petsc/arch-linux-cxx-debug/include -isystem /path/to/petsc/include -isystem -O3 -g -Wall -Wextra -pedantic -Wshadow -Wfloat-conversion -Wall -Wextra -pedantic -Wshadow -DRICHDEM_GIT_HASH=\"xxx\" -DRICHDEM_COMPILE_TIME=\"2022-10-09T02:21:11Z\" -DUSEGDAL -Xpreprocessor -fopenmp 
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1 -I/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include -std=gnu++2a -MD -MT CMakeFiles/wtm.x.dir/src/WTM.cpp.o -MF CMakeFiles/wtm.x.dir/src/WTM.cpp.o.d -o CMakeFiles/wtm.x.dir/src/WTM.cpp.o -c /path/to/WTM/src/WTM.cpp<br>c++: warning: /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1: linker input file unused because linking not done<br>[ 70%] Linking CXX executable wtm.x<br>/path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/wtm.x.dir/link.txt --verbose=1<br>/cm/local/apps/gcc/10.2.0/bin/c++ -isystem -O3 -g -Wall -Wextra -pedantic -Wshadow CMakeFiles/wtm.x.dir/src/WTM.cpp.o -o wtm.x -Wl,-rpath,/path/to/WTM/build/common/richdem:/path/to/gdal-3.3.0/lib:/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib:/path/to/petsc/arch-linux-cxx-debug/lib libwtm.a common/richdem/librichdem.so /path/to/gdal-3.3.0/lib/libgdal.so /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libompitrace.so.40.30.0 common/fmt/libfmt.a /path/to/petsc/arch-linux-cxx-debug/lib/libpetsc.so <br>/usr/bin/ld: CMakeFiles/wtm.x.dir/src/WTM.cpp.o: undefined reference to symbol 'ompi_mpi_comm_self'<br>/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error adding symbols: DSO missing from command line<br>collect2: error: ld returned 1 exit status<br>make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1<br>make[2]: Leaving directory '/path/to/WTM/build'<br>make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2<br>make[1]: Leaving directory '/path/to/WTM/build'<br></font><div><font face="monospace">make: *** [Makefile:136: all] Error 2</font></div><div> </div><div>Anything stick out?</div></div></div>