[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Junchao Zhang junchao.zhang at gmail.com
Sun Oct 9 20:31:23 CDT 2022


In the last link step, which generates the executable:
/cm/local/apps/gcc/10.2.0/bin/c++ -isystem -O3 -g -Wall -Wextra -pedantic
-Wshadow CMakeFiles/wtm.x.dir/src/WTM.cpp.o -o wtm.x
 -Wl,-rpath,/path/to/WTM/build/common/richdem:/path/to/gdal-3.3.0/lib:/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib:/path/to/petsc/arch-linux-cxx-debug/lib libwtm.a
common/richdem/librichdem.so /path/to/gdal-3.3.0/lib/libgdal.so
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libompitrace.so.40.30.0
common/fmt/libfmt.a /path/to/petsc/arch-linux-cxx-debug/lib/libpetsc.so

I do not see -lmpi (or libmpi.so) on this link line, so the MPI library
itself is not being linked. Try configuring with the MPI compiler wrappers,
e.g. cmake -DCMAKE_C_COMPILER=/path/to/mpicc
-DCMAKE_CXX_COMPILER=/path/to/mpicxx, and then rebuild your code.
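
As a rough illustration of the check above, here is a small shell sketch that
scans a link command for the MPI library. The helper name has_mpi_lib and the
shortened link line are made up for this example; it only looks for -lmpi or a
libmpi.so path, and it is not part of PETSc, CMake, or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of PETSc, CMake, or Open MPI):
# prints "yes" if a link command names the MPI library, else "no".
has_mpi_lib() {
  case " $1 " in
    *" -lmpi "*|*/libmpi.so*) echo "yes" ;;
    *) echo "no" ;;
  esac
}

# Abbreviated form of the failing link line (paths shortened for the example).
link_line='c++ -O3 -g WTM.cpp.o -o wtm.x libwtm.a /path/to/lib/libompitrace.so.40.30.0 /path/to/lib/libpetsc.so'

has_mpi_lib "$link_line"
```

On the shortened link line this prints "no": libompitrace.so is present but
libmpi.so is not, which is consistent with the "DSO missing from command line"
error for libmpi.so.40. Building with the mpicc/mpicxx wrappers should add the
MPI library to the link line for you.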

On Sat, Oct 8, 2022 at 9:32 PM Rob Kudyba <rk3199 at columbia.edu> wrote:

>> Perhaps we can go back one step:
>> Use your mpicc to build a "hello world" mpi test, then run it on a
>> compute node (with GPU) to see if it works.
>> If no, then your MPI environment has problems;
>> If yes, then use it to build petsc (turn on petsc's gpu support,
>>  --with-cuda  --with-cudac=nvcc), and then your code.
>> --Junchao Zhang
>
> OK, I tried this to rule out the CUDA-capable OpenMPI as a factor:
> ./configure --with-debugging=0 --with-cmake=true   --with-mpi=true
>  --with-mpi-dir=/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support --with-fc=0
>   --with-cuda=1
> [..]
> cuda:
>   Version:    11.7
>   Includes:   -I/path/to/cuda11.7/toolkit/11.7.1/include
>   Libraries:  -Wl,-rpath,/path/to/cuda11.7/toolkit/11.7.1/lib64
> -L/cm/shared/apps/cuda11.7/toolkit/11.7.1/lib64
> -L/path/to/cuda11.7/toolkit/11.7.1/lib64/stubs -lcudart -lnvToolsExt
> -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda
>   CUDA SM 75
>   CUDA underlying compiler:
> CUDA_CXX="/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin"/mpicxx
>   CUDA underlying compiler flags: CUDA_CXXFLAGS=
>   CUDA underlying linker libraries: CUDA_CXXLIBS=
> [...]
>  Configure stage complete. Now build PETSc libraries with:
>    make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-opt all
>
> C++ compiler version: g++ (GCC) 10.2.0
> Using C++ compiler to compile PETSc
> -----------------------------------------
> Using C/C++ linker:
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin/mpicxx
> Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing
> -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector
> -fvisibility=hidden -g -O0
> -----------------------------------------
> Using system modules:
> shared:slurm/20.02.6:DefaultModules:openmpi/gcc/64/4.1.1_cuda_11.0.3_aware:gdal/3.3.0:cmake/3.22.1:cuda11.7/toolkit/11.7.1:openblas/dynamic/0.3.7:gcc/10.2.0
> Using mpi.h: # 1
> "/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include/mpi.h" 1
> -----------------------------------------
> Using libraries: -Wl,-rpath,/path/to/petsc/arch-linux-cxx-debug/lib
> -L/path/to/petsc/arch-linux-cxx-debug/lib -lpetsc -lopenblas -lm -lX11
> -lquadmath -lstdc++ -ldl
> ------------------------------------------
> Using mpiexec: mpiexec -mca orte_base_help_aggregate 0  -mca pml ucx --mca
> btl '^openib'
> ------------------------------------------
> Using MAKE: /path/to/petsc/arch-linux-cxx-debug/bin/make
> Using MAKEFLAGS: -j24 -l48.0  --no-print-directory -- MPIEXEC=mpiexec\
> -mca\ orte_base_help_aggregate\ 0\ \ -mca\ pml\ ucx\ --mca\ btl\ '^openib'
> PETSC_ARCH=arch-linux-cxx-debug PETSC_DIR=/path/to/petsc
> ==========================================
> make[3]: Nothing to be done for 'libs'.
> =========================================
> Now to check if the libraries are working do:
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug check
> =========================================
> [me at xxx petsc]$ make PETSC_DIR=/path/to/petsc
> PETSC_ARCH=arch-linux-cxx-debug MPIEXEC="mpiexec -mca
> orte_base_help_aggregate 0  -mca pml ucx --mca btl '^openib'" check
> Running check examples to verify correct installation
> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-cxx-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>
> ./bandwidthTest
> [CUDA Bandwidth Test] - Starting...
> Running on...
>
>  Device 0: Quadro RTX 8000
>  Quick Mode
>
>  Host to Device Bandwidth, 1 Device(s)
>  PINNED Memory Transfers
>    Transfer Size (Bytes) Bandwidth(GB/s)
>    32000000 12.3
>
>  Device to Host Bandwidth, 1 Device(s)
>  PINNED Memory Transfers
>    Transfer Size (Bytes) Bandwidth(GB/s)
>    32000000 13.2
>
>  Device to Device Bandwidth, 1 Device(s)
>  PINNED Memory Transfers
>    Transfer Size (Bytes) Bandwidth(GB/s)
>    32000000 466.2
>
> Result = PASS
>
> On Sat, Oct 8, 2022 at 7:56 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>   True, but when users send reports back to us they will never have used
>> the VERBOSE=1 option, so it requires one more round trip of email to get
>> this additional information.
>>
>> > On Oct 8, 2022, at 6:48 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >
>> > Barry Smith <bsmith at petsc.dev> writes:
>> >
>> >>   I hate these kinds of make rules that hide what the compiler is
>> >> doing (in the name of having less output, I guess); they make it
>> >> difficult to figure out what is going wrong.
>> >
>> > You can run make VERBOSE=1 with CMake-generated makefiles.
>>
>
>
>> Anyway, either some of the MPI libraries are missing from the link line
>> or they are in the wrong order, so the linker is not able to search them
>> properly. Here are several discussions of why that error message can
>> appear:
>> https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line
>>
>
>
> Still the same error, just with more output. I have also been using the
> suggested LDFLAGS="-Wl,--copy-dt-needed-entries" along with make:
> make[2]: Entering directory '/path/to/WTM/build'
> cd /path/to/WTM/build &&
> /path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake -E cmake_depends "Unix
> Makefiles" /path/to/WTM /path/to/WTM /path/to/WTM/build /path/to/WTM/build
> /path/to/WTM/build/CMakeFiles/wtm.x.dir/DependInfo.cmake --color=
> make[2]: Leaving directory '/path/to/WTM/build'
> make  -f CMakeFiles/wtm.x.dir/build.make CMakeFiles/wtm.x.dir/build
> make[2]: Entering directory '/path/to/WTM/build'
> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
> /cm/local/apps/gcc/10.2.0/bin/c++  -I/path/to/WTM/common/richdem/include
> -I/path/to/gdal-3.3.0/include -I/path/to/WTM/common/fmt/include -isystem
> /path/to/petsc/arch-linux-cxx-debug/include -isystem /path/to/petsc/include
> -isystem -O3 -g -Wall -Wextra -pedantic -Wshadow -Wfloat-conversion -Wall
> -Wextra -pedantic -Wshadow -DRICHDEM_GIT_HASH=\"xxx\"
> -DRICHDEM_COMPILE_TIME=\"2022-10-09T02:21:11Z\" -DUSEGDAL -Xpreprocessor
> -fopenmp
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1
> -I/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include -std=gnu++2a -MD
> -MT CMakeFiles/wtm.x.dir/src/WTM.cpp.o -MF
> CMakeFiles/wtm.x.dir/src/WTM.cpp.o.d -o CMakeFiles/wtm.x.dir/src/WTM.cpp.o
> -c /path/to/WTM/src/WTM.cpp
> c++: warning:
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1:
> linker input file unused because linking not done
> [ 70%] Linking CXX executable wtm.x
> /path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake -E cmake_link_script
> CMakeFiles/wtm.x.dir/link.txt --verbose=1
> /cm/local/apps/gcc/10.2.0/bin/c++ -isystem -O3 -g -Wall -Wextra -pedantic
> -Wshadow CMakeFiles/wtm.x.dir/src/WTM.cpp.o -o wtm.x
>  -Wl,-rpath,/path/to/WTM/build/common/richdem:/path/to/gdal-3.3.0/lib:/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib:/path/to/petsc/arch-linux-cxx-debug/lib
> libwtm.a common/richdem/librichdem.so /path/to/gdal-3.3.0/lib/libgdal.so
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libompitrace.so.40.30.0
> common/fmt/libfmt.a /path/to/petsc/arch-linux-cxx-debug/lib/libpetsc.so
> /usr/bin/ld: CMakeFiles/wtm.x.dir/src/WTM.cpp.o: undefined reference to
> symbol 'ompi_mpi_comm_self'
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error
> adding symbols: DSO missing from command line
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[2]: Leaving directory '/path/to/WTM/build'
> make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
> make[1]: Leaving directory '/path/to/WTM/build'
> make: *** [Makefile:136: all] Error 2
>
> Does anything stick out?
>