[petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

Rob Kudyba rk3199 at columbia.edu
Sat Oct 8 21:31:48 CDT 2022


>
> Perhaps we can back one step:
> Use your mpicc to build a "hello world" mpi test, then run it on a compute
> node (with GPU) to see if it works.
> If no, then your MPI environment has problems;
> If yes, then use it to build petsc (turn on petsc's gpu support,
>  --with-cuda  --with-cudac=nvcc), and then your code.
> --Junchao Zhang

OK, I tried this just to rule out the CUDA-capable OpenMPI as a factor:
./configure --with-debugging=0 --with-cmake=true --with-mpi=true \
  --with-mpi-dir=/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support --with-fc=0 \
  --with-cuda=1
[...]
cuda:
  Version:    11.7
  Includes:   -I/path/to/cuda11.7/toolkit/11.7.1/include
  Libraries:  -Wl,-rpath,/path/to/cuda11.7/toolkit/11.7.1/lib64 -L/cm/shared/apps/cuda11.7/toolkit/11.7.1/lib64 -L/path/to/cuda11.7/toolkit/11.7.1/lib64/stubs -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda
  CUDA SM 75
  CUDA underlying compiler: CUDA_CXX="/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin"/mpicxx
  CUDA underlying compiler flags: CUDA_CXXFLAGS=
  CUDA underlying linker libraries: CUDA_CXXLIBS=
[...]
 Configure stage complete. Now build PETSc libraries with:
   make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-opt all

C++ compiler version: g++ (GCC) 10.2.0
Using C++ compiler to compile PETSc
-----------------------------------------
Using C/C++ linker: /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin/mpicxx
Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -g -O0
-----------------------------------------
Using system modules:
shared:slurm/20.02.6:DefaultModules:openmpi/gcc/64/4.1.1_cuda_11.0.3_aware:gdal/3.3.0:cmake/3.22.1:cuda11.7/toolkit/11.7.1:openblas/dynamic/0.3.7:gcc/10.2.0
Using mpi.h: # 1 "/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include/mpi.h" 1
-----------------------------------------
Using libraries: -Wl,-rpath,/path/to/petsc/arch-linux-cxx-debug/lib -L/path/to/petsc/arch-linux-cxx-debug/lib -lpetsc -lopenblas -lm -lX11 -lquadmath -lstdc++ -ldl
------------------------------------------
Using mpiexec: mpiexec -mca orte_base_help_aggregate 0  -mca pml ucx --mca btl '^openib'
------------------------------------------
Using MAKE: /path/to/petsc/arch-linux-cxx-debug/bin/make
Using MAKEFLAGS: -j24 -l48.0  --no-print-directory -- MPIEXEC=mpiexec\ -mca\ orte_base_help_aggregate\ 0\ \ -mca\ pml\ ucx\ --mca\ btl\ '^openib' PETSC_ARCH=arch-linux-cxx-debug PETSC_DIR=/path/to/petsc
==========================================
make[3]: Nothing to be done for 'libs'.
=========================================
Now to check if the libraries are working do:
make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug check
=========================================
[me at xxx petsc]$ make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug \
    MPIEXEC="mpiexec -mca orte_base_help_aggregate 0  -mca pml ucx --mca btl '^openib'" check
Running check examples to verify correct installation
Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-cxx-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
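
Open MPI also reads any MCA parameter from an environment variable named OMPI_MCA_<param_name>, so instead of quoting the -mca options inside MPIEXEC they could be exported once in the shell that runs make check, where the mpiexec it launches should inherit them. The helper below is a hypothetical sketch (not part of PETSc or Open MPI) that turns the -mca/--mca pairs from the MPIEXEC string above into export lines; it assumes every such flag is followed by exactly one parameter name and one value.

```python
import shlex


def mca_args_to_env(mpiexec_cmd):
    """Map Open MPI '-mca NAME VALUE' / '--mca NAME VALUE' pairs to OMPI_MCA_NAME -> VALUE."""
    tokens = shlex.split(mpiexec_cmd)  # handles the quoted '^openib' value
    env = {}
    i = 0
    while i < len(tokens):
        # Only consume a flag if both its name and value are present.
        if tokens[i] in ("-mca", "--mca") and i + 2 < len(tokens):
            env["OMPI_MCA_" + tokens[i + 1]] = tokens[i + 2]
            i += 3
        else:
            i += 1
    return env


def export_lines(env):
    """Render shell 'export' statements, quoting values such as ^openib."""
    return ["export {}={}".format(name, shlex.quote(value)) for name, value in env.items()]


if __name__ == "__main__":
    cmd = "mpiexec -mca orte_base_help_aggregate 0  -mca pml ucx --mca btl '^openib'"
    for line in export_lines(mca_args_to_env(cmd)):
        print(line)
```

After exporting those lines, make check could be run with a plain MPIEXEC=mpiexec; whether that behaves identically on this cluster is worth confirming with a quick test.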

./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro RTX 8000
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   32000000 12.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   32000000 13.2

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   32000000 466.2

Result = PASS

On Sat, Oct 8, 2022 at 7:56 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   True, but when users send reports back to us they will never have used
> the VERBOSE=1 option, so it requires one more round trip of email to get
> this additional information.
>
> > On Oct 8, 2022, at 6:48 PM, Jed Brown <jed at jedbrown.org> wrote:
> >
> > Barry Smith <bsmith at petsc.dev> writes:
> >
> >>   I hate these kinds of make rules that hide what the compiler is doing
> (in the name of having less output, I guess) it makes it difficult to
> figure out what is going wrong.
> >
> > You can make VERBOSE=1 with CMake-generated makefiles.
>


> Anyways, either some of the MPI libraries are missing from the link line
> or they are in the wrong order and thus it is not able to search them
> properly. Here is a bunch of discussions on why that error message can
> appear
> https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line
>
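
One quick way to test the first hypothesis is to scan the exact link command CMake ran (CMake keeps it in CMakeFiles/wtm.x.dir/link.txt, which the log below passes to cmake_link_script) for the core MPI library. The helper below is a hypothetical sketch, not part of PETSc or CMake; it assumes Open MPI's library naming (libmpi.so*, libmpi.a, or -lmpi) and only checks presence, not link order.

```python
import re
import shlex

# Core Open MPI library file: libmpi.a, libmpi.so, libmpi.so.40, libmpi.so.40.30.1, ...
# The linker error names libmpi.so.40 as the provider of ompi_mpi_comm_self, so
# companion libraries such as libompitrace are reported but not counted as "core".
CORE_MPI_FILE = re.compile(r"^libmpi\.(a|so(\.\d+)*)$")


def mpi_libs_on_link_line(link_cmd):
    """Scan a link command; return (core_libmpi_present, mpi_related_tokens)."""
    core_present = False
    related = []
    for tok in shlex.split(link_cmd):
        if tok.startswith("-l"):
            # -lmpi is the core library; -lmpi_cxx, -lompitrace, ... are companions
            name = tok[2:]
            if name == "mpi":
                core_present = True
            if "mpi" in name:
                related.append(tok)
        elif not tok.startswith("-"):
            # Library given as a file path, as CMake does on the wtm.x link line;
            # flags such as -Wl,-rpath,... are skipped by the branch condition.
            base = tok.rsplit("/", 1)[-1]
            if CORE_MPI_FILE.match(base):
                core_present = True
            if base.startswith(("libmpi", "libompi")):
                related.append(tok)
    return core_present, related
```

Fed the wtm.x link line from the log below, it should report no core libmpi and only libompitrace.so.40.30.0, which is consistent with the "DSO missing from command line" error.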


Still the same failure, just with more noise. I have been using the suggested
LDFLAGS="-Wl,--copy-dt-needed-entries" along with make:
make[2]: Entering directory '/path/to/WTM/build'
cd /path/to/WTM/build && /path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake
-E cmake_depends "Unix Makefiles" /path/to/WTM /path/to/WTM
/path/to/WTM/build /path/to/WTM/build
/path/to/WTM/build/CMakeFiles/wtm.x.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/path/to/WTM/build'
make  -f CMakeFiles/wtm.x.dir/build.make CMakeFiles/wtm.x.dir/build
make[2]: Entering directory '/path/to/WTM/build'
[ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
/cm/local/apps/gcc/10.2.0/bin/c++  -I/path/to/WTM/common/richdem/include
-I/path/to/gdal-3.3.0/include -I/path/to/WTM/common/fmt/include -isystem
/path/to/petsc/arch-linux-cxx-debug/include -isystem /path/to/petsc/include
-isystem -O3 -g -Wall -Wextra -pedantic -Wshadow -Wfloat-conversion -Wall
-Wextra -pedantic -Wshadow -DRICHDEM_GIT_HASH=\"xxx\"
-DRICHDEM_COMPILE_TIME=\"2022-10-09T02:21:11Z\" -DUSEGDAL -Xpreprocessor
-fopenmp
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1
-I/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include -std=gnu++2a -MD
-MT CMakeFiles/wtm.x.dir/src/WTM.cpp.o -MF
CMakeFiles/wtm.x.dir/src/WTM.cpp.o.d -o CMakeFiles/wtm.x.dir/src/WTM.cpp.o
-c /path/to/WTM/src/WTM.cpp
c++: warning: /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40.30.1: linker input file unused because linking not done
[ 70%] Linking CXX executable wtm.x
/path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/wtm.x.dir/link.txt --verbose=1
/cm/local/apps/gcc/10.2.0/bin/c++ -isystem -O3 -g -Wall -Wextra -pedantic
-Wshadow CMakeFiles/wtm.x.dir/src/WTM.cpp.o -o wtm.x
 -Wl,-rpath,/path/to/WTM/build/common/richdem:/path/to/gdal-3.3.0/lib:/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib:/path/to/petsc/arch-linux-cxx-debug/lib
libwtm.a common/richdem/librichdem.so /path/to/gdal-3.3.0/lib/libgdal.so
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libompitrace.so.40.30.0
common/fmt/libfmt.a /path/to/petsc/arch-linux-cxx-debug/lib/libpetsc.so
/usr/bin/ld: CMakeFiles/wtm.x.dir/src/WTM.cpp.o: undefined reference to symbol 'ompi_mpi_comm_self'
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
make[2]: Leaving directory '/path/to/WTM/build'
make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
make[1]: Leaving directory '/path/to/WTM/build'
make: *** [Makefile:136: all] Error 2

Anything stick out?