[petsc-users] Configure error while building PETSc with CUDA/MVAPICH2-GDR

Wed Apr 17 07:58:12 CDT 2024

On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat <srvenkat at utexas.edu>
wrote:

> Do you know if there are plans for NCCL support in PETSc?
>
What is your need?  Do you mean using NCCL for the MPI communication?

>
> On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> Glad to hear you found a way. Did you use Frontera at TACC? If so, I
>> could give it a try.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat <srvenkat at utexas.edu>
>> wrote:
>>
>>> I finally figured out a way to make it work. I had to build PETSc and my
>>> application using the (non-GPU-aware) Intel MPI. Then, before running, I
>>> switch to MVAPICH2-GDR.
>>> I'm not sure why that works, but it's the only way I've found to compile
>>> and run successfully without getting errors about not having a
>>> GPU-aware MPI.
>>>
>>>
>>>
>>> On Fri, Dec 8, 2023 at 5:30 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> You may need to set some env variables. This can be system specific so
>>>> you might want to look at docs or ask TACC how to run with GPU-aware MPI.
>>>>
>>>> Mark
>>>>
>>>> On Fri, Dec 8, 2023 at 5:17 PM Sreeram R Venkat <srvenkat at utexas.edu>
>>>> wrote:
>>>>
>>>>> Actually, when I compile my program with this build of PETSc and run,
>>>>> I still get the error:
>>>>>
>>>>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is not
>>>>> GPU-aware. For better performance, please use a GPU-aware MPI.
>>>>>
>>>>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1 set.
>>>>>
>>>>> Is there anything else I need to do?
>>>>>
>>>>> Thanks,
>>>>> Sreeram
>>>>>
>>>>> On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat <srvenkat at utexas.edu>
>>>>> wrote:
>>>>>
>>>>>> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr
>>>>>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0.
>>>>>>
>>>>>> On Fri, Dec 8, 2023 at 1:15 PM Satish Balay <balay at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Executing: mpicc -show
>>>>>>> stdout: icc -I/opt/apps/cuda/11.4/include
>>>>>>> -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs
>>>>>>> -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt
>>>>>>> -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder
>>>>>>> -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm
>>>>>>> -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include
>>>>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath
>>>>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi
>>>>>>>
>>>>>>>     Checking for program /opt/apps/cuda/12.0/bin/nvcc...found
>>>>>>>
>>>>>>> It looks like you are mixing two different CUDA versions in
>>>>>>> this build.
>>>>>>>
>>>>>>> Perhaps you need to use CUDA 11.4 with this install of MVAPICH2-GDR.
>>>>>>>
>>>>>>> Satish
>>>>>>>
>>>>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote:
>>>>>>>
>>>>>>> > On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat <srvenkat at utexas.edu> wrote:
>>>>>>> >
>>>>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware MVAPICH2-GDR.
>>>>>>> > >
>>>>>>> > > Here is my configure command:
>>>>>>> > >
>>>>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre
>>>>>>> > >  --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true
>>>>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental --download-metis
>>>>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
>>>>>>> > >
>>>>>>> > > which errors with:
>>>>>>> > >
>>>>>>> > >           UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
>>>>>>> > > ---------------------------------------------------------------------------------------------
>>>>>>> > >   CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14 -Xcompiler -fPIC
>>>>>>> > >   -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_80,code=sm_80"
>>>>>>> > >   generated from "--with-cuda-arch=80"
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > The same configure command works with Intel MPI, and I can build
>>>>>>> > > with CUDA. The full configure.log file is attached. Please let me
>>>>>>> > > know if you need any other information. I appreciate your help with this.
>>>>>>> > >
>>>>>>> >
>>>>>>> > The proximate error is
>>>>>>> >
>>>>>>> > Executing: nvcc -c -o /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.types
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda  -ccbin mpic++ -std=c++14
>>>>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>>>>>>> > arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu
>>>>>>> > stdout:
>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>>>>> > "C" linkage
>>>>>>> > 1 error detected in the compilation of
>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".
>>>>>>> > Possible ERROR while running compiler: exit code 1
>>>>>>> > stderr:
>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>>>>> > "C" linkage
>>>>>>> >
>>>>>>> > 1 error detected in the compilation of
>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda
>>>>>>> >
>>>>>>> > This looks like screwed-up headers to me, but I will let someone who
>>>>>>> > understands CUDA compilation reply.
>>>>>>> >
>>>>>>> >   Thanks,
>>>>>>> >
>>>>>>> >      Matt
>>>>>>> >
>>>>>>> > > Thanks,
>>>>>>> > > Sreeram
>>>>>>> > >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>
>>>>>>
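
The diagnosis earlier in this thread came from comparing the CUDA paths in the
"mpicc -show" output with the nvcc path that configure found. A minimal sketch
of that comparison in shell follows. The helper name cuda_versions_in is
hypothetical, the two sample strings are abbreviated from the log quoted above,
and the /cuda/<major>.<minor> path layout is an assumption that matches this
system but may not hold elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct CUDA versions that appear in
# paths of the form .../cuda/<major>.<minor>/... within the given text.
cuda_versions_in() {
    printf '%s\n' "$1" | grep -oE '/cuda/[0-9]+\.[0-9]+' | sed 's|/cuda/||' | sort -u
}

# Sample values abbreviated from the configure log in this thread.
mpi_flags='icc -I/opt/apps/cuda/11.4/include -L/opt/apps/cuda/11.4/lib64 -lcudart -lmpi'
nvcc_path='/opt/apps/cuda/12.0/bin/nvcc'

mpi_cuda=$(cuda_versions_in "$mpi_flags")
nvcc_cuda=$(cuda_versions_in "$nvcc_path")

# Report whether the MPI wrapper and nvcc point at the same CUDA install.
if [ "$mpi_cuda" = "$nvcc_cuda" ]; then
    echo "CUDA versions match: $mpi_cuda"
else
    echo "CUDA mismatch: MPI wrapper uses $mpi_cuda, nvcc is $nvcc_cuda"
fi
```

On a live system the two sample strings would presumably be replaced with
$(mpicc -show) and $(command -v nvcc), run with the same modules loaded as for
the PETSc configure step.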