[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported
Mark Lohry
mlohry at gmail.com
Fri Jan 6 08:17:35 CST 2023
It built and ran fine on a different system with an sm75 arch. Is there a
documented minimum architecture, if that is indeed the cause?
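For context, the cudaErrorNotSupported in the traces quoted below is raised by cudaDeviceGetMemPool(): stream-ordered memory pools require CUDA 11.2+ and a device/driver combination that actually reports support, which older parts (e.g. sm_35/sm_52 with some drivers) do not. A minimal standalone check looks roughly like the following sketch; this is not PETSc's code, and the guard shown is hypothetical:

```c
/* Sketch: query memory-pool support before touching the default pool.
 * cudaDeviceGetMemPool() returns cudaErrorNotSupported on devices or
 * drivers without stream-ordered allocator support (CUDA 11.2+ API). */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int device = 0, supported = 0;
  cudaMemPool_t mempool;

  cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device);
  if (supported) {
    /* safe to fetch the default pool, as the failing initialize() does */
    cudaDeviceGetMemPool(&mempool, device);
    printf("memory pools supported on device %d\n", device);
  } else {
    /* older arch/driver: a library would fall back to plain cudaMalloc() */
    printf("memory pools NOT supported on device %d\n", device);
  }
  return 0;
}
```

Running this on the failing machines should print the "NOT supported" branch rather than error out, which would confirm the arch/driver as the cause.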
One minor hiccup, FYI -- compilation of hypre fails with CUDA Toolkit 12
because cuSPARSE removed csrsv2Info_t (although it's still referenced in
their docs...) in favor of bsrsv2Info_t. Rolling back to CUDA Toolkit 11.8
worked.
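Code that has to build against both toolkit generations can guard on the toolkit version macro instead of rolling back. A hedged fragment (this is not hypre's actual fix, just an illustration of the usual pattern):

```c
/* Illustrative version guard for the legacy triangular-solve info object,
 * which cuSPARSE removed in CUDA 12. CUDART_VERSION comes from
 * cuda_runtime_api.h; 12000 corresponds to CUDA 12.0. */
#include <cuda_runtime_api.h>
#include <cusparse.h>

#if CUDART_VERSION < 12000
  /* legacy API, available through CUDA 11.x */
  csrsv2Info_t info;
#else
  /* CUDA 12: legacy csrsv2 routines are gone; new code is expected to
   * move to the generic cusparseSpSV_* API */
  cusparseSpSVDescr_t spsvDescr;
#endif
```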
On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:
> Jacob, is it because the cuda arch is too old?
>
> --Junchao Zhang
>
>
> On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <mlohry at gmail.com> wrote:
>
>> I'm seeing the same thing on latest main with a different machine and an
>> sm52 card, CUDA 11.8. make check fails with the output below, where the
>> indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool,
>> static_cast<int>(device->deviceId))); in the initialize function.
>>
>>
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
>> processes
>> 2,17c2,46
>> < 0 SNES Function norm 2.391552133017e-01
>> < 0 KSP Residual norm 2.928487269734e-01
>> < 1 KSP Residual norm 1.876489580142e-02
>> < 2 KSP Residual norm 3.291394847944e-03
>> < 3 KSP Residual norm 2.456493072124e-04
>> < 4 KSP Residual norm 1.161647147715e-05
>> < 5 KSP Residual norm 1.285648407621e-06
>> < 1 SNES Function norm 6.846805706142e-05
>> < 0 KSP Residual norm 2.292783790384e-05
>> < 1 KSP Residual norm 2.100673631699e-06
>> < 2 KSP Residual norm 2.121341386147e-07
>> < 3 KSP Residual norm 2.455932678957e-08
>> < 4 KSP Residual norm 1.753095730744e-09
>> < 5 KSP Residual norm 7.489214418904e-11
>> < 2 SNES Function norm 2.103908447865e-10
>> < Number of SNES iterations = 2
>> ---
>> > [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> > [0]PETSC ERROR: GPU error
>> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>> supported
>> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
>> Could be the program crashed before they were used or a spelling mistake,
>> etc!
>> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3
>> source: command line
>> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
>> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
>> environment
>> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
>> source: command line
>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>> shooting.
>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
>> GIT Date: 2023-01-05 17:22:48 +0000
>> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry
>> Thu Jan 5 17:25:17 2023
>> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
>> > [0]PETSC ERROR: #1 initialize() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
>> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/
>> cupmcontext.cu:10
>> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
>> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
>> at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
>> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
>> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>> > [0]PETSC ERROR: #7 GetHandleDispatch_() at
>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
>> > [0]PETSC ERROR: #8 create() at
>> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
>> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
>> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
>> > [0]PETSC ERROR: #10 VecSetType() at
>> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
>> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
>> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
>> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at
>> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
>> > [0]PETSC ERROR: #13 main() at ex19.c:149
>>
>>
>> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:
>>
>>> I'm trying to compile the cuda example
>>>
>>> ./config/examples/arch-ci-linux-cuda-double-64idx.py
>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>>
>>> and running make test passes the test
>>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy,
>>> but the eager variant fails; output pasted below.
>>>
>>> I get a similar error running my client code, pasted after that. There,
>>> when running with -info, it seems that some lazy initialization happens
>>> first, and I also call VecCreateSeqCUDA, which seems to have no issue.
>>>
>>> Any ideas? This happens to be with an sm_35 device, if it matters;
>>> otherwise it's a recent CUDA compiler+driver.
>>>
>>>
>>> petsc test code output:
>>>
>>>
>>>
>>> not ok
>>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
>>> Error code: 97
>>> # [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> # [0]PETSC ERROR: GPU error
>>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>>> supported
>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>> # [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5
>>> 15:22:33 2023
>>> # [0]PETSC ERROR: Configure options
>>> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2
>>> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g
>>> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1
>>> --with-cuda=1 --with-precision=double --with-clanguage=c
>>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>> PETSC_ARCH=arch-ci-linux-cuda-double-64idx
>>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
>>> # [0]PETSC ERROR: #2 initialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
>>> # [0]PETSC ERROR: #3 init_device_id_() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
>>> # [0]PETSC ERROR: #4 getDevice() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
>>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
>>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
>>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
>>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
>>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
>>> # [0]PETSC ERROR: #10 PetscInitialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
>>> # [0]PETSC ERROR: #11 main() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>> # [0]PETSC ERROR: -default_device_type host
>>> # [0]PETSC ERROR: -device_enable eager
>>> # [0]PETSC ERROR: ----------------End of Error Message -------send
>>> entire error message to petsc-maint at mcs.anl.gov----------
>>>
>>>
>>>
>>>
>>>
>>> solver code output:
>>>
>>>
>>>
>>> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off
>>> by default 0
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType host available, initializing
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>>> host initialized, default device id 0, view FALSE, init type lazy
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType cuda available, initializing
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>>> cuda initialized, default device id 0, view FALSE, init type lazy
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType hip not available
>>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private():
>>> PetscDeviceType sycl not available
>>> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number
>>> of processors = 1
>>> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS
>>> lancer.(none)
>>> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
>>> # [Info] Petsc initialization complete.
>>> # [Trace] Timing: Starting solver...
>>> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
>>> # [Trace] Timing: PetscTimeIntegrator initialization...
>>> # [Trace] Timing: Allocating Petsc CUDA arrays...
>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags
>>> = 100000000
>>> [0] <sys> configure(): Configured device 0
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439
>>> seconds.
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags
>>> = 100000000
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <dm> DMGetDMTS(): Creating new DMTS
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <dm> DMGetDMSNES(): Creating new DMSNES
>>> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
>>> # [Info] Initializing petsc with ode23 integrator
>>> # [Trace] Timing: PetscTimeIntegrator initialization finished in
>>> 0.016754 seconds.
>>>
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>>> [0] <device> PetscDeviceContextSetupGlobalContext_Private():
>>> Initializing global PetscDeviceContext with device type cuda
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: GPU error
>>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>>> supported
>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu
>>> Jan 5 15:39:14 2023
>>> [0]PETSC ERROR: Configure options
>>> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc
>>> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++
>>> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS
>>> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0
>>> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/
>>> --download-hwloc=1
>>> [0]PETSC ERROR: #1 initialize() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
>>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/
>>> cupmcontext.cu:10
>>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
>>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
>>> at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
>>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>>> [0]PETSC ERROR: #7
>>> PetscDeviceContextGetCurrentContextAssertType_Internal() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
>>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/
>>> cupmcontext.cu:23
>>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/
>>> veccuda2.cu:261
>>> [0]PETSC ERROR: #10 VecMAXPY() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
>>> [0]PETSC ERROR: #11 TSStep_RK() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
>>> [0]PETSC ERROR: #12 TSStep() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
>>> [0]PETSC ERROR: #13 TSSolve() at
>>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
>>>
>>>
>>>