[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

Junchao Zhang junchao.zhang at gmail.com
Thu Jan 5 17:37:42 CST 2023


Jacob, is it because the CUDA arch is too old?

--Junchao Zhang
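
For anyone comparing the two traces quoted below, here is a minimal Python sketch (not PETSc tooling) that pulls the CUDA error code and the stack frames out of a PETSc error dump. The function name summarize_petsc_error and both regular expressions are assumptions based only on the message format visible in this thread, and the sketch expects log lines that have already been unwrapped from the mail client's line breaks.

```python
import re

# Assumed pattern for lines such as:
#   [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
CUDA_ERROR = re.compile(r"cuda error (\d+) \((\w+)\)")

# Assumed pattern for stack frames such as:
#   [0]PETSC ERROR: #1 initialize() at /path/to/cupmcontext.hpp:249
FRAME = re.compile(r"#(\d+) (\w+)\(\) at (\S+):(\d+)")


def summarize_petsc_error(log):
    """Return (error_code, error_name, frames) found in a PETSc error dump.

    frames is a list of (frame_number, function, path, line) tuples in the
    order they appear in the log. error_code and error_name are None when
    no CUDA error line is present.
    """
    err = CUDA_ERROR.search(log)
    code = int(err.group(1)) if err else None
    name = err.group(2) if err else None
    frames = [
        (int(num), func, path, int(line))
        for num, func, path, line in FRAME.findall(log)
    ]
    return code, name, frames


# Lines copied from the trace in this thread, with the mail wrapping undone.
SAMPLE = """\
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported
[0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
[0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
"""

code, name, frames = summarize_petsc_error(SAMPLE)
print(code, name)
for num, func, path, line in frames:
    print(num, func, path, line)
```

Frame #1 is the innermost one and names the failing call site, which in the trace on main is line 249 of cupmcontext.hpp.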


On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <mlohry at gmail.com> wrote:

> I'm seeing the same thing on the latest main, on a different machine with an
> sm_52 card and CUDA 11.8. make check fails with the output below, where the
> indicated line 249 of cupmcontext.hpp corresponds to
> PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast<int>(device->deviceId)));
> in the initialize function.
>
>
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 2,17c2,46
> <   0 SNES Function norm 2.391552133017e-01
> <     0 KSP Residual norm 2.928487269734e-01
> <     1 KSP Residual norm 1.876489580142e-02
> <     2 KSP Residual norm 3.291394847944e-03
> <     3 KSP Residual norm 2.456493072124e-04
> <     4 KSP Residual norm 1.161647147715e-05
> <     5 KSP Residual norm 1.285648407621e-06
> <   1 SNES Function norm 6.846805706142e-05
> <     0 KSP Residual norm 2.292783790384e-05
> <     1 KSP Residual norm 2.100673631699e-06
> <     2 KSP Residual norm 2.121341386147e-07
> <     3 KSP Residual norm 2.455932678957e-08
> <     4 KSP Residual norm 1.753095730744e-09
> <     5 KSP Residual norm 7.489214418904e-11
> <   2 SNES Function norm 2.103908447865e-10
> < Number of SNES iterations = 2
> ---
> > [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> > [0]PETSC ERROR: GPU error
> > [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
> supported
> > [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
> Could be the program crashed before they were used or a spelling mistake,
> etc!
> > [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source:
> command line
> > [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> > [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
> environment
> > [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
> source: command line
> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
>  GIT Date: 2023-01-05 17:22:48 +0000
> > [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry
> Thu Jan  5 17:25:17 2023
> > [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> > [0]PETSC ERROR: #1 initialize() at
> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> > [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
> /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> > [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> > [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
> at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> > [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> > [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
> /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> > [0]PETSC ERROR: #7 GetHandleDispatch_() at
> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> > [0]PETSC ERROR: #8 create() at
> /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> > [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
> /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> > [0]PETSC ERROR: #10 VecSetType() at
> /home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> > [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
> /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> > [0]PETSC ERROR: #12 DMCreateGlobalVector() at
> /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> > [0]PETSC ERROR: #13 main() at ex19.c:149
>
>
> On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:
>
>> I'm trying to build PETSc with the CUDA example configuration
>>
>> ./config/examples/arch-ci-linux-cuda-double-64idx.py
>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>>
>> and when running make test, the test
>> diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
>> passes, but the eager variant fails; its output is pasted below.
>>
>> I get a similar error running my client code, pasted after that. There, when
>> running with -info, it seems that some lazy initialization happens first,
>> and I also call VecCreateSeqCUDA, which seems to have no issue.
>>
>> Any ideas? This happens to be on an sm_35 (compute capability 3.5) device, if
>> it matters; otherwise it's a recent CUDA compiler and driver.
>>
>>
>> petsc test code output:
>>
>>
>>
>> not ok
>> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
>> Error code: 97
>> # [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> # [0]PETSC ERROR: GPU error
>> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>> supported
>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>> shooting.
>> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>> # [0]PETSC ERROR: ../ex1 on a  named lancer by mlohry Thu Jan  5 15:22:33
>> 2023
>> # [0]PETSC ERROR: Configure options
>> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2
>> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g
>> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1
>> --with-cuda=1 --with-precision=double --with-clanguage=c
>> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>> PETSC_ARCH=arch-ci-linux-cuda-double-64idx
>> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
>> # [0]PETSC ERROR: #2 initialize() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
>> # [0]PETSC ERROR: #3 init_device_id_() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
>> # [0]PETSC ERROR: #4 getDevice() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
>> # [0]PETSC ERROR: #5 PetscDeviceCreate() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
>> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
>> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
>> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
>> # [0]PETSC ERROR: #9 PetscInitialize_Common() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
>> # [0]PETSC ERROR: #10 PetscInitialize() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
>> # [0]PETSC ERROR: #11 main() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
>> # [0]PETSC ERROR: PETSc Option Table entries:
>> # [0]PETSC ERROR: -default_device_type host
>> # [0]PETSC ERROR: -device_enable eager
>> # [0]PETSC ERROR: ----------------End of Error Message -------send entire
>> error message to petsc-maint at mcs.anl.gov----------
>>
>>
>>
>>
>>
>> solver code output:
>>
>>
>>
>> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off
>> by default 0
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
>> host available, initializing
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>> host initialized, default device id 0, view FALSE, init type lazy
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
>> cuda available, initializing
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice
>> cuda initialized, default device id 0, view FALSE, init type lazy
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
>> hip not available
>> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
>> sycl not available
>> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number of
>> processors = 1
>> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS
>> lancer.(none)
>> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
>> # [Info] Petsc initialization complete.
>> # [Trace] Timing: Starting solver...
>> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
>> # [Trace] Timing: PetscTimeIntegrator initialization...
>> # [Trace] Timing: Allocating Petsc CUDA arrays...
>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags =
>> 100000000
>> [0] <sys> configure(): Configured device 0
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439
>> seconds.
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
>> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags =
>> 100000000
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <dm> DMGetDMTS(): Creating new DMTS
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <dm> DMGetDMSNES(): Creating new DMSNES
>> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
>> # [Info] Initializing petsc with ode23 integrator
>> # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754
>> seconds.
>>
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
>> [0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing
>> global PetscDeviceContext with device type cuda
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: GPU error
>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
>> supported
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu
>> Jan  5 15:39:14 2023
>> [0]PETSC ERROR: Configure options
>> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc
>> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++
>> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS
>> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0
>> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/
>> --download-hwloc=1
>> [0]PETSC ERROR: #1 initialize() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
>> at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
>> [0]PETSC ERROR: #7
>> PetscDeviceContextGetCurrentContextAssertType_Internal() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
>> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:23
>> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:261
>> [0]PETSC ERROR: #10 VecMAXPY() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
>> [0]PETSC ERROR: #11 TSStep_RK() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
>> [0]PETSC ERROR: #12 TSStep() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
>> [0]PETSC ERROR: #13 TSSolve() at
>> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
>>
>>
>>