[petsc-users] cuda gpu eager initialization error cudaErrorNotSupported

Mark Lohry mlohry at gmail.com
Thu Jan 5 16:30:24 CST 2023


I'm seeing the same thing on the latest main on a different machine with an
sm_52 card and CUDA 11.8. make check fails with the output below, where the
indicated line 249 of cupmcontext.hpp corresponds to
PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast<int>(device->deviceId)));
in the initialize() function.
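
One way to test whether the cupmDeviceGetMemPool call (which presumably wraps
cudaDeviceGetMemPool on the CUDA backend) can succeed on a given card is to ask
the CUDA runtime whether the device supports memory pools at all. The sketch
below does this from Python through ctypes, so it needs no compile step. It
rests on assumptions not confirmed in this thread: that cudaErrorNotSupported
here comes from missing memory-pool support, that libcudart can be found on the
system library path, and that cudaDevAttrMemoryPoolsSupported has the enum
value 115 (check driver_types.h in your toolkit). The helper name
memory_pools_supported is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumed enum value of cudaDevAttrMemoryPoolsSupported; verify in driver_types.h.
CUDA_DEV_ATTR_MEMORY_POOLS_SUPPORTED = 115


def memory_pools_supported(device=0):
    """Return True/False for memory-pool support, or None if it cannot be queried."""
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return None  # CUDA runtime library not found on this machine
    try:
        cudart = ctypes.CDLL(libname)
    except OSError:
        return None  # library was located but could not be loaded
    value = ctypes.c_int(0)
    # cudaError_t cudaDeviceGetAttribute(int *value, cudaDeviceAttr attr, int device)
    err = cudart.cudaDeviceGetAttribute(
        ctypes.byref(value), CUDA_DEV_ATTR_MEMORY_POOLS_SUPPORTED, device
    )
    if err != 0:  # 0 is cudaSuccess; any other code means the query failed
        return None
    return bool(value.value)


if __name__ == "__main__":
    print("memory pools supported:", memory_pools_supported(0))
```

If this reports False for the card in question, a cudaDeviceGetMemPool call on
that device would be expected to fail; if it reports True, the cause probably
lies elsewhere. A result of None only means the query itself could not be made.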


Running check examples to verify correct installation
Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
2,17c2,46
<   0 SNES Function norm 2.391552133017e-01
<     0 KSP Residual norm 2.928487269734e-01
<     1 KSP Residual norm 1.876489580142e-02
<     2 KSP Residual norm 3.291394847944e-03
<     3 KSP Residual norm 2.456493072124e-04
<     4 KSP Residual norm 1.161647147715e-05
<     5 KSP Residual norm 1.285648407621e-06
<   1 SNES Function norm 6.846805706142e-05
<     0 KSP Residual norm 2.292783790384e-05
<     1 KSP Residual norm 2.100673631699e-06
<     2 KSP Residual norm 2.121341386147e-07
<     3 KSP Residual norm 2.455932678957e-08
<     4 KSP Residual norm 1.753095730744e-09
<     5 KSP Residual norm 7.489214418904e-11
<   2 SNES Function norm 2.103908447865e-10
< Number of SNES iterations = 2
---
> [0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
supported
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used!
Could be the program crashed before they were used or a spelling mistake,
etc!
> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source:
command line
> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment
> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source:
environment
> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb
 GIT Date: 2023-01-05 17:22:48 +0000
> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu
Jan  5 17:25:17 2023
> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1
> [0]PETSC ERROR: #1 initialize() at
/home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
/home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal()
at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
/home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7 GetHandleDispatch_() at
/home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499
> [0]PETSC ERROR: #8 create() at
/home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069
> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at
/home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.cu:10
> [0]PETSC ERROR: #10 VecSetType() at
/home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89
> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at
/home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31
> [0]PETSC ERROR: #12 DMCreateGlobalVector() at
/home/mlohry/dev/petsc/src/dm/interface/dm.c:1023
> [0]PETSC ERROR: #13 main() at ex19.c:149
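
The two context traces in this thread point at different line numbers in
cupmcontext.hpp (249 on main, 255 on 3.18.3), so it can help to extract the
frames mechanically before comparing them. The following is a minimal sketch,
assuming the "#N function() at path:line" layout seen in these traces; the name
parse_petsc_frames is hypothetical and not part of PETSc.

```python
import re

# Matches frames such as "#1 initialize() at /path/to/file.hpp:249".
# \s+ also spans a line break, so a frame wrapped after "at" still matches.
# Mail quote markers ("> ") between "at" and the path must be stripped first,
# and a path that was itself split across lines will not match.
FRAME_RE = re.compile(r"#(\d+)\s+(\w+)\(\)\s+at\s+(\S+):(\d+)")


def parse_petsc_frames(text):
    """Return a list of (index, function, path, line) tuples found in text."""
    return [
        (int(idx), func, path, int(line))
        for idx, func, path, line in FRAME_RE.findall(text)
    ]


if __name__ == "__main__":
    sample = (
        "[0]PETSC ERROR: #1 initialize() at\n"
        "/home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249\n"
        "[0]PETSC ERROR: #13 main() at ex19.c:149\n"
    )
    for frame in parse_petsc_frames(sample):
        print(frame)
```

The first tuple returned is the innermost frame, which is the one to look up in
the source tree of the PETSc version that produced the trace.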


On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:

> I'm trying to compile the cuda example
>
> ./config/examples/arch-ci-linux-cuda-double-64idx.py
> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
>
> Running make test, the lazy variant passes:
> ok diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy
> but the eager variant fails with the output pasted below.
>
> I get a similar error running my client code, pasted after the test output.
> There, when running with -info, it seems that some lazy initialization
> happens first, and I also call VecCreateSeqCUDA, which seems to have no issue.
>
> Any idea? This happens to be with an sm_35 device, if it matters;
> otherwise it's a recent CUDA compiler and driver.
>
>
> petsc test code output:
>
>
>
> not ok
> sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager #
> Error code: 97
> # [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> # [0]PETSC ERROR: GPU error
> # [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
> supported
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
> # [0]PETSC ERROR: ../ex1 on a  named lancer by mlohry Thu Jan  5 15:22:33
> 2023
> # [0]PETSC ERROR: Configure options
> --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2
> --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g
> -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1
> --with-cuda=1 --with-precision=double --with-clanguage=c
> --with-cudac=/usr/local/cuda-11.5/bin/nvcc
> PETSC_ARCH=arch-ci-linux-cuda-double-64idx
> # [0]PETSC ERROR: #1 CUPMAwareMPI_() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194
> # [0]PETSC ERROR: #2 initialize() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71
> # [0]PETSC ERROR: #3 init_device_id_() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290
> # [0]PETSC ERROR: #4 getDevice() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99
> # [0]PETSC ERROR: #5 PetscDeviceCreate() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104
> # [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375
> # [0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499
> # [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634
> # [0]PETSC ERROR: #9 PetscInitialize_Common() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001
> # [0]PETSC ERROR: #10 PetscInitialize() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267
> # [0]PETSC ERROR: #11 main() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12
> # [0]PETSC ERROR: PETSc Option Table entries:
> # [0]PETSC ERROR: -default_device_type host
> # [0]PETSC ERROR: -device_enable eager
> # [0]PETSC ERROR: ----------------End of Error Message -------send entire
> error message to petsc-maint at mcs.anl.gov----------
>
>
>
>
>
> solver code output:
>
>
>
> [0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off by
> default 0
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
> host available, initializing
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host
> initialized, default device id 0, view FALSE, init type lazy
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
> cuda available, initializing
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda
> initialized, default device id 0, view FALSE, init type lazy
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
> hip not available
> [0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType
> sycl not available
> [0] <sys> PetscInitialize_Common(): PETSc successfully started: number of
> processors = 1
> [0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS
> lancer.(none)
> [0] <sys> PetscInitialize_Common(): Running on machine: lancer
> # [Info] Petsc initialization complete.
> # [Trace] Timing: Starting solver...
> # [Info] RNG initial conditions have mean 0.000004, renormalizing.
> # [Trace] Timing: PetscTimeIntegrator initialization...
> # [Trace] Timing: Allocating Petsc CUDA arrays...
> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags =
> 100000000
> [0] <sys> configure(): Configured device 0
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
> # [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439
> seconds.
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3
> [0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags =
> 100000000
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
> [0] <dm> DMGetDMTS(): Creating new DMTS
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
> [0] <dm> DMGetDMSNES(): Creating new DMSNES
> [0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write
> # [Info] Initializing petsc with ode23 integrator
> # [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754
> seconds.
>
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
> [0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4
> [0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing
> global PetscDeviceContext with device type cuda
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not
> supported
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
> [0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan
>  5 15:39:14 2023
> [0]PETSC ERROR: Configure options
> PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc
> PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++
> --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS
> COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0
> --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc
> --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/
> --download-hwloc=1
> [0]PETSC ERROR: #1 initialize() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255
> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:10
> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244
> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259
> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52
> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84
> [0]PETSC ERROR: #7
> PetscDeviceContextGetCurrentContextAssertType_Internal() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371
> [0]PETSC ERROR: #8 PetscCUBLASGetHandle() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/cupmcontext.cu:23
> [0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:261
> [0]PETSC ERROR: #10 VecMAXPY() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221
> [0]PETSC ERROR: #11 TSStep_RK() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814
> [0]PETSC ERROR: #12 TSStep() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424
> [0]PETSC ERROR: #13 TSSolve() at
> /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814
>
>
>