<div dir="ltr"><div>These cards do indeed not support cudaDeviceGetMemPool -- cudaDeviceGetAttribute on<span class="enum-member-name-def"> cudaDevAttrMemoryPoolsSupported return false, meaning it doesn't support cudaMallocAsync, so the first point of failure is the call to cudaDeviceGetMemPool in the initialization.<br></span></div><div><br></div><div>Would a workaround be to replace the cudaMallocAsync call to cudaMalloc and skip the mempool or is that a bad idea?<br></div><div><span class="enum-member-name-def"></span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 6, 2023 at 9:17 AM Mark Lohry <<a href="mailto:mlohry@gmail.com">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>It built+ran fine on a different system with an sm75 arch. Is there a documented minimum version if that indeed is the cause?</div><div><br></div><div>One minor hiccup FYI -- compilation of hypre fails with cuda toolkit 12, due to cusprase removing csrsv2Info_t (although it's still referenced in their docs...) in favor of bsrsv2Info_t. Rolling back to cuda toolkit 11.8 worked.<br></div><pre><span></span><span></span></pre></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 6:37 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Jacob, is it because the cuda arch is too old? <div><br clear="all"><div><div dir="ltr"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 4:30 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I'm seeing the same thing on latest main with a different machine and -sm52 card, cuda 11.8. make check fails with the below, where the indicated line 249 corresponds to PetscCallCUPM(cupmDeviceGetMemPool(&mempool, static_cast<int>(device->deviceId))); in the initialize function. 
<br></div><div><br></div><div><br></div><div>Running check examples to verify correct installation<br>Using PETSC_DIR=/home/mlohry/dev/petsc and PETSC_ARCH=arch-linux-c-debug<br>C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br>C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br>2,17c2,46<br>< 0 SNES Function norm 2.391552133017e-01 <br>< 0 KSP Residual norm 2.928487269734e-01 <br>< 1 KSP Residual norm 1.876489580142e-02 <br>< 2 KSP Residual norm 3.291394847944e-03 <br>< 3 KSP Residual norm 2.456493072124e-04 <br>< 4 KSP Residual norm 1.161647147715e-05 <br>< 5 KSP Residual norm 1.285648407621e-06 <br>< 1 SNES Function norm 6.846805706142e-05 <br>< 0 KSP Residual norm 2.292783790384e-05 <br>< 1 KSP Residual norm 2.100673631699e-06 <br>< 2 KSP Residual norm 2.121341386147e-07 <br>< 3 KSP Residual norm 2.455932678957e-08 <br>< 4 KSP Residual norm 1.753095730744e-09 <br>< 5 KSP Residual norm 7.489214418904e-11 <br>< 2 SNES Function norm 2.103908447865e-10 <br>< Number of SNES iterations = 2<br>---<br>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>> [0]PETSC ERROR: GPU error<br>> [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported<br>> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be the program crashed before they were used or a spelling mistake, etc!<br>> [0]PETSC ERROR: Option left: name:-mg_levels_ksp_max_it value: 3 source: command line<br>> [0]PETSC ERROR: Option left: name:-nox (no value) source: environment<br>> [0]PETSC ERROR: Option left: name:-nox_warning (no value) source: environment<br>> [0]PETSC ERROR: Option left: name:-pc_gamg_esteig_ksp_max_it value: 10 source: command line<br>> [0]PETSC ERROR: See <a href="https://petsc.org/release/faq/" target="_blank">https://petsc.org/release/faq/</a> for trouble shooting.<br>> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.3-352-g91c56366cb GIT Date: 2023-01-05 17:22:48 +0000<br>> [0]PETSC ERROR: ./ex19 on a arch-linux-c-debug named osprey by mlohry Thu Jan 5 17:25:17 2023<br>> [0]PETSC ERROR: Configure options --with-cuda --with-mpi=1<br>> [0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:249<br>> [0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/petsc/src/sys/objects/device/impls/cupm/cuda/<a href="http://cupmcontext.cu:10" target="_blank">cupmcontext.cu:10</a><br>> [0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:247<br>> [0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/dcontext.cxx:260<br>> [0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52<br>> [0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84<br>> [0]PETSC ERROR: #7 GetHandleDispatch_() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:499<br>> [0]PETSC ERROR: #8 create() at /home/mlohry/dev/petsc/include/petsc/private/veccupmimpl.h:1069<br>> [0]PETSC ERROR: #9 VecCreate_SeqCUDA() at /home/mlohry/dev/petsc/src/vec/vec/impls/seq/cupm/cuda/<a href="http://vecseqcupm.cu:10" target="_blank">vecseqcupm.cu:10</a><br>> [0]PETSC ERROR: #10 VecSetType() at 
/home/mlohry/dev/petsc/src/vec/vec/interface/vecreg.c:89<br>> [0]PETSC ERROR: #11 DMCreateGlobalVector_DA() at /home/mlohry/dev/petsc/src/dm/impls/da/dadist.c:31<br>> [0]PETSC ERROR: #12 DMCreateGlobalVector() at /home/mlohry/dev/petsc/src/dm/interface/dm.c:1023<br>> [0]PETSC ERROR: #13 main() at ex19.c:149<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 5, 2023 at 3:42 PM Mark Lohry <<a href="mailto:mlohry@gmail.com" target="_blank">mlohry@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I'm trying to compile the cuda example</div><div><br></div><div>./config/examples/arch-ci-linux-cuda-double-64idx.py --with-cudac=/usr/local/cuda-11.5/bin/nvcc</div><div><br></div><div>and running make test passes the test diff-sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-lazy, but the eager variant fails, pasted below.</div><div><br></div><div>I get a similar error running my client code, pasted after. There, when running with -info, it seems that some lazy initialization happens first, and I also call VecCreateSeqCuda, which seems to have no issue.</div><div><br></div><div>Any idea? This happens to be with an -sm 3.5 device if it matters; otherwise it's a recent cuda compiler+driver.<br></div><div><br></div><div><br></div><div>petsc test code output:<br></div><div><br></div><div><br></div><div><br>not ok sys_objects_device_tests-ex1_host_with_device+nsize-1device_enable-eager # Error code: 97<br># [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br># [0]PETSC ERROR: GPU error<br># [0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported<br># [0]PETSC ERROR: See <a href="https://petsc.org/release/faq/" target="_blank">https://petsc.org/release/faq/</a> for trouble shooting.<br># [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 <br># [0]PETSC ERROR: ../ex1 on a named lancer by mlohry Thu Jan 5 15:22:33 2023<br># [0]PETSC ERROR: Configure options --package-prefix-hash=/home/mlohry/petsc-hash-pkgs --with-make-test-np=2 --download-openmpi=1 --download-hypre=1 --download-hwloc=1 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-cuda=1 --with-precision=double --with-clanguage=c --with-cudac=/usr/local/cuda-11.5/bin/nvcc PETSC_ARCH=arch-ci-linux-cuda-double-64idx<br># [0]PETSC ERROR: #1 CUPMAwareMPI_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:194<br># [0]PETSC ERROR: #2 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:71<br># [0]PETSC ERROR: #3 init_device_id_() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:290<br># [0]PETSC ERROR: #4 getDevice() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/../impls/host/../impldevicebase.hpp:99<br># [0]PETSC ERROR: #5 PetscDeviceCreate() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:104<br># [0]PETSC ERROR: #6 PetscDeviceInitializeDefaultDevice_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:375<br># 
[0]PETSC ERROR: #7 PetscDeviceInitializeTypeFromOptions_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:499<br># [0]PETSC ERROR: #8 PetscDeviceInitializeFromOptions_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/device.cxx:634<br># [0]PETSC ERROR: #9 PetscInitialize_Common() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1001<br># [0]PETSC ERROR: #10 PetscInitialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/pinit.c:1267<br># [0]PETSC ERROR: #11 main() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/tests/ex1.c:12<br># [0]PETSC ERROR: PETSc Option Table entries:<br># [0]PETSC ERROR: -default_device_type host<br># [0]PETSC ERROR: -device_enable eager<br># [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint@mcs.anl.gov----------</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>solver code output:</div><div><br></div><div><br></div><div><br></div><div>[0] <sys> PetscDetermineInitialFPTrap(): Floating point trapping is off by default 0<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType host available, initializing<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice host initialized, default device id 0, view FALSE, init type lazy<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType cuda available, initializing<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDevice cuda initialized, default device id 0, view FALSE, init type lazy<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType hip not available<br>[0] <sys> PetscDeviceInitializeTypeFromOptions_Private(): PetscDeviceType sycl not available<br>[0] <sys> PetscInitialize_Common(): PETSc successfully started: number of processors = 1<br>[0] <sys> PetscGetHostName(): Rejecting domainname, likely is NIS lancer.(none)<br>[0] <sys> PetscInitialize_Common(): Running on machine: lancer<br># [Info] Petsc initialization complete.<br># [Trace] Timing: Starting solver...<br># [Info] RNG initial conditions have mean 0.000004, renormalizing.<br># [Trace] Timing: PetscTimeIntegrator initialization...<br># [Trace] Timing: Allocating Petsc CUDA arrays...<br>[0] <sys> PetscCommDuplicate(): Duplicating a communicator 2 3 max tags = 100000000<br>[0] <sys> configure(): Configured device 0<br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3<br># [Trace] Timing: Allocating Petsc CUDA arrays finished in 0.015439 seconds.<br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 2 3<br>[0] <sys> PetscCommDuplicate(): Duplicating a communicator 1 4 max tags = 100000000<br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4<br>[0] <dm> DMGetDMTS(): Creating new DMTS<br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4<br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4<br>[0] <dm> DMGetDMSNES(): Creating new DMSNES<br>[0] <dm> DMGetDMSNESWrite(): Copying DMSNES due to write<br># [Info] Initializing petsc with ode23 integrator<br># [Trace] Timing: PetscTimeIntegrator initialization finished in 0.016754 seconds.<br></div><div><br>[0] <sys> PetscCommDuplicate(): Using internal PETSc communicator 1 4<br>[0] <sys> 
PetscCommDuplicate(): Using internal PETSc communicator 1 4<br>[0] <device> PetscDeviceContextSetupGlobalContext_Private(): Initializing global PetscDeviceContext with device type cuda<br>[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>[0]PETSC ERROR: GPU error<br>[0]PETSC ERROR: cuda error 801 (cudaErrorNotSupported) : operation not supported<br>[0]PETSC ERROR: See <a href="https://petsc.org/release/faq/" target="_blank">https://petsc.org/release/faq/</a> for trouble shooting.<br>[0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022 <br>[0]PETSC ERROR: maDG on a arch-linux2-c-opt named lancer by mlohry Thu Jan 5 15:39:14 2023<br>[0]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/bin/cc --with-cxx=/usr/bin/c++ --with-fc=0 --with-pic=1 --with-cxx-dialect=C++11 MAKEFLAGS=$MAKEFLAGS COPTFLAGS="-O3 -march=native" CXXOPTFLAGS="-O3 -march=native" --with-mpi=0 --with-debugging=no --with-cudac=/usr/local/cuda-11.5/bin/nvcc --with-cuda-arch=35 --with-cuda --with-cuda-dir=/usr/local/cuda-11.5/ --download-hwloc=1<br>[0]PETSC ERROR: #1 initialize() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/../cupmcontext.hpp:255<br>[0]PETSC ERROR: #2 PetscDeviceContextCreate_CUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/<a href="http://cupmcontext.cu:10" target="_blank">cupmcontext.cu:10</a><br>[0]PETSC ERROR: #3 PetscDeviceContextSetDevice_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:244<br>[0]PETSC ERROR: #4 PetscDeviceContextSetDefaultDeviceForType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/dcontext.cxx:259<br>[0]PETSC ERROR: #5 PetscDeviceContextSetupGlobalContext_Private() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:52<br>[0]PETSC ERROR: #6 PetscDeviceContextGetCurrentContext() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/interface/global_dcontext.cxx:84<br>[0]PETSC ERROR: #7 PetscDeviceContextGetCurrentContextAssertType_Internal() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/include/petsc/private/deviceimpl.h:371<br>[0]PETSC ERROR: #8 PetscCUBLASGetHandle() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/sys/objects/device/impls/cupm/cuda/<a href="http://cupmcontext.cu:23" target="_blank">cupmcontext.cu:23</a><br>[0]PETSC ERROR: #9 VecMAXPY_SeqCUDA() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/impls/seq/seqcuda/<a href="http://veccuda2.cu:261" target="_blank">veccuda2.cu:261</a><br>[0]PETSC ERROR: #10 VecMAXPY() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/vec/vec/interface/rvector.c:1221<br>[0]PETSC ERROR: #11 TSStep_RK() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/impls/explicit/rk/rk.c:814<br>[0]PETSC ERROR: #12 TSStep() at /home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3424<br>[0]PETSC ERROR: #13 TSSolve() at 
/home/mlohry/dev/maDGiCart-cmake-build-cuda-release/external/petsc/src/ts/interface/ts.c:3814<br></div><div><br></div><div><br></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>