<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><a href="https://gitlab.com/petsc/petsc/-/merge_requests/4512" class="">https://gitlab.com/petsc/petsc/-/merge_requests/4512</a><div class=""><br class=""><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div>Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div style=""><br class=""><blockquote type="cite" class=""><div class="">On Nov 1, 2021, at 11:00, Barry Smith <<a href="mailto:bsmith@petsc.dev" class="">bsmith@petsc.dev</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><br class=""></div>   PETSc code could check for the environment variable CUDA_VISIBLE_DEVICES=-1, if that makes sense as a way to resolve the situation.<div class=""><br class=""></div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Nov 1, 2021, at 11:43 AM, Jacob Faibussowitsch <<a href="mailto:jacob.fai@gmail.com" class="">jacob.fai@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">It looks like you are tripping over the following check:<div class=""><br class=""></div><div class=""><div class="">cerr = cupmGetDeviceCount(&ndev);</div><div class="">if (PetscUnlikely(cerr == cupmErrorStubLibrary)) {</div><div class="">  … // handle missing driver or stub library</div><div class="">} else {CHKERRCUPM(cerr);} // your error here</div><div class=""><br class=""></div><div class="">Is it an error if a user configures with CUDA (i.e. signals intent to use CUDA) but disables all the devices? On the one hand, yes, this can be considered an error if the user inadvertently disables the devices via this environment variable without knowing it, but on the other hand they should be able to freely set this variable without PETSc crashing… Should we warn users? 
Handle this silently?</div><div class=""><br class=""></div><div class="">Note that PETSc does provide the '-device_enable none' option to disable all devices or, if you only want to disable CUDA devices, '-device_enable_cuda none', which should achieve the same effect as CUDA_VISIBLE_DEVICES=-1. But maybe it is too obscure to ask users to know about and use these flags instead of setting the CUDA environment variables. (Btw, can you test that using '-device_enable_cuda none' does not crash when setting <span style="caret-color: rgb(0, 0, 0);" class="">CUDA_VISIBLE_DEVICES=-1?)</span></div><div class=""><br class=""></div><div class="">
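<div class="">For illustration, here is a minimal sketch of the environment-variable check suggested in this thread, assuming it would run before the device-count query. The helper names cuda_devices_hidden and cuda_devices_hidden_by_env are hypothetical and not part of PETSc. The sketch only recognizes the exact value -1 discussed here; any other value of CUDA_VISIBLE_DEVICES is treated as not hiding the devices.</div>
<div class=""><br class=""></div>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PETSc function): returns 1 when the given
   CUDA_VISIBLE_DEVICES value is exactly "-1", the setting used in this
   thread to hide every device, and 0 otherwise (including when unset). */
static int cuda_devices_hidden(const char *value)
{
  return (value != NULL && strcmp(value, "-1") == 0) ? 1 : 0;
}

/* Hypothetical wrapper: reads the variable from the process environment. */
static int cuda_devices_hidden_by_env(void)
{
  return cuda_devices_hidden(getenv("CUDA_VISIBLE_DEVICES"));
}
```

<div class="">If something like cuda_devices_hidden_by_env() returned 1, initialization could skip the device-count query (or print a warning) instead of raising the error; which of the two is preferable is the open question above.</div>
<div class=""><br class=""></div>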
<div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class="">Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On Nov 1, 2021, at 10:09, Stefano Zampini <<a href="mailto:stefano.zampini@gmail.com" class="">stefano.zampini@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Just found out that if we configure with CUDA and then want to run on the CPU only using CUDA_VISIBLE_DEVICES=-1, PETSc errors out. Is this the intended behavior? I would have expected it to work.</div><div class="">This is with main.<br class=""></div><div class=""><br class=""></div><div class="">(ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check</div>Running check examples to verify correct installation<br class="">Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and PETSC_ARCH=arch-ecrcml-cuda-double<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with cuda<br class="">Completed test examples<br class=""><div class=""><br class=""></div><div class="">(ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check CUDA_VISIBLE_DEVICES=1</div>Running check examples to verify correct installation<br class="">Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and PETSC_ARCH=arch-ecrcml-cuda-double<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes<br class="">C/C++ example src/snes/tutorials/ex19 run successfully with cuda<br class="">Completed test examples<br class=""><div class=""><br class=""></div><div class="">(ecrcml-cuda) zampins@qaysar:~/miniforge/Devel/petsc$ make check CUDA_VISIBLE_DEVICES=-1</div>Running check examples to verify correct installation<br class="">Using PETSC_DIR=/home/zampins/miniforge/Devel/petsc and 
PETSC_ARCH=arch-ecrcml-cuda-double<br class="">Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process<br class="">See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" class="">http://www.mcs.anl.gov/petsc/documentation/faq.html</a><br class="">[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br class="">[0]PETSC ERROR: GPU error <br class="">[0]PETSC ERROR: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected<br class="">[0]PETSC ERROR: See <a href="https://petsc.org/release/faq/" class="">https://petsc.org/release/faq/</a> for trouble shooting.<br class="">[0]PETSC ERROR: Petsc Development GIT revision: v3.16.0-368-g72b201b202  GIT Date: 2021-10-29 14:48:19 +0300<br class="">[0]PETSC ERROR: ./ex19 on a arch-ecrcml-cuda-double named <a href="http://qaysar.kaust.edu.sa/" class="">qaysar.kaust.edu.sa</a> by zampins Mon Nov  1 18:06:12 2021<br class="">[0]PETSC ERROR: Configure options --with-blaslapack-include=/home/zampins/miniforge/envs/ecrcml-cuda/include --with-blaslapack-lib=/home/zampins/miniforge/envs/ecrcml-cuda/lib/libmkl_rt.so --download-h2opus --with-cuda --with-kblas-dir=/home/zampins/miniforge/envs/ecrcml-cuda --with-magma-dir=/home/zampins/miniforge/envs/ecrcml-cuda --LDFLAGS=/usr/lib/x86_64-linux-gnu/libcuda.so --with-debugging=1 --with-openmp --with-precision=double --with-fc=0 PETSC_ARCH=arch-ecrcml-cuda-double PETSC_DIR=/home/zampins/miniforge/Devel/petsc<br class="">[0]PETSC ERROR: #1 initialize() at /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:302<br class="">[0]PETSC ERROR: #2 PetscDeviceInitializeTypeFromOptions_Private() at /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/interface/device.cxx:292<br class="">[0]PETSC ERROR: #3 PetscDeviceInitializeFromOptions_Internal() at /home/zampins/miniforge/Devel/petsc/src/sys/objects/device/interface/device.cxx:417<br class="">[0]PETSC ERROR: 
#4 PetscInitialize_Common() at /home/zampins/miniforge/Devel/petsc/src/sys/objects/pinit.c:956<br class="">[0]PETSC ERROR: #5 PetscInitialize() at /home/zampins/miniforge/Devel/petsc/src/sys/objects/pinit.c:1231<br class="">--------------------------------------------------------------------------<br class="">Primary job  terminated normally, but 1 process returned<br class="">a non-zero exit code. Per user-direction, the job has been aborted.<br class="">--------------------------------------------------------------------------<br class="">--------------------------------------------------------------------------<br class=""><br clear="all" class="">[</div>
</div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>