<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">The segfault is caused by the following check at src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a PetscUnlikelyDebug(), which is only active in debug builds, rather than just PetscUnlikely():<div class=""><br class=""></div><div class="">```</div><div class="">if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and goes uncaught</div><div class="">```</div><div class=""><br class=""></div><div class="">To clarify: </div><div class=""><br class=""></div><div class="">“Lazy” initialization is not that lazy after all; it still does some 50% of the initialization that “eager” initialization does. It stops short of initializing the CUDA runtime, checking for CUDA-aware MPI, gathering device data, and initializing cuBLAS and friends. Importantly, lazy initialization also swallows any errors that crop up during initialization, storing the resulting error code for later (specifically _defaultDevice = -init_error_value;).</div><div class=""><br class=""><div class="">So whether you initialize lazily or eagerly makes no difference here, as _defaultDevice will always contain -35, i.e. the negated value of cudaErrorInsufficientDriver.</div><div class=""><br class=""></div><div class="">The bigger question is why cudaGetDeviceCount() is returning cudaErrorInsufficientDriver. Can you compile and run the following:</div><div class=""><br class=""></div><div class="">```</div><div class="">#include <cuda_runtime.h></div><div class=""><br class=""></div><div class="">int main()</div><div class="">{</div><div class=""> int ndev;</div><div class=""> return cudaGetDeviceCount(&ndev);</div><div class="">}</div><div class="">```</div><div class=""><br class=""></div><div class="">Then show the value of "echo $?"?</div><div class=""><br class=""><div class=""><div class="">
<div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div>Best regards,<br class=""><br class="">Jacob Faibussowitsch<br class="">(Jacob Fai - booss - oh - vitch)<br class=""></div></div></div>
</div>
<div><br class=""><blockquote type="cite" class=""><div class="">On Jan 20, 2022, at 17:47, Matthew Knepley <<a href="mailto:knepley@gmail.com" class="">knepley@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <<a href="mailto:fdkong.jd@gmail.com" class="">fdkong.jd@gmail.com</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class="">Thanks, Jed</div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank" class="">jed@jedbrown.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU. </blockquote><div class=""><br class="">I am running the code on compute nodes that do have GPUs.<br class=""></div></div></div></blockquote><div class=""><br class=""></div><div class="">If you are actually running on GPUs, why would you need lazy initialization? It would not break with GPUs present.</div><div class=""><br class=""></div><div class=""> Matt</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div class="gmail_quote"><div class="">With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. 
That might be a bug of PETSc-main.<br class=""><br class="">Thanks,<br class=""><br class="">Fande<br class=""><br class=""><br class=""><br class="">KSPSetUp 13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 3140 64630 15 1.05e+02 5 3.49e+01 100<br class="">KSPSolve 1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 87 0 0 0 0 87 0 0 0 34522 69556 4 4.35e-03 1 2.38e-03 100<br class="">KSPGMRESOrthog 142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 83755 87801 0 0.00e+00 0 0.00e+00 100<br class="">SNESSolve 1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100 0 0 0 21100 0 0 0 901 51365 57 1.10e+03 52 8.78e+02 100<br class="">SNESSetUp 1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class="">SNESFunctionEval 2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 1 0 0 0.00e+00 6 1.92e+02 0<br class="">SNESJacobianEval 1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 2 0 0 0.00e+00 1 3.20e+01 0<br class="">SNESLineSearch 1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 14 64153 1 3.20e+01 3 9.61e+01 94<br class="">PCGAMGGraph_AGG 5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 27 0 5 3.49e+01 9 7.43e+01 0<br class="">PCGAMGCoarse_AGG 5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class="">PCGAMGProl_AGG 5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class="">PCGAMGPOpt_AGG 5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 1661 29807 26 7.15e+02 20 2.90e+02 99<br class="">GAMG: createProl 5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 6 0 0 0 4 6 0 0 0 249 29666 31 7.50e+02 29 3.64e+02 96<br class=""> Graph 10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 
0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 27 0 5 3.49e+01 9 7.43e+01 0<br class=""> MIS/Agg 5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class=""> SA: col data 5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class=""> SA: frmProl0 5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class=""> SA: smooth 5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 244 2709 15 1.97e+02 15 2.55e+02 90<br class="">GAMG: partLevel 5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1475 4120 5 1.78e+02 10 2.55e+02 100<br class="">PCGAMG Squ l00 1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0<br class="">PCGAMG Gal l00 1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1426 4270 1 1.48e+02 2 2.11e+02 100<br class="">PCGAMG Opt l00 1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 289 2653 1 6.41e+01 1 1.13e+02 100<br class="">PCGAMG Gal l01 1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1645 3851 1 2.40e+01 2 3.64e+01 100<br class="">PCGAMG Opt l01 1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 242 1671 1 4.84e+00 1 1.23e+01 100<br class="">PCGAMG Gal l02 1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1974 3636 1 5.04e+00 2 6.58e+00 100<br class="">PCGAMG Opt l02 1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 323 1457 1 7.71e-01 1 2.30e+00 100<br class="">PCGAMG Gal l03 1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1440 2266 1 4.44e-01 2 5.51e-01 100<br class="">PCGAMG Opt l03 1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 339 1667 1 6.72e-02 1 2.03e-01 100<br 
class="">PCGAMG Gal l04 1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170 244 1 2.05e-02 2 2.53e-02 100<br class="">PCGAMG Opt l04 1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 165 1 4.49e-03 1 1.19e-02 100<br class="">PCSetUp 2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 12 0 0 0 5 12 0 0 0 496 17826 55 1.03e+03 45 6.54e+02 98<br class="">PCSetUpOnBlocks 44 1.0 9.9087e-04 1.0 2.88e+03 1.0<br class=""><br class=""><div style="margin: 0px; font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><br class=""></div><br class=""><br class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present.<br class="">
<br class="">
Fande Kong <<a href="mailto:fdkong.jd@gmail.com" target="_blank" class="">fdkong.jd@gmail.com</a>> writes:<br class="">
<br class="">
> I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs<br class="">
> now. Got Segmentation fault.<br class="">
><br class="">
> Thanks,<br class="">
><br class="">
> Fande<br class="">
><br class="">
> Program received signal SIGSEGV, Segmentation fault.<br class="">
> 0x00002aaab5558b11 in<br class="">
> Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize<br class="">
> (this=0x1) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54<br class="">
> 54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept<br class="">
> Missing separate debuginfos, use: debuginfo-install<br class="">
> bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64<br class="">
> elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64<br class="">
> libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64<br class="">
> libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64<br class="">
> libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64<br class="">
> libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64<br class="">
> libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64<br class="">
> libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64<br class="">
> libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64<br class="">
> librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64<br class="">
> librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64<br class="">
> libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64<br class="">
> systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64<br class="">
> zlib-1.2.7-19.el7_9.x86_64<br class="">
> (gdb) bt<br class="">
> #0 0x00002aaab5558b11 in<br class="">
> Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize<br class="">
> (this=0x1) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54<br class="">
> #1 0x00002aaab5558db7 in<br class="">
> Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice<br class="">
> (this=this@entry=0x2aaab7f37b70<br class="">
> <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344<br class="">
> #2 0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA,<br class="">
> devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48<br class="">
> <defaultDevices+8>) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107<br class="">
> #3 0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal<br class="">
> (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1)<br class="">
> at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273<br class="">
> #4 0x00002aaab5557bf6 in PetscDeviceInitialize<br class="">
> (type=type@entry=PETSC_DEVICE_CUDA)<br class="">
> at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234<br class="">
> #5 0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244<br class="">
> #6 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150,<br class="">
> method=method@entry=0x2aaab70b45b8 "seqcuda") at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93<br class="">
> #7 0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214<br class="">
> #8 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150,<br class="">
> method=method@entry=0x7fffffff9260 "cuda") at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93<br class="">
> #9 0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150,<br class="">
> PetscOptionsObject=0x7fffffff9210) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263<br class="">
> #10 VecSetFromOptions (vec=0x115d150) at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297<br class="">
> #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init<br class="">
> (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL)<br class="">
> at<br class="">
> /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693<br class="">
><br class="">
> On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <<a href="mailto:fdkong.jd@gmail.com" target="_blank" class="">fdkong.jd@gmail.com</a>> wrote:<br class="">
><br class="">
>> Thanks, Jed,<br class="">
>><br class="">
>> This worked!<br class="">
>><br class="">
>> Fande<br class="">
>><br class="">
>> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank" class="">jed@jedbrown.org</a>> wrote:<br class="">
>><br class="">
>>> Fande Kong <<a href="mailto:fdkong.jd@gmail.com" target="_blank" class="">fdkong.jd@gmail.com</a>> writes:<br class="">
>>><br class="">
>>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <<br class="">
>>> <a href="mailto:jacob.fai@gmail.com" target="_blank" class="">jacob.fai@gmail.com</a>><br class="">
>>> > wrote:<br class="">
>>> ><br class="">
>>> >> Are you running on login nodes or compute nodes (I can’t seem to tell<br class="">
>>> from<br class="">
>>> >> the configure.log)?<br class="">
>>> >><br class="">
>>> ><br class="">
>>> > I was compiling codes on login nodes, and running codes on compute<br class="">
>>> nodes.<br class="">
>>> > Login nodes do not have GPUs, but compute nodes do have GPUs.<br class="">
>>> ><br class="">
>>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1<br class="">
>>> worked<br class="">
>>> > perfectly. I have this trouble with PETSc-main.<br class="">
>>><br class="">
>>> I assume you can<br class="">
>>><br class="">
>>> export PETSC_OPTIONS='-device_enable lazy'<br class="">
>>><br class="">
>>> and it'll work.<br class="">
>>><br class="">
>>> I think this should be the default. The main complaint is that timing the<br class="">
>>> first GPU-using event isn't accurate if it includes initialization, but I<br class="">
>>> think this is mostly hypothetical because you can't trust any timing that<br class="">
>>> doesn't preload in some form and the first GPU-using event will almost<br class="">
>>> always be something uninteresting so I think it will rarely lead to<br class="">
>>> confusion. Meanwhile, eager initialization is viscerally disruptive for<br class="">
>>> lots of people.<br class="">
>>><br class="">
>><br class="">
</blockquote></div></div>
</blockquote></div><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div dir="ltr" class="gmail_signature"><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class=""><div dir="ltr" class=""><div class="">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="">-- Norbert Wiener</div><div class=""><br class=""></div><div class=""><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank" class="">https://www.cse.buffalo.edu/~knepley/</a><br class=""></div></div></div></div></div></div></div></div>
</div></blockquote></div><br class=""></div></div></div></body></html>