[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

Junchao Zhang junchao.zhang at gmail.com
Thu Jan 20 20:29:51 CST 2022


I don't see the value of using PetscUnlikely() today.

--Junchao Zhang


On Thu, Jan 20, 2022 at 7:26 PM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:

> Segfault is caused by the following check at
> src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a
> PetscUnlikelyDebug() rather than just PetscUnlikely():
>
> ```
> if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
> ```
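
To make the difference concrete, here is a minimal sketch of how such a pair of macros could behave. UNLIKELY, UNLIKELY_DEBUG and the USE_DEBUG switch are hypothetical stand-ins, not PETSc's actual definitions; the only assumption is that the Debug variant checks in debug builds and evaluates to false otherwise.

```cpp
// Hypothetical stand-ins for the two macros; NOT the PETSc definitions.
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define UNLIKELY(cond) (cond)
#endif

// Assumption: the debug variant only checks when USE_DEBUG is defined.
#if defined(USE_DEBUG)
#  define UNLIKELY_DEBUG(cond) UNLIKELY(cond)
#else
#  define UNLIKELY_DEBUG(cond) (false && (cond))
#endif

// Always-on check: flags a negative id in every build type.
bool caught_always(int id) { return UNLIKELY(id < 0); }

// Debug-only check: flags a negative id only when USE_DEBUG is defined;
// in an optimized build the condition is never evaluated.
bool caught_debug_only(int id) { return UNLIKELY_DEBUG(id < 0); }
```

Built without -DUSE_DEBUG, caught_always(-35) is true while caught_debug_only(-35) is false, which is consistent with id=-35 slipping through in the backtrace further down.
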
>
> To clarify:
>
> “Lazy” initialization is not that lazy after all; it still does roughly 50%
> of the initialization that “eager” initialization does. It stops short of
> initializing the CUDA runtime, checking for CUDA-aware MPI, gathering device
> data, and initializing cuBLAS and friends. Importantly, lazy initialization
> also swallows any errors that crop up during initialization, storing the
> resulting error code for later (specifically, _defaultDevice = -init_error_value;).
>
> So whether you initialize lazily or eagerly makes no difference here, as
> _defaultDevice will always contain -35.
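
A minimal sketch of the pattern described above, under stated assumptions: DeviceRegistry, initialize_lazily and get_device are hypothetical names for illustration, not the cupmdevice.cxx code, and probe_error stands in for whatever the device-count query returned. It shows the error being stored as a negative id and an unconditional check turning it back into an error code at first use.

```cpp
// Hypothetical model of the behaviour described above; none of these
// names come from PETSc.
struct DeviceRegistry {
  int default_device = 0; // >= 0: usable device id, < 0: negated error code

  // Lazy path: swallow the error and remember it as a negative id.
  void initialize_lazily(int probe_error)
  {
    if (probe_error != 0) default_device = -probe_error;
  }

  // Unconditional check at first use: report the stored error instead of
  // treating the negative value as a device index.
  // Returns 0 on success, or the original error code; id_out is only
  // written on success.
  int get_device(int &id_out) const
  {
    if (default_device < 0) return -default_device;
    id_out = default_device;
    return 0;
  }
};
```

With probe_error = 35, default_device becomes -35 and get_device() returns 35 without touching id_out; if the check were compiled out, -35 would instead be used as an index.
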
>
> The bigger question is why cudaGetDeviceCount() is returning
> cudaErrorInsufficientDriver. Can you compile and run
>
> ```
> #include <cuda_runtime.h>
>
> int main()
> {
>   int ndev;
>   return cudaGetDeviceCount(&ndev);
> }
> ```
>
> Then show the output of "echo $?"?
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 20, 2022, at 17:47, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>> Thanks, Jed
>>
>> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <jed at jedbrown.org> wrote:
>>
>>> You can't create CUDA or Kokkos Vecs if you're running on a node without
>>> a GPU.
>>
>>
>> I am running the code on compute nodes that do have GPUs.
>>
>
> If you are actually running on GPUs, why would you need lazy
> initialization? It would not break with GPUs present.
>
>    Matt
>
>
>> With PETSc-3.16.1, I got a good speedup by running GAMG on GPUs, so this
>> might be a bug in PETSc-main.
>>
>> Thanks,
>>
>> Fande
>>
>>
>>
>> KSPSetUp              13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140   64630     15 1.05e+02    5 3.49e+01 100
>> KSPSolve               1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522   69556      4 4.35e-03    1 2.38e-03 100
>> KSPGMRESOrthog       142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755   87801      0 0.00e+00    0 0.00e+00 100
>> SNESSolve              1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100  0  0  0  21100  0  0  0   901   51365     57 1.10e+03   52 8.78e+02 100
>> SNESSetUp              1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> SNESFunctionEval       2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1       0      0 0.00e+00    6 1.92e+02  0
>> SNESJacobianEval       1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2       0      0 0.00e+00    1 3.20e+01  0
>> SNESLineSearch         1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14   64153      1 3.20e+01    3 9.61e+01 94
>> PCGAMGGraph_AGG        5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>> PCGAMGCoarse_AGG       5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> PCGAMGProl_AGG         5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> PCGAMGPOpt_AGG         5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661   29807     26 7.15e+02   20 2.90e+02 99
>> GAMG: createProl       5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249   29666     31 7.50e+02   29 3.64e+02 96
>>   Graph               10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>>   MIS/Agg              5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>   SA: col data         5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>   SA: frmProl0         5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>   SA: smooth           5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244    2709     15 1.97e+02   15 2.55e+02 90
>> GAMG: partLevel        5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475    4120      5 1.78e+02   10 2.55e+02 100
>> PCGAMG Squ l00         1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>> PCGAMG Gal l00         1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426    4270      1 1.48e+02    2 2.11e+02 100
>> PCGAMG Opt l00         1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289    2653      1 6.41e+01    1 1.13e+02 100
>> PCGAMG Gal l01         1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645    3851      1 2.40e+01    2 3.64e+01 100
>> PCGAMG Opt l01         1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242    1671      1 4.84e+00    1 1.23e+01 100
>> PCGAMG Gal l02         1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974    3636      1 5.04e+00    2 6.58e+00 100
>> PCGAMG Opt l02         1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323    1457      1 7.71e-01    1 2.30e+00 100
>> PCGAMG Gal l03         1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440    2266      1 4.44e-01    2 5.51e-01 100
>> PCGAMG Opt l03         1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339    1667      1 6.72e-02    1 2.03e-01 100
>> PCGAMG Gal l04         1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170     244      1 2.05e-02    2 2.53e-02 100
>> PCGAMG Opt l04         1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43     165      1 4.49e-03    1 1.19e-02 100
>> PCSetUp                2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496   17826     55 1.03e+03   45 6.54e+02 98
>> PCSetUpOnBlocks       44 1.0 9.9087e-04 1.0 2.88e+03 1.0
>>
>>
>>
>>
>>
>>
>>> The point of lazy initialization is to make it possible to run a solve
>>> that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of
>>> whether a GPU is actually present.
>>>
>>> Fande Kong <fdkong.jd at gmail.com> writes:
>>>
>>> > I spoke too soon. It seems that we have trouble creating cuda/kokkos
>>> vecs
>>> > now. Got Segmentation fault.
>>> >
>>> > Thanks,
>>> >
>>> > Fande
>>> >
>>> > Program received signal SIGSEGV, Segmentation fault.
>>> > 0x00002aaab5558b11 in
>>> >
>>> Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize
>>> > (this=0x1) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>> > 54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize()
>>> noexcept
>>> > Missing separate debuginfos, use: debuginfo-install
>>> > bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64
>>> > elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
>>> > libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64
>>> > libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
>>> > libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64
>>> > libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64
>>> > libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64
>>> > libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64
>>> > libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64
>>> > librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64
>>> > librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64
>>> > libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64
>>> > systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64
>>> > zlib-1.2.7-19.el7_9.x86_64
>>> > (gdb) bt
>>> > #0  0x00002aaab5558b11 in
>>> >
>>> Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize
>>> > (this=0x1) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>> > #1  0x00002aaab5558db7 in
>>> > Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice
>>> > (this=this at entry=0x2aaab7f37b70
>>> > <CUDADevice>, device=0x115da00, id=-35, id at entry=-1) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
>>> > #2  0x00002aaab55577de in PetscDeviceCreate (type=type at entry
>>> =PETSC_DEVICE_CUDA,
>>> > devid=devid at entry=-1, device=device at entry=0x2aaab7f37b48
>>> > <defaultDevices+8>) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
>>> > #3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal
>>> > (type=type at entry=PETSC_DEVICE_CUDA,
>>> defaultDeviceId=defaultDeviceId at entry=-1)
>>> > at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
>>> > #4  0x00002aaab5557bf6 in PetscDeviceInitialize
>>> > (type=type at entry=PETSC_DEVICE_CUDA)
>>> > at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
>>> > #5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
>>> > #6  0x00002aaab5649b40 in VecSetType (vec=vec at entry=0x115d150,
>>> > method=method at entry=0x2aaab70b45b8 "seqcuda") at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>> > #7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
>>> > #8  0x00002aaab5649b40 in VecSetType (vec=vec at entry=0x115d150,
>>> > method=method at entry=0x7fffffff9260 "cuda") at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>> > #9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150,
>>> > PetscOptionsObject=0x7fffffff9210) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
>>> > #10 VecSetFromOptions (vec=0x115d150) at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
>>> > #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init
>>> > (this=0x11cd1a0, n=441, n_local=441, fast=false,
>>> ptype=libMesh::PARALLEL)
>>> > at
>>> >
>>> /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
>>> >
>>> > On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong.jd at gmail.com>
>>> wrote:
>>> >
>>> >> Thanks, Jed,
>>> >>
>>> >> This worked!
>>> >>
>>> >> Fande
>>> >>
>>> >> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <jed at jedbrown.org> wrote:
>>> >>
>>> >>> Fande Kong <fdkong.jd at gmail.com> writes:
>>> >>>
>>> >>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <
>>> >>> jacob.fai at gmail.com>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> Are you running on login nodes or compute nodes (I can’t seem to
>>> tell
>>> >>> from
>>> >>> >> the configure.log)?
>>> >>> >>
>>> >>> >
>>> >>> > I was compiling codes on login nodes, and running codes on compute
>>> >>> nodes.
>>> >>> > Login nodes do not have GPUs, but compute nodes do have GPUs.
>>> >>> >
>>> >>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1
>>> >>> worked
>>> >>> > perfectly. I have this trouble with PETSc-main.
>>> >>>
>>> >>> I assume you can
>>> >>>
>>> >>> export PETSC_OPTIONS='-device_enable lazy'
>>> >>>
>>> >>> and it'll work.
>>> >>>
>>> >>> I think this should be the default. The main complaint is that
>>> timing the
>>> >>> first GPU-using event isn't accurate if it includes initialization,
>>> but I
>>> >>> think this is mostly hypothetical because you can't trust any timing
>>> that
>>> >>> doesn't preload in some form and the first GPU-using event will
>>> almost
>>> >>> always be something uninteresting so I think it will rarely lead to
>>> >>> confusion. Meanwhile, eager initialization is viscerally disruptive
>>> for
>>> >>> lots of people.
>>> >>>
>>> >>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>