[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version
Fande Kong
fdkong.jd at gmail.com
Tue Jan 25 20:18:39 CST 2022
On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:
> Configure should not have an impact here I think. The reason I had you run
> `cudaGetDeviceCount()` is because this is the CUDA call (and in fact the
> only CUDA call) in the initialization sequence that returns the error code.
> There should be no prior CUDA calls. Maybe this is a problem with
> oversubscribing GPUs? In the runs that crash, how many ranks are using any
> given GPU at once? Maybe MPS is required.
>
I used one MPI rank.
Fande
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 21, 2022, at 12:01, Fande Kong <fdkong.jd at gmail.com> wrote:
>
> Thanks Jacob,
>
> On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch <jacob.fai at gmail.com>
> wrote:
>
>> Segfault is caused by the following check at
>> src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a
>> PetscUnlikelyDebug() rather than just PetscUnlikely():
>>
>> ```
>> if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
>> ```
>>
>> To clarify:
>>
>> “Lazy” initialization is not that lazy after all; it still does some 50%
>> of the initialization that “eager” initialization does. It stops short of
>> initializing the CUDA runtime, checking for CUDA-aware MPI, gathering device
>> data, and initializing cuBLAS and friends. Importantly, lazy initialization
>> also swallows any errors that crop up during initialization, storing the
>> resulting error code for later (specifically _defaultDevice = -init_error_value;).
>>
>> So whether you initialize lazily or eagerly makes no difference here, as
>> _defaultDevice will always contain -35.
>>
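The pattern described above (store the negated error code in the default device id, then check the sign when a device is requested) can be sketched in plain C++ as follows. This is only an illustration and not PETSc's actual code: `DeviceState`, `initializeLazily`, `getDevice`, `failingDeviceCount` and `workingDeviceCount` are hypothetical names, and the value 35 simply mirrors the cudaErrorInsufficientDriver code reported in this thread.

```cpp
// Hypothetical device-count query: returns 0 on success, an error code otherwise.
using DeviceCountFn = int (*)(int *);

// Stand-in for a failing runtime call; 35 mirrors cudaErrorInsufficientDriver.
inline int failingDeviceCount(int *ndev) { *ndev = 0; return 35; }

// Stand-in for a healthy node that reports 4 devices.
inline int workingDeviceCount(int *ndev) { *ndev = 4; return 0; }

struct DeviceState {
  int defaultDevice = 0; // >= 0: usable device id; < 0: negated initialization error

  // "Lazy" path: swallow the error now and remember it as a negative id.
  void initializeLazily(DeviceCountFn query) {
    int ndev = 0;
    const int err = query(&ndev);
    defaultDevice = (err != 0) ? -err : 0;
  }

  // Returns 0 and sets *id on success, otherwise returns the stored error code.
  // The sign check runs unconditionally here; if it were compiled only into
  // debug builds, an optimized build would carry the negative id forward.
  int getDevice(int *id) const {
    if (defaultDevice < 0) return -defaultDevice;
    *id = defaultDevice;
    return 0;
  }
};
```

With `failingDeviceCount`, `getDevice` returns 35 and leaves the caller's id untouched, so the stored error surfaces as an error code instead of being used as a device index.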
>> The bigger question is why cudaGetDeviceCount() is returning
>> cudaErrorInsufficientDriver. Can you compile and run
>>
>> ```
>> #include <cuda_runtime.h>
>>
>> int main()
>> {
>>   int ndev;
>>   // The exit status is the cudaError_t value (0 means cudaSuccess).
>>   return cudaGetDeviceCount(&ndev);
>> }
>> ```
>>
>> Then show the output of "echo $?"?
>>
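Since the error in question is about the driver version being older than the runtime version, it can also help to compare the two directly. The CUDA runtime reports both through `cudaDriverGetVersion()` and `cudaRuntimeGetVersion()`, encoded as 1000*major + 10*minor (for example 11040 for CUDA 11.4). The sketch below only decodes and compares such integers, so it builds without CUDA; `CudaVersion`, `decodeCudaVersion` and `driverSupportsRuntime` are hypothetical helper names, and the simple "driver >= runtime" rule is an approximation (CUDA's minor-version and forward-compatibility modes can relax it).

```cpp
// Decoded form of a CUDA version integer (1000*major + 10*minor).
struct CudaVersion {
  int major;
  int minor;
};

// Split an encoded version such as 11040 into {11, 4}.
inline CudaVersion decodeCudaVersion(int encoded) {
  return {encoded / 1000, (encoded % 1000) / 10};
}

// Rough check: the driver should support a CUDA version at least as new as
// the runtime the program was built against. This is an approximation; see
// the caveats in the text above.
inline bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
  return driverVersion >= runtimeVersion;
}
```

On a GPU node, the two inputs would come from the runtime calls named above; a driver value smaller than the runtime value would be consistent with error 35.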
>
> I modified your code a little to get more information.
>
> #include <cuda_runtime.h>
> #include <cstdio>
>
> int main()
> {
> int ndev;
> int error = cudaGetDeviceCount(&ndev);
> printf("ndev %d \n", ndev);
> printf("error %d \n", error);
> return 0;
> }
>
> Results:
>
> $ ./a.out
> ndev 4
> error 0
>
>
> I have not read the PETSc CUDA initialization code yet. If I had to guess
> at what is happening, I would naively think that PETSc did not get correct
> GPU information during configuration because the compile node does not
> have GPUs, so there was no way to get any GPU device information.
>
> At runtime on the GPU nodes, PETSc might then be using the incorrect
> information gathered during configuration, which would produce this kind
> of false error message.
>
> Thanks,
>
> Fande
>
>
>
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 20, 2022, at 17:47, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>> Thanks, Jed
>>>
>>> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <jed at jedbrown.org> wrote:
>>>
>>>> You can't create CUDA or Kokkos Vecs if you're running on a node
>>>> without a GPU.
>>>
>>>
>>> I am running the code on compute nodes that do have GPUs.
>>>
>>
>> If you are actually running on GPUs, why would you need lazy
>> initialization? It would not break with GPUs present.
>>
>> Matt
>>
>>
>>> With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. So this
>>> might be a bug in PETSc-main.
>>>
>>> Thanks,
>>>
>>> Fande
>>>
>>>
>>>
>>> KSPSetUp 13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 3140 64630 15 1.05e+02 5 3.49e+01 100
>>> KSPSolve 1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 87 0 0 0 0 87 0 0 0 34522 69556 4 4.35e-03 1 2.38e-03 100
>>> KSPGMRESOrthog 142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 83755 87801 0 0.00e+00 0 0.00e+00 100
>>> SNESSolve 1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21 100 0 0 0 21 100 0 0 0 901 51365 57 1.10e+03 52 8.78e+02 100
>>> SNESSetUp 1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> SNESFunctionEval 2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 1 0 0 0.00e+00 6 1.92e+02 0
>>> SNESJacobianEval 1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 2 0 0 0.00e+00 1 3.20e+01 0
>>> SNESLineSearch 1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 4 0 0 0 0 14 64153 1 3.20e+01 3 9.61e+01 94
>>> PCGAMGGraph_AGG 5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 27 0 5 3.49e+01 9 7.43e+01 0
>>> PCGAMGCoarse_AGG 5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> PCGAMGProl_AGG 5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> PCGAMGPOpt_AGG 5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 1661 29807 26 7.15e+02 20 2.90e+02 99
>>> GAMG: createProl 5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 6 0 0 0 4 6 0 0 0 249 29666 31 7.50e+02 29 3.64e+02 96
>>> Graph 10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 27 0 5 3.49e+01 9 7.43e+01 0
>>> MIS/Agg 5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> SA: col data 5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> SA: frmProl0 5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> SA: smooth 5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 244 2709 15 1.97e+02 15 2.55e+02 90
>>> GAMG: partLevel 5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 1475 4120 5 1.78e+02 10 2.55e+02 100
>>> PCGAMG Squ l00 1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>>> PCGAMG Gal l00 1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1426 4270 1 1.48e+02 2 2.11e+02 100
>>> PCGAMG Opt l00 1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 289 2653 1 6.41e+01 1 1.13e+02 100
>>> PCGAMG Gal l01 1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1645 3851 1 2.40e+01 2 3.64e+01 100
>>> PCGAMG Opt l01 1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 242 1671 1 4.84e+00 1 1.23e+01 100
>>> PCGAMG Gal l02 1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1974 3636 1 5.04e+00 2 6.58e+00 100
>>> PCGAMG Opt l02 1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 323 1457 1 7.71e-01 1 2.30e+00 100
>>> PCGAMG Gal l03 1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1440 2266 1 4.44e-01 2 5.51e-01 100
>>> PCGAMG Opt l03 1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 339 1667 1 6.72e-02 1 2.03e-01 100
>>> PCGAMG Gal l04 1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 170 244 1 2.05e-02 2 2.53e-02 100
>>> PCGAMG Opt l04 1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 43 165 1 4.49e-03 1 1.19e-02 100
>>> PCSetUp 2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 12 0 0 0 5 12 0 0 0 496 17826 55 1.03e+03 45 6.54e+02 98
>>> PCSetUpOnBlocks 44 1.0 9.9087e-04 1.0 2.88e+03 1.0
>>>
>>>
>>>
>>>
>>>
>>>
>>>> The point of lazy initialization is to make it possible to run a solve
>>>> that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of
>>>> whether a GPU is actually present.
>>>>
>>>> Fande Kong <fdkong.jd at gmail.com> writes:
>>>>
>>>> > I spoke too soon. It seems that we have trouble creating cuda/kokkos
>>>> > vecs now. Got Segmentation fault.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Fande
>>>> >
>>>> > Program received signal SIGSEGV, Segmentation fault.
>>>> > 0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>> > 54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
>>>> > Missing separate debuginfos, use: debuginfo-install
>>>> > bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64
>>>> > elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
>>>> > libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64
>>>> > libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
>>>> > libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64
>>>> > libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64
>>>> > libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64
>>>> > libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64
>>>> > libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64
>>>> > librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64
>>>> > librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64
>>>> > libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64
>>>> > systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64
>>>> > zlib-1.2.7-19.el7_9.x86_64
>>>> > (gdb) bt
>>>> > #0 0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>> > #1 0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
>>>> > #2 0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
>>>> > #3 0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
>>>> > #4 0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
>>>> > #5 0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
>>>> > #6 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>> > #7 0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
>>>> > #8 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>> > #9 0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
>>>> > #10 VecSetFromOptions (vec=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
>>>> > #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL) at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
>>>> >
>>>> > On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong.jd at gmail.com>
>>>> wrote:
>>>> >
>>>> >> Thanks, Jed,
>>>> >>
>>>> >> This worked!
>>>> >>
>>>> >> Fande
>>>> >>
>>>> >> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <jed at jedbrown.org> wrote:
>>>> >>
>>>> >>> Fande Kong <fdkong.jd at gmail.com> writes:
>>>> >>>
>>>> >>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
>>>> >>> > wrote:
>>>> >>> >
>>>> >>> >> Are you running on login nodes or compute nodes (I can’t seem to
>>>> >>> >> tell from the configure.log)?
>>>> >>> >>
>>>> >>> >
>>>> >>> > I was compiling codes on login nodes, and running codes on compute
>>>> >>> > nodes. Login nodes do not have GPUs, but compute nodes do have GPUs.
>>>> >>> >
>>>> >>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1
>>>> >>> > worked perfectly. I have this trouble with PETSc-main.
>>>> >>>
>>>> >>> I assume you can
>>>> >>>
>>>> >>> export PETSC_OPTIONS='-device_enable lazy'
>>>> >>>
>>>> >>> and it'll work.
>>>> >>>
>>>> >>> I think this should be the default. The main complaint is that timing
>>>> >>> the first GPU-using event isn't accurate if it includes
>>>> >>> initialization, but I think this is mostly hypothetical because you
>>>> >>> can't trust any timing that doesn't preload in some form and the
>>>> >>> first GPU-using event will almost always be something uninteresting
>>>> >>> so I think it will rarely lead to confusion. Meanwhile, eager
>>>> >>> initialization is viscerally disruptive for lots of people.
>>>> >>>
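The "preload in some form" point above can be illustrated with a generic warm-up-then-time helper. This is a minimal sketch in standard C++ rather than anything PETSc-specific: `timeAfterWarmup` is a hypothetical name, and the callable stands in for whatever GPU-using operation is being measured.

```cpp
#include <chrono>
#include <functional>

// Run `work` once untimed so that one-time costs (such as lazy device
// initialization) are paid up front, then time a second run and return
// the elapsed wall-clock seconds of that second run only.
inline double timeAfterWarmup(const std::function<void()> &work) {
  work(); // warm-up run, not timed
  const auto start = std::chrono::steady_clock::now();
  work(); // timed run
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Under this scheme it does not matter whether initialization happened eagerly or lazily, because it is absorbed by the untimed first call either way.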
>>>> >>
>>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>