[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

Barry Smith bsmith at petsc.dev
Tue Jan 25 23:59:21 CST 2022


The bad configure has these extra link flags:

-L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs  -lcuda

good does not.

Try removing the stubs directory and -lcuda from the bad build's $PETSC_ARCH/lib/petsc/conf/variables; the bad build will likely start working.
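As a rough sketch of that edit (the sample line below is invented and much shorter than a real variables file; on a real install you would back up and edit $PETSC_ARCH/lib/petsc/conf/variables itself):

```shell
# Demo on an invented sample line: strip any -L.../stubs directory and the
# bare -lcuda, while keeping -lcudart. GNU sed assumed (-i, \b).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
PETSC_WITH_EXTERNAL_LIB = -lpetsc -L/apps/cuda/lib64 -lcudart -L/apps/cuda/lib64/stubs -lcuda -lm
EOF
sed -i -e 's| -L[^ ]*/stubs||g' -e 's| -lcuda\b||g' "$tmp"
cat "$tmp"
```

The \b word boundary keeps the second expression from eating the "-lcuda" prefix inside "-lcudart".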

Barry

I never liked the stubs stuff.

> On Jan 25, 2022, at 11:29 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
> 
> Hi Junchao,
> 
> I attached a "bad" configure log and a "good" configure log.
> 
> The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c
> 
> and the "good" one at 384645a00975869a1aacbd3169de62ba40cad683
> 
> This good hash is the last good one, immediately before the bad commit.
> 
> I think you could compare these two logs and check what the differences are.
> 
> Thanks,
> 
> Fande
> 
> On Tue, Jan 25, 2022 at 8:21 PM Junchao Zhang <junchao.zhang at gmail.com <mailto:junchao.zhang at gmail.com>> wrote:
> Fande, could you send the configure.log that works (i.e., before this offending commit)?
> --Junchao Zhang
> 
> 
> On Tue, Jan 25, 2022 at 8:21 PM Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> wrote:
> Not sure if this is helpful. I did "git bisect", and here was the result:
> 
> [kongf at sawtooth2 petsc]$ git bisect bad
> 246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit
> commit 246ba74192519a5f34fb6e227d1c64364e19ce2c
> Author: Junchao Zhang <jczhang at mcs.anl.gov <mailto:jczhang at mcs.anl.gov>>
> Date:   Wed Oct 13 05:32:43 2021 +0000
> 
>     Config: fix CUDA library and header dirs
> 
> :040000 040000 187c86055adb80f53c1d0565a8888704fec43a96 ea1efd7f594fd5e8df54170bc1bc7b00f35e4d5f M config
> 
> 
> Starting from this commit, the GPU did not work for me on our HPC.
> 
> Thanks,
> Fande
> 
> On Tue, Jan 25, 2022 at 7:18 PM Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> wrote:
> 
> 
> On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch <jacob.fai at gmail.com <mailto:jacob.fai at gmail.com>> wrote:
> Configure should not have an impact here, I think. The reason I had you run `cudaGetDeviceCount()` is that this is the CUDA call (and in fact the only CUDA call) in the initialization sequence that returns the error code. There should be no prior CUDA calls. Maybe this is a problem with oversubscribing GPUs? In the runs that crash, how many ranks are using any given GPU at once? Maybe MPS is required.
> 
> I used one MPI rank. 
> 
> Fande
> 
>  
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> 
>> On Jan 21, 2022, at 12:01, Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> wrote:
>> 
>> Thanks Jacob,
>> 
>> On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch <jacob.fai at gmail.com <mailto:jacob.fai at gmail.com>> wrote:
>> Segfault is caused by the following check at src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a PetscUnlikelyDebug() rather than just PetscUnlikely():
>> 
>> ```
>> if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
>> ```
>> 
>> To clarify: 
>> 
>> “lazy” initialization is not that lazy after all; it still does some 50% of the initialization that “eager” initialization does. It stops short of initializing the CUDA runtime, checking CUDA-aware MPI, gathering device data, and initializing cuBLAS and friends. Importantly, lazy initialization also swallows any errors that crop up, storing the resulting error code for later (specifically _defaultDevice = -init_error_value;).
>> 
>> So whether you initialize lazily or eagerly makes no difference here, as _defaultDevice will always contain -35.
>> 
>> The bigger question is why cudaGetDeviceCount() is returning cudaErrorInsufficientDriver. Can you compile and run
>> 
>> ```
>> #include <cuda_runtime.h>
>> 
>> int main()
>> {
>>   int ndev;
>>   return cudaGetDeviceCount(&ndev);
>> }
>> ```
>> 
>> Then show the output of "echo $?"?
>> 
>> I modified your code a little to get more information:
>> 
>> ```
>> #include <cuda_runtime.h>
>> #include <cstdio>
>> 
>> int main()
>> {
>>   int ndev;
>>   int error = cudaGetDeviceCount(&ndev);
>>   printf("ndev %d \n", ndev);
>>   printf("error %d \n", error);
>>   return 0;
>> }
>> ```
>> 
>> Results:
>> 
>> $ ./a.out 
>> ndev 4 
>> error 0 
>> 
>> 
>> I have not read the PETSc CUDA initialization code yet, so I can only guess at what is happening. My naive guess is that PETSc did not get correct GPU information during configuration, because the compile node does not have GPUs and there was no way to query any GPU device information.
>> 
>> At runtime on the GPU nodes, PETSc might then be relying on the incorrect information grabbed during configuration, producing this kind of false error message.
>> 
>> Thanks,
>> 
>> Fande
>> 
>>  
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> 
>>> On Jan 20, 2022, at 17:47, Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>> wrote:
>>> 
>>> On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> wrote:
>>> Thanks, Jed
>>> 
>>> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <jed at jedbrown.org <mailto:jed at jedbrown.org>> wrote:
>>> You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU. 
>>> 
>>> I am running the code on compute nodes that do have GPUs.
>>> 
>>> If you are actually running on GPUs, why would you need lazy initialization? It would not break with GPUs present.
>>> 
>>>    Matt
>>>  
>>> With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. That might be a bug in PETSc-main.
>>> 
>>> Thanks,
>>> 
>>> Fande
>>> 
>>> 
>>> 
>>> KSPSetUp              13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140   64630     15 1.05e+02    5 3.49e+01 100
>>> KSPSolve               1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522   69556      4 4.35e-03    1 2.38e-03 100
>>> KSPGMRESOrthog       142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755   87801      0 0.00e+00    0 0.00e+00 100
>>> SNESSolve              1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100  0  0  0  21100  0  0  0   901   51365     57 1.10e+03   52 8.78e+02 100
>>> SNESSetUp              1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>> SNESFunctionEval       2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1       0      0 0.00e+00    6 1.92e+02  0
>>> SNESJacobianEval       1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2       0      0 0.00e+00    1 3.20e+01  0
>>> SNESLineSearch         1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14   64153      1 3.20e+01    3 9.61e+01 94
>>> PCGAMGGraph_AGG        5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>>> PCGAMGCoarse_AGG       5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>> PCGAMGProl_AGG         5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>> PCGAMGPOpt_AGG         5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661   29807     26 7.15e+02   20 2.90e+02 99
>>> GAMG: createProl       5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249   29666     31 7.50e+02   29 3.64e+02 96
>>>   Graph               10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>>>   MIS/Agg              5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>   SA: col data         5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>   SA: frmProl0         5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>   SA: smooth           5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244    2709     15 1.97e+02   15 2.55e+02 90
>>> GAMG: partLevel        5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475    4120      5 1.78e+02   10 2.55e+02 100
>>> PCGAMG Squ l00         1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>> PCGAMG Gal l00         1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426    4270      1 1.48e+02    2 2.11e+02 100
>>> PCGAMG Opt l00         1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289    2653      1 6.41e+01    1 1.13e+02 100
>>> PCGAMG Gal l01         1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645    3851      1 2.40e+01    2 3.64e+01 100
>>> PCGAMG Opt l01         1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242    1671      1 4.84e+00    1 1.23e+01 100
>>> PCGAMG Gal l02         1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974    3636      1 5.04e+00    2 6.58e+00 100
>>> PCGAMG Opt l02         1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323    1457      1 7.71e-01    1 2.30e+00 100
>>> PCGAMG Gal l03         1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440    2266      1 4.44e-01    2 5.51e-01 100
>>> PCGAMG Opt l03         1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339    1667      1 6.72e-02    1 2.03e-01 100
>>> PCGAMG Gal l04         1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170     244      1 2.05e-02    2 2.53e-02 100
>>> PCGAMG Opt l04         1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43     165      1 4.49e-03    1 1.19e-02 100
>>> PCSetUp                2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496   17826     55 1.03e+03   45 6.54e+02 98
>>> PCSetUpOnBlocks       44 1.0 9.9087e-04 1.0 2.88e+03 1.0
>>> 
>>> 
>>> 
>>> 
>>>  
>>> The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present.
>>> 
>>> Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> writes:
>>> 
>>> > I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs
>>> > now. Got Segmentation fault.
>>> >
>>> > Thanks,
>>> >
>>> > Fande
>>> >
>>> > Program received signal SIGSEGV, Segmentation fault.
>>> > 0x00002aaab5558b11 in
>>> > Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize
>>> > (this=0x1) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>> > 54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
>>> > Missing separate debuginfos, use: debuginfo-install
>>> > bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64
>>> > elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
>>> > libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64
>>> > libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
>>> > libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64
>>> > libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64
>>> > libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64
>>> > libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64
>>> > libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64
>>> > librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64
>>> > librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64
>>> > libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64
>>> > systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64
>>> > zlib-1.2.7-19.el7_9.x86_64
>>> > (gdb) bt
>>> > #0  0x00002aaab5558b11 in
>>> > Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize
>>> > (this=0x1) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>> > #1  0x00002aaab5558db7 in
>>> > Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice
>>> > (this=this at entry=0x2aaab7f37b70
>>> > <CUDADevice>, device=0x115da00, id=-35, id at entry=-1) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
>>> > #2  0x00002aaab55577de in PetscDeviceCreate (type=type at entry=PETSC_DEVICE_CUDA,
>>> > devid=devid at entry=-1, device=device at entry=0x2aaab7f37b48
>>> > <defaultDevices+8>) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
>>> > #3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal
>>> > (type=type at entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId at entry=-1)
>>> > at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
>>> > #4  0x00002aaab5557bf6 in PetscDeviceInitialize
>>> > (type=type at entry=PETSC_DEVICE_CUDA)
>>> > at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
>>> > #5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
>>> > #6  0x00002aaab5649b40 in VecSetType (vec=vec at entry=0x115d150,
>>> > method=method at entry=0x2aaab70b45b8 "seqcuda") at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>> > #7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/
>>> > mpicuda.cu:214 <http://mpicuda.cu:214/>
>>> > #8  0x00002aaab5649b40 in VecSetType (vec=vec at entry=0x115d150,
>>> > method=method at entry=0x7fffffff9260 "cuda") at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>> > #9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150,
>>> > PetscOptionsObject=0x7fffffff9210) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
>>> > #10 VecSetFromOptions (vec=0x115d150) at
>>> > /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
>>> > #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init
>>> > (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL)
>>> > at
>>> > /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
>>> >
>>> > On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> wrote:
>>> >
>>> >> Thanks, Jed,
>>> >>
>>> >> This worked!
>>> >>
>>> >> Fande
>>> >>
>>> >> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <jed at jedbrown.org <mailto:jed at jedbrown.org>> wrote:
>>> >>
>>> >>> Fande Kong <fdkong.jd at gmail.com <mailto:fdkong.jd at gmail.com>> writes:
>>> >>>
>>> >>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <
>>> >>> jacob.fai at gmail.com <mailto:jacob.fai at gmail.com>>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> Are you running on login nodes or compute nodes (I can’t seem to tell
>>> >>> from
>>> >>> >> the configure.log)?
>>> >>> >>
>>> >>> >
>>> >>> > I was compiling codes on login nodes, and running codes on compute
>>> >>> nodes.
>>> >>> > Login nodes do not have GPUs, but compute nodes do have GPUs.
>>> >>> >
>>> >>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1
>>> >>> worked
>>> >>> > perfectly. I have this trouble with PETSc-main.
>>> >>>
>>> >>> I assume you can
>>> >>>
>>> >>> export PETSC_OPTIONS='-device_enable lazy'
>>> >>>
>>> >>> and it'll work.
>>> >>>
>>> >>> I think this should be the default. The main complaint is that timing the
>>> >>> first GPU-using event isn't accurate if it includes initialization, but I
>>> >>> think this is mostly hypothetical because you can't trust any timing that
>>> >>> doesn't preload in some form and the first GPU-using event will almost
>>> >>> always be something uninteresting so I think it will rarely lead to
>>> >>> confusion. Meanwhile, eager initialization is viscerally disruptive for
>>> >>> lots of people.
>>> >>>
>>> >>
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
> <configure_bad.log><configure_good.log>
