[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

Fande Kong fdkong.jd at gmail.com
Wed Jan 26 12:42:55 CST 2022


I have attached the make.log generated after removing the stubs directory
and -lcuda, in case it is helpful.

I am not aware of the motivation for the changes in cuda.py. Could we
revert that bad commit until we fully understand the issue?

Thanks,

Fande




On Wed, Jan 26, 2022 at 11:25 AM Fande Kong <fdkong.jd at gmail.com> wrote:

> I am on the petsc-main
>
> commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
>
> Merge: 96c919c d5f3255
>
> Author: Satish Balay <balay at mcs.anl.gov>
>
> Date:   Wed Jan 26 10:28:32 2022 -0600
>
>
>     Merge remote-tracking branch 'origin/release'
>
>
> It is still broken.
>
> Thanks,
>
>
> Fande
>
> On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> The good build uses the compiler's default library/header path.  The bad
>> build searches the CUDA toolkit path and uses rpath linking.
>> Though the paths look the same on the login node, they could behave
>> differently on a compute node depending on its environment.
>> I think we fixed the issue in cuda.py (i.e., first try the compiler's
>> default, then toolkit).  That's why I wanted Fande to use petsc/main.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Jan 25, 2022 at 11:59 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>> bad has extra
>>>
>>> -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs
>>>  -lcuda
>>>
>>> good does not.
>>>
>>> Try removing the stubs directory and -lcuda from the bad
>>> $PETSC_ARCH/lib/petsc/conf/variables and likely the bad will start working.
>>>
>>> Barry
>>>
>>> I never liked the stubs stuff.
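For anyone who wants to try the edit Barry describes without hand-editing a long link line, here is a minimal sketch. It assumes the link line is a plain space-separated list of flags; the helper name `stripCudaStubs` is hypothetical and not part of PETSc, and it only drops `-lcuda` and any `-L<dir>` whose directory ends in `/stubs`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of PETSc: removes "-lcuda" and any
// "-L<dir>" whose directory ends in "/stubs" from a space-separated link line.
// Runs of whitespace are collapsed to single spaces in the result.
std::string stripCudaStubs(const std::string &linkLine) {
  static const std::string suffix = "/stubs";
  std::istringstream in(linkLine);
  std::string token, out;
  while (in >> token) {
    // Exact match, so -lcudart, -lcublas, etc. are kept.
    const bool isLcuda = (token == "-lcuda");
    const bool isStubDir =
        token.compare(0, 2, "-L") == 0 && token.size() >= 2 + suffix.size() &&
        token.compare(token.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (isLcuda || isStubDir) continue;
    if (!out.empty()) out += ' ';
    out += token;
  }
  return out;
}
```

Applied to a made-up line such as `-L/opt/cuda/lib64 -lcudart -L/opt/cuda/lib64/stubs -lcuda`, this keeps only `-L/opt/cuda/lib64 -lcudart`. Keep a backup of the variables file before changing it.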
>>>
>>> On Jan 25, 2022, at 11:29 PM, Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>> Hi Junchao,
>>>
>>> I attached a "bad" configure log and a "good" configure log.
>>>
>>> The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c
>>>
>>> and the "good" one at 384645a00975869a1aacbd3169de62ba40cad683
>>>
>>> This good hash is the last good commit, immediately before the bad
>>> one.
>>>
>>> I think you could compare these two logs and check what the
>>> differences are.
>>>
>>> Thanks,
>>>
>>> Fande
>>>
>>> On Tue, Jan 25, 2022 at 8:21 PM Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> Fande, could you send the configure.log that works (i.e., before this
>>>> offending commit)?
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Tue, Jan 25, 2022 at 8:21 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>
>>>>> Not sure if this is helpful. I did "git bisect", and here was the
>>>>> result:
>>>>>
>>>>> [kongf at sawtooth2 petsc]$ git bisect bad
>>>>> 246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit
>>>>> commit 246ba74192519a5f34fb6e227d1c64364e19ce2c
>>>>> Author: Junchao Zhang <jczhang at mcs.anl.gov>
>>>>> Date:   Wed Oct 13 05:32:43 2021 +0000
>>>>>
>>>>>     Config: fix CUDA library and header dirs
>>>>>
>>>>> :040000 040000 187c86055adb80f53c1d0565a8888704fec43a96
>>>>> ea1efd7f594fd5e8df54170bc1bc7b00f35e4d5f M config
>>>>>
>>>>>
>>>>> Starting from this commit, the GPU has not worked for me on our HPC system.
>>>>>
>>>>> Thanks,
>>>>> Fande
>>>>>
>>>>> On Tue, Jan 25, 2022 at 7:18 PM Fande Kong <fdkong.jd at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch <
>>>>>> jacob.fai at gmail.com> wrote:
>>>>>>
>>>>>>> Configure should not have an impact here I think. The reason I had
>>>>>>> you run `cudaGetDeviceCount()` is because this is the CUDA call (and in
>>>>>>> fact the only CUDA call) in the initialization sequence that returns the
>>>>>>> error code. There should be no prior CUDA calls. Maybe this is a problem
>>>>>>> with oversubscribing GPUs? In the runs that crash, how many ranks are
>>>>>>> using any given GPU at once? Maybe MPS is required.
>>>>>>>
>>>>>>
>>>>>> I used one MPI rank.
>>>>>>
>>>>>> Fande
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Jacob Faibussowitsch
>>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>>>
>>>>>>> On Jan 21, 2022, at 12:01, Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Jacob,
>>>>>>>
>>>>>>> On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch <
>>>>>>> jacob.fai at gmail.com> wrote:
>>>>>>>
>>>>>>>> Segfault is caused by the following check at
>>>>>>>> src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a
>>>>>>>> PetscUnlikelyDebug() rather than just PetscUnlikely():
>>>>>>>>
>>>>>>>> ```
>>>>>>>> if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
>>>>>>>> ```
>>>>>>>>
>>>>>>>> To clarify:
>>>>>>>>
>>>>>>>> “Lazy” initialization is not that lazy after all; it still does
>>>>>>>> some 50% of the initialization that “eager” initialization does. It stops
>>>>>>>> short of initializing the CUDA runtime, checking CUDA-aware MPI, gathering
>>>>>>>> device data, and initializing cublas and friends. Importantly, lazy also
>>>>>>>> swallows any errors that crop up during initialization, storing the
>>>>>>>> resulting error code for later (specifically _defaultDevice =
>>>>>>>> -init_error_value;).
>>>>>>>>
>>>>>>>> So whether you initialize lazily or eagerly makes no difference
>>>>>>>> here, as _defaultDevice will always contain -35.
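The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDeviceManager`, `initializeLazily`, and the `checkEnabled` flag are invented for illustration and are not PETSc's actual classes or API. The flag stands in for whether the `_defaultDevice < 0` guard is compiled in, on the assumption that a `PetscUnlikelyDebug()` check is only active in debug builds.

```cpp
#include <cassert>

// Toy model only: these names are illustrative, not PETSc's real classes.
constexpr int kInsufficientDriver = 35; // value of cudaErrorInsufficientDriver

struct ToyDeviceManager {
  int defaultDevice = 0; // >= 0: device id, < 0: negated initialization error

  // Lazy path: swallow the error now and remember it as a negative id.
  void initializeLazily(int initError) {
    defaultDevice = (initError != 0) ? -initError : 0;
  }

  // checkEnabled models whether the "< 0" guard is compiled in.
  // Returns 0 on success, or the stored error code when the guard fires.
  int getDevice(bool checkEnabled, int *idOut) const {
    assert(idOut != nullptr);
    if (checkEnabled && defaultDevice < 0) return -defaultDevice;
    *idOut = defaultDevice; // unguarded: a negative "id" escapes to the caller
    return 0;
  }
};
```

After `initializeLazily(kInsufficientDriver)`, calling `getDevice(true, &id)` reports error 35, while `getDevice(false, &id)` returns success and hands back `id == -35`. That matches the `id=-35` visible in frame #1 of the backtrace quoted further down the thread.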
>>>>>>>>
>>>>>>>> The bigger question is why cudaGetDeviceCount() is returning
>>>>>>>> cudaErrorInsufficientDriver. Can you compile and run
>>>>>>>>
>>>>>>>> ```
>>>>>>>> #include <cuda_runtime.h>
>>>>>>>>
>>>>>>>> int main()
>>>>>>>> {
>>>>>>>>   int ndev;
>>>>>>>>   return cudaGetDeviceCount(&ndev);
>>>>>>>> }
>>>>>>>> ```
>>>>>>>>
>>>>>>>> Then show the output of "echo $?"?
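A possible follow-up check is to compare the integers that `cudaDriverGetVersion()` and `cudaRuntimeGetVersion()` report on the compute node, since cudaErrorInsufficientDriver means the driver is older than the runtime in use. The sketch below leaves out the CUDA calls so that it builds anywhere. The helper names are hypothetical, and it assumes the usual CUDA encoding of 1000*major + 10*minor (for example, 10010 for CUDA 10.1).

```cpp
#include <cassert>

// Decoded form of a CUDA version integer (hypothetical helper type).
struct CudaVersion {
  int majorNum;
  int minorNum;
};

// Assumes the 1000*major + 10*minor encoding, e.g. 10010 -> 10.1.
CudaVersion decodeCudaVersion(int encoded) {
  assert(encoded >= 0);
  return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when the driver supports a lower CUDA version than the runtime needs.
// As far as I know, a driver version of 0 means no driver was found at all.
bool driverTooOld(int driverVersion, int runtimeVersion) {
  return driverVersion < runtimeVersion;
}
```

If the values seen from inside PETSc differ from those seen by a standalone test program on the same node, that would suggest the two are not loading the same libcuda.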
>>>>>>>>
>>>>>>>
>>>>>>> I modified your code a little to get more information.
>>>>>>>
>>>>>>> #include <cuda_runtime.h>
>>>>>>> #include <cstdio>
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>>   int ndev;
>>>>>>>   int error = cudaGetDeviceCount(&ndev);
>>>>>>>   printf("ndev %d \n", ndev);
>>>>>>>   printf("error %d \n", error);
>>>>>>>   return 0;
>>>>>>> }
>>>>>>>
>>>>>>> Results:
>>>>>>>
>>>>>>> $ ./a.out
>>>>>>> ndev 4
>>>>>>> error 0
>>>>>>>
>>>>>>>
>>>>>>> I have not read the PETSc CUDA initialization code yet. If I had to
>>>>>>> guess what is happening, I would naively think that PETSc did not get
>>>>>>> correct GPU information during configuration, because the compile node
>>>>>>> does not have GPUs and there was no way to get any GPU device information.
>>>>>>>
>>>>>>>
>>>>>>> At runtime on the GPU nodes, PETSc might then use the incorrect
>>>>>>> information gathered during configuration and produce this kind of
>>>>>>> false error message.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fande
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Jacob Faibussowitsch
>>>>>>>> (Jacob Fai - booss - oh - vitch)
>>>>>>>>
>>>>>>>> On Jan 20, 2022, at 17:47, Matthew Knepley <knepley at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Jed
>>>>>>>>>
>>>>>>>>> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <jed at jedbrown.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You can't create CUDA or Kokkos Vecs if you're running on a node
>>>>>>>>>> without a GPU.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am running the code on compute nodes that do have GPUs.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you are actually running on GPUs, why would you need lazy
>>>>>>>> initialization? It would not break with GPUs present.
>>>>>>>>
>>>>>>>>    Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>> With PETSc-3.16.1, I got good speedup by running GAMG on GPUs.
>>>>>>>>> That might be a bug in PETSc-main.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Fande
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> KSPSetUp              13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140   64630     15 1.05e+02    5 3.49e+01 100
>>>>>>>>> KSPSolve               1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522   69556      4 4.35e-03    1 2.38e-03 100
>>>>>>>>> KSPGMRESOrthog       142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755   87801      0 0.00e+00    0 0.00e+00 100
>>>>>>>>> SNESSolve              1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100  0  0  0  21100  0  0  0   901   51365     57 1.10e+03   52 8.78e+02 100
>>>>>>>>> SNESSetUp              1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>> SNESFunctionEval       2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1       0      0 0.00e+00    6 1.92e+02  0
>>>>>>>>> SNESJacobianEval       1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2       0      0 0.00e+00    1 3.20e+01  0
>>>>>>>>> SNESLineSearch         1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14   64153      1 3.20e+01    3 9.61e+01 94
>>>>>>>>> PCGAMGGraph_AGG        5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>>>>>>>>> PCGAMGCoarse_AGG       5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>> PCGAMGProl_AGG         5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>> PCGAMGPOpt_AGG         5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661   29807     26 7.15e+02   20 2.90e+02 99
>>>>>>>>> GAMG: createProl       5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249   29666     31 7.50e+02   29 3.64e+02 96
>>>>>>>>>   Graph               10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0      5 3.49e+01    9 7.43e+01  0
>>>>>>>>>   MIS/Agg              5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>>   SA: col data         5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>>   SA: frmProl0         5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>>   SA: smooth           5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244    2709     15 1.97e+02   15 2.55e+02 90
>>>>>>>>> GAMG: partLevel        5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475    4120      5 1.78e+02   10 2.55e+02 100
>>>>>>>>> PCGAMG Squ l00         1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>>>>>>>>> PCGAMG Gal l00         1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426    4270      1 1.48e+02    2 2.11e+02 100
>>>>>>>>> PCGAMG Opt l00         1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289    2653      1 6.41e+01    1 1.13e+02 100
>>>>>>>>> PCGAMG Gal l01         1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645    3851      1 2.40e+01    2 3.64e+01 100
>>>>>>>>> PCGAMG Opt l01         1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242    1671      1 4.84e+00    1 1.23e+01 100
>>>>>>>>> PCGAMG Gal l02         1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974    3636      1 5.04e+00    2 6.58e+00 100
>>>>>>>>> PCGAMG Opt l02         1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323    1457      1 7.71e-01    1 2.30e+00 100
>>>>>>>>> PCGAMG Gal l03         1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440    2266      1 4.44e-01    2 5.51e-01 100
>>>>>>>>> PCGAMG Opt l03         1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339    1667      1 6.72e-02    1 2.03e-01 100
>>>>>>>>> PCGAMG Gal l04         1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170     244      1 2.05e-02    2 2.53e-02 100
>>>>>>>>> PCGAMG Opt l04         1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43     165      1 4.49e-03    1 1.19e-02 100
>>>>>>>>> PCSetUp                2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496   17826     55 1.03e+03   45 6.54e+02 98
>>>>>>>>> PCSetUpOnBlocks       44 1.0 9.9087e-04 1.0 2.88e+03 1.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The point of lazy initialization is to make it possible to run a
>>>>>>>>>> solve that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless
>>>>>>>>>> of whether a GPU is actually present.
>>>>>>>>>>
>>>>>>>>>> Fande Kong <fdkong.jd at gmail.com> writes:
>>>>>>>>>>
>>>>>>>>>> > I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs
>>>>>>>>>> > now. I got a segmentation fault.
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> >
>>>>>>>>>> > Fande
>>>>>>>>>> >
>>>>>>>>>> > Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>> > 0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>>>>>>>> > 54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
>>>>>>>>>> > Missing separate debuginfos, use: debuginfo-install
>>>>>>>>>> > bzip2-libs-1.0.6-13.el7.x86_64
>>>>>>>>>> elfutils-libelf-0.176-5.el7.x86_64
>>>>>>>>>> > elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
>>>>>>>>>> > libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64
>>>>>>>>>> > libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
>>>>>>>>>> > libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64
>>>>>>>>>> > libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64
>>>>>>>>>> > libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64
>>>>>>>>>> > libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64
>>>>>>>>>> > libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64
>>>>>>>>>> libnl3-3.2.28-4.el7.x86_64
>>>>>>>>>> > librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64
>>>>>>>>>> > librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64
>>>>>>>>>> libxcb-1.13-1.el7.x86_64
>>>>>>>>>> > libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64
>>>>>>>>>> > systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64
>>>>>>>>>> > zlib-1.2.7-19.el7_9.x86_64
>>>>>>>>>> > (gdb) bt
>>>>>>>>>> > #0  0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
>>>>>>>>>> > #1  0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
>>>>>>>>>> > #2  0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
>>>>>>>>>> > #3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
>>>>>>>>>> > #4  0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
>>>>>>>>>> > #5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
>>>>>>>>>> > #6  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>>>>>>>> > #7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
>>>>>>>>>> > #8  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
>>>>>>>>>> > #9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
>>>>>>>>>> > #10 VecSetFromOptions (vec=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
>>>>>>>>>> > #11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL) at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> >> Thanks, Jed,
>>>>>>>>>> >>
>>>>>>>>>> >> This worked!
>>>>>>>>>> >>
>>>>>>>>>> >> Fande
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <jed at jedbrown.org> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >>> Fande Kong <fdkong.jd at gmail.com> writes:
>>>>>>>>>> >>>
>>>>>>>>>> >>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
>>>>>>>>>> >>> > wrote:
>>>>>>>>>> >>> >
>>>>>>>>>> >>> >> Are you running on login nodes or compute nodes (I can’t seem to tell
>>>>>>>>>> >>> >> from the configure.log)?
>>>>>>>>>> >>> >>
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > I was compiling the code on login nodes and running it on compute
>>>>>>>>>> >>> > nodes. Login nodes do not have GPUs, but compute nodes do have GPUs.
>>>>>>>>>> >>> >
>>>>>>>>>> >>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1
>>>>>>>>>> >>> > worked perfectly. I have this trouble with PETSc-main.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I assume you can
>>>>>>>>>> >>>
>>>>>>>>>> >>> export PETSC_OPTIONS='-device_enable lazy'
>>>>>>>>>> >>>
>>>>>>>>>> >>> and it'll work.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I think this should be the default. The main complaint is that timing
>>>>>>>>>> >>> the first GPU-using event isn't accurate if it includes initialization,
>>>>>>>>>> >>> but I think this is mostly hypothetical: you can't trust any timing that
>>>>>>>>>> >>> doesn't preload in some form, and the first GPU-using event will almost
>>>>>>>>>> >>> always be something uninteresting, so it will rarely lead to confusion.
>>>>>>>>>> >>> Meanwhile, eager initialization is viscerally disruptive for lots of
>>>>>>>>>> >>> people.
>>>>>>>>>> >>>
>>>>>>>>>> >>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>>> experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>>
>>>>>>>>
>>>>>>> <configure_bad.log><configure_good.log>
>>>
>>>
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make.log
Type: application/octet-stream
Size: 110081 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220126/396be8d7/attachment-0001.obj>