[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

Fande Kong fdkong.jd at gmail.com
Thu Jan 20 14:29:20 CST 2022

I spoke too soon. It seems that we now have trouble creating CUDA/Kokkos
vecs: we get a segmentation fault.



Program received signal SIGSEGV, Segmentation fault.
0x00002aaab5558b11 in
(this=0x1) at
54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64
elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64
libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64
libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64
libmlx5-41mlnx1-OFED. libnl3-3.2.28-4.el7.x86_64
librxe-41mlnx1-OFED. libxcb-1.13-1.el7.x86_64
libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64
systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64
(gdb) bt
#0  0x00002aaab5558b11 in
(this=0x1) at
#1  0x00002aaab5558db7 in
(this=this@entry=0x2aaab7f37b70
<CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at
#2  0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA,
devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48
<defaultDevices+8>) at
#3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal
(type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1)
#4  0x00002aaab5557bf6 in PetscDeviceInitialize
(type=type@entry=PETSC_DEVICE_CUDA)
#5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at
#6  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150,
method=method@entry=0x2aaab70b45b8 "seqcuda") at
#7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at
#8  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150,
method=method@entry=0x7fffffff9260 "cuda") at
#9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150,
PetscOptionsObject=0x7fffffff9210) at
#10 VecSetFromOptions (vec=0x115d150) at
#11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init
(this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL)

On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong.jd at gmail.com> wrote:

> Thanks, Jed,
> This worked!
> Fande
> On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <jed at jedbrown.org> wrote:
>> Fande Kong <fdkong.jd at gmail.com> writes:
>> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob.fai at gmail.com> wrote:
>> >
>> >> Are you running on login nodes or compute nodes (I can’t seem to tell
>> >> from the configure.log)?
>> >>
>> >
>> > I was compiling codes on login nodes, and running codes on compute nodes.
>> > Login nodes do not have GPUs, but compute nodes do have GPUs.
>> >
>> > Just to be clear, the same thing (code, machine) with PETSc-3.16.1 worked
>> > perfectly. I have this trouble with PETSc-main.
>>
>> I assume you can
>>
>>   export PETSC_OPTIONS='-device_enable lazy'
>>
>> and it'll work.
>>
>> I think this should be the default. The main complaint is that timing the
>> first GPU-using event isn't accurate if it includes initialization, but I
>> think this is mostly hypothetical, because you can't trust any timing that
>> doesn't preload in some form, and the first GPU-using event will almost
>> always be something uninteresting, so I think it will rarely lead to
>> confusion. Meanwhile, eager initialization is viscerally disruptive for
>> lots of people.
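
For anyone applying the suggestion quoted above in a job script, here is a minimal shell sketch. It assumes a POSIX shell. The application name ./my_app is hypothetical and is left commented out, and the -vec_type cuda flag is only there to mirror the "cuda" type seen in the backtrace. The snippet appends the option instead of overwriting PETSC_OPTIONS, so any options already set in the environment are kept.

```shell
# Append '-device_enable lazy' to PETSC_OPTIONS, keeping any options
# that are already set (the ${VAR:+...} form adds a separating space
# only when PETSC_OPTIONS is non-empty).
PETSC_OPTIONS="${PETSC_OPTIONS:+$PETSC_OPTIONS }-device_enable lazy"
export PETSC_OPTIONS

# Show what the PETSc application will see.
echo "PETSC_OPTIONS=$PETSC_OPTIONS"

# Hypothetical launch line; replace ./my_app with the real executable.
# ./my_app -vec_type cuda
```

This only covers the lazy-initialization workaround. Whether it also avoids the segmentation fault reported at the top of this message is not established in this thread.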