<div dir="ltr">I spoke too soon. It seems that we now have trouble creating CUDA/Kokkos vecs: I get a segmentation fault.<div><br></div><div>Thanks,<br><br>Fande<br><br>Program received signal SIGSEGV, Segmentation fault.<br>0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54<br>54 PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept<br>Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64 libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64 libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64 libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64 libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64 libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64 librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64 systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64<br>(gdb) bt<br>#0 0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54<br>#1 0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344<br>#2 0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>) at 
/home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107<br>#3 0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273<br>#4 0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234<br>#5 0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244<br>#6 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93<br>#7 0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/<a href="http://mpicuda.cu:214">mpicuda.cu:214</a><br>#8 0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93<br>#9 0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263<br>#10 VecSetFromOptions (vec=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297<br>#11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL) at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <<a href="mailto:fdkong.jd@gmail.com">fdkong.jd@gmail.com</a>> 
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks, Jed,<div><br></div><div>This worked!</div><div><br></div><div>Fande</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Fande Kong <<a href="mailto:fdkong.jd@gmail.com" target="_blank">fdkong.jd@gmail.com</a>> writes:<br>
<br>
> On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <<a href="mailto:jacob.fai@gmail.com" target="_blank">jacob.fai@gmail.com</a>><br>
> wrote:<br>
><br>
>> Are you running on login nodes or compute nodes (I can’t seem to tell from<br>
>> the configure.log)?<br>
>><br>
><br>
> I compile the code on login nodes and run it on compute nodes.<br>
> The login nodes do not have GPUs, but the compute nodes do.<br>
><br>
> Just to be clear, the same thing (code, machine) with PETSc-3.16.1 worked<br>
> perfectly. I have this trouble with PETSc-main.<br>
<br>
I assume you can<br>
<br>
export PETSC_OPTIONS='-device_enable lazy'<br>
<br>
and it'll work.<br>
<br>
I think this should be the default. The main complaint is that timing the first GPU-using event isn't accurate if it includes initialization. I think this is mostly hypothetical: you can't trust any timing that doesn't preload in some form, and the first GPU-using event will almost always be something uninteresting, so it will rarely lead to confusion. Meanwhile, eager initialization is viscerally disruptive for lots of people.<br>
</blockquote></div>
</blockquote></div>
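<div>For reference, the workaround quoted above can be sketched as a small shell snippet. This is only a sketch under stated assumptions: the helper name <code>add_petsc_option</code> and the launch line in the trailing comment are hypothetical and not part of PETSc; only <code>PETSC_OPTIONS</code> and <code>-device_enable lazy</code> come from the thread. The helper appends the option rather than overwriting the variable, so options that are already set are kept.</div>

```shell
# Hypothetical helper (not part of PETSc): append an option/value pair to
# PETSC_OPTIONS unless that exact pair is already present.
add_petsc_option() {
  case " ${PETSC_OPTIONS:-} " in
    *" $1 $2 "*) ;;  # pair already present, leave the variable unchanged
    *) PETSC_OPTIONS="${PETSC_OPTIONS:+$PETSC_OPTIONS }$1 $2" ;;
  esac
  export PETSC_OPTIONS
}

# Request lazy device initialization, as suggested in the quoted reply.
add_petsc_option -device_enable lazy
echo "PETSC_OPTIONS=$PETSC_OPTIONS"

# Then launch the application as usual, for example (hypothetical binary
# and launcher):
#   mpiexec -n 1 ./my_app -vec_type cuda
```

<div>Whether this avoids the segmentation fault shown in the backtrace is not confirmed by the thread; it only reproduces the suggested setting.</div>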