[petsc-dev] kokkos hang after rebase over main, maybe, Crusher

Mark Adams mfadams at lbl.gov
Wed May 25 07:20:55 CDT 2022


Transient failure. It works now.

On Tue, May 24, 2022 at 10:13 PM Mark Adams <mfadams at lbl.gov> wrote:

> I was working on Crusher yesterday, and I think I rebased over main. Now
> my run hangs here.
>
> Any ideas?
>
> (gdb) bt
> #0  0x00007fff81bd5547 in sched_yield () from /lib64/libc.so.6
> #1  0x00007fff79e43665 in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #2  0x00007fff79e382f4 in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #3  0x00007fff79e46bff in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #4  0x00007fff79e7cd07 in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #5  0x00007fff79e8183d in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #6  0x00007fff79e822a3 in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #7  0x00007fff79e553a4 in ?? () from
> /opt/rocm-5.1.0/hsa/lib/libhsa-runtime64.so.1
> #8  0x00007fff8a837f0b in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #9  0x00007fff8a7f6843 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #10 0x00007fff8a7f68af in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #11 0x00007fff8a8205c4 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #12 0x00007fff8a7f5180 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #13 0x00007fff8a82e261 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #14 0x00007fff8a872100 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #15 0x00007fff8a843efd in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #16 0x00007fff8a82a253 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #17 0x00007fff8a818890 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #18 0x00007fff8a7a09f1 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #19 0x00007fff8a7a0d00 in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #20 0x00007fff8a676f4e in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #21 0x00007fff8a726575 in hipMemcpy () from
> /opt/rocm-5.1.0/lib/libamdhip64.so.5
> #22 0x00007fffec390194 in
> Petsc::Device::CUPM::Impl::Interface<(Petsc::Device::CUPM::DeviceType)1>::PetscAliasFunctionDispatch_460_hipMemcpy<int*&,
> int const (&) [2], unsigned long const&, hipMemcpyKind const&>
> (args=<optimized out>, args=<optimized out>, args=<optimized out>,
> args=<optimized out>)
>     at include/petsc/private/cupminterface.hpp:460
> #23
> Petsc::Device::CUPM::Impl::Interface<(Petsc::Device::CUPM::DeviceType)1>::cupmMemcpy<int*&,
> int const (&) [2], unsigned long const&, hipMemcpyKind const&>
> (args=<optimized out>, args=<optimized out>, args=<optimized out>,
> args=<optimized out>) at include/petsc/private/cupminterface.hpp:460
> #24
> Petsc::Device::CUPM::Device<(Petsc::Device::CUPM::DeviceType)1>::DeviceInternal::CUPMAwareMPI_
> () at src/sys/objects/device/impls/cupm/cupmdevice.cxx:184
> #25 0x00007fffec38ee3d in
> Petsc::Device::CUPM::Device<(Petsc::Device::CUPM::DeviceType)1>::DeviceInternal::initialize
> (this=<optimized out>) at
> src/sys/objects/device/impls/cupm/cupmdevice.cxx:85
> #26 0x00007fffec38fea0 in
> Petsc::Device::CUPM::Device<(Petsc::Device::CUPM::DeviceType)1>::getDevice
> (this=0x7fffed56b928 <HIPDevice>, device=0xe76f10, id=<optimized out>) at
> src/sys/objects/device/impls/cupm/cupmdevice.cxx:410
> #27 0x00007fffec38cfa1 in PetscDeviceCreate (type=PETSC_DEVICE_HIP,
> devid=-1, device=device@entry=0x7fffed58b230 <defaultDevices+16>) at
> src/sys/objects/device/interface/device.cxx:130
> #28 0x00007fffec38d382 in PetscDeviceInitializeDefaultDevice_Internal
> (type=14723904, defaultDeviceId=0, defaultDeviceId@entry=-1) at
> src/sys/objects/device/interface/device.cxx:313
> #29 0x00007fffec38e157 in PetscDeviceInitialize (type=PETSC_DEVICE_HIP) at
> src/sys/objects/device/interface/device.cxx:274
> #30 PetscDeviceGetDefaultForType_Internal (type=PETSC_DEVICE_HIP,
> device=0x7fffffff49c0) at src/sys/objects/device/interface/device.cxx:493
> #31 0x00007fffec38aad7 in
> PetscDeviceContextSetDefaultDeviceForType_Internal (dctx=0xe58da0,
> type=14723904) at include/petsc/private/deviceimpl.h:241
> #32 0x00007fffec38b89d in PetscDeviceContextSetupGlobalContext_Private ()
> at src/sys/objects/device/interface/dcontext.cxx:674
> #33 PetscDeviceContextGetCurrentContext (dctx=0x7fffffff4a78) at
> src/sys/objects/device/interface/dcontext.cxx:705
> #34 0x00007fffec4a2a16 in PetscKokkosInitializeCheck () at
> src/sys/objects/kokkos/kinit.kokkos.cxx:31
> #35 0x00007fffec6c7f61 in VecCreate_SeqKokkos (v=0xe787f0) at
> src/vec/vec/impls/seq/kokkos/veckok.kokkos.cxx:1142
> #36 0x00007fffec58d37c in VecSetType (vec=0xe787f0, method=<optimized
> out>) at src/vec/vec/interface/vecreg.c:92
> #37 0x00007fffec6ac017 in VecCreate_Kokkos (v=0xe787f0) at
> src/vec/vec/impls/mpi/kokkos/mpikok.kokkos.cxx:476
> #38 0x00007fffec58d37c in VecSetType (vec=0xe787f0, method=<optimized
> out>) at src/vec/vec/interface/vecreg.c:92
> #39 0x00007fffecbc7d75 in DMCreateGlobalVector_Section_Private
> (dm=0xe068a0, vec=0x7fffffff5d80) at src/dm/interface/dmi.c:57
> #40 0x00007fffece8d6a0 in DMCreateGlobalVector_p4est (dm=0xe0ab40,
> vec=0x7fffffff5d80) at src/dm/impls/forest/p4est/pforest.h:4929
> #41 0x00007fffecba5d45 in DMCreateGlobalVector (dm=0xe068a0, vec=0x0,
> vec@entry=0x7fffffff5d80) at src/dm/interface/dm.c:997
> #42 0x00007fffed4292d1 in DMPlexLandauCreateVelocitySpace
> (comm=1140850688, dim=2, prefix=<optimized out>, X=X@entry=0x7fffffff5df8,
> J=J@entry=0x7fffffff5e40, pack=pack@entry=0x7fffffff5e60) at
> src/ts/utils/dmplexlandau/plexland.c:2086
> #43 0x0000000000205063 in main (argc=63, argv=0x7fffffff6368) at ex2.c:679
> (gdb)
>