[petsc-users] hypre / hip usage

Jed Brown jed at jedbrown.org
Mon Jan 24 09:53:03 CST 2022


"Paul T. Bauman" <ptbauman at gmail.com> writes:

> 1. `rocgdb` will be in your PATH when the `rocm` module is loaded. This is
> gdb, but with some extra AMDGPU goodies. AFAIK, you cannot yet step
> through a kernel at the source level (only the ISA), but you can query
> device variables in host code, print their values, etc.
> 1a. Note that multiple threads can be spawned by the HIP runtime.
> Furthermore, it's likely the thread you'll be on when you catch the error
> is (one of) the runtime thread(s). You'll need to do `info threads` and
> then select your host thread to get back to it.
> 2. To get an accurate stacktrace (meaning get the line in the host code
> where the error is actually happening), I recommend setting the following
> environment variables for debugging that will force the serialization of
> async memcopies and kernel launches:
> AMD_SERIALIZE_KERNEL=3
> AMD_SERIALIZE_COPY=3
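
For reference, the steps quoted above might be put together roughly as follows. This is only a sketch: `./my_app` is a hypothetical placeholder for the executable being debugged, and choosing `thread 1` assumes the host thread is the first one gdb lists, which should be checked against the `info threads` output.

```shell
#!/bin/sh
# Force serialization of kernel launches and async copies
# (values taken from the message quoted above).
export AMD_SERIALIZE_KERNEL=3
export AMD_SERIALIZE_COPY=3

# Hypothetical placeholder: set APP to the executable you want to debug.
APP="${APP:-./my_app}"

if command -v rocgdb >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Run to the failure, list the threads, switch back to the host thread
    # (assumed to be thread 1 here), and print a backtrace.
    rocgdb -ex run -ex 'info threads' -ex 'thread 1' -ex bt --args "$APP"
else
    echo "rocgdb or $APP not found; is the rocm module loaded?" >&2
fi
```

The `-ex` commands are ordinary gdb options; the same commands can be typed interactively at the `(gdb)` prompt instead.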

Is there a tutorial on this? I bet a 10-minute screencast demo would make a big impact on the use of these tools.

AMD_SERIALIZE_COPY isn't documented at all and AMD_SERIALIZE_KERNEL isn't mentioned in this context.

https://rocmdocs.amd.com/en/latest/search.html?q=amd_serialize_copy&check_keywords=yes&area=default

