[petsc-users] hypre / hip usage

Paul T. Bauman ptbauman at gmail.com
Mon Jan 24 10:14:32 CST 2022


On Mon, Jan 24, 2022 at 9:53 AM Jed Brown <jed at jedbrown.org> wrote:

> "Paul T. Bauman" <ptbauman at gmail.com> writes:
>
> > 1. `rocgdb` will be in your PATH when the `rocm` module is loaded. This is
> > gdb, but with some extra AMDGPU goodies. AFAIK, you cannot yet step
> > through a kernel in the source (only the ISA), but you can query
> > device variables in host code, print their values, etc.
> > 1a. Note that multiple threads can be spawned by the HIP runtime.
> > Furthermore, it's likely the thread you'll be on when you catch the error
> > is (one of) the runtime thread(s). You'll need to do `info threads` and
> > then select your host thread to get back to it.
> > 2. To get an accurate stacktrace (meaning get the line in the host code
> > where the error is actually happening), I recommend setting the following
> > environment variables for debugging that will force the serialization of
> > async memcopies and kernel launches:
> > AMD_SERIALIZE_KERNEL=3
> > AMD_SERIALIZE_COPY=3
>
> Is there a tutorial on this? I bet a 10-minute screencast demo would make
> a big impact in the use of these tools.
>

The one that springs to mind is a 3-day (virtual) workshop from last May at
OLCF. There was also a recent workshop on Crusher that may cover this.

https://www.olcf.ornl.gov/calendar/2021hip/
https://www.olcf.ornl.gov/wp-content/uploads/2021/04/rocgdb_hipmath_ornl_2021_v2.pdf

They recorded it, but I can't seem to find the recordings, and I'm not sure
what OLCF did with them. Justin did live demos of the debugger during his
talk. :(

> AMD_SERIALIZE_COPY isn't documented at all and AMD_SERIALIZE_KERNEL isn't
> mentioned in this context.
>
>
> https://rocmdocs.amd.com/en/latest/search.html?q=amd_serialize_copy&check_keywords=yes&area=default


Sigh. This is a never-ending source of frustration on my end. Sorry, it is
really unacceptable. This link is probably the best description at this
moment:
https://github.com/ROCm-Developer-Tools/HIP/blob/develop/docs/markdown/hip_debugging.md
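
In the meantime, here is a minimal sketch of how one might wire the two
serialization variables into a debug launch. It is only an illustration: the
helper names (`serialized_debug_env`, `rocgdb_command`, `launch_under_rocgdb`)
and the application path `./my_app` are hypothetical, and it assumes `rocgdb`
is already in your PATH (i.e. the `rocm` module is loaded). Since `rocgdb` is
gdb, the standard `--args` option is used to pass the program and its
arguments.

```python
import os
import subprocess


def serialized_debug_env(base_env=None):
    """Return a copy of the environment with HIP serialization enabled.

    Setting both variables to 3 forces kernel launches and async copies
    to be serialized, so an error surfaces at the host line that caused it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AMD_SERIALIZE_KERNEL"] = "3"
    env["AMD_SERIALIZE_COPY"] = "3"
    return env


def rocgdb_command(app_argv):
    """Build the argv that runs the given program under rocgdb."""
    # `--args` is the standard gdb option: everything after it is the
    # program to debug followed by that program's own arguments.
    return ["rocgdb", "--args", *app_argv]


def launch_under_rocgdb(app_argv):
    """Start an interactive rocgdb session (hypothetical convenience wrapper)."""
    return subprocess.run(rocgdb_command(app_argv), env=serialized_debug_env())
```

For example, `launch_under_rocgdb(["./my_app"])` would drop you at the rocgdb
prompt with both variables set. Once the error is caught there, `info threads`
followed by `thread <N>` gets you from a runtime thread back to your host
thread.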