[petsc-users] hypre / hip usage

Mark Adams mfadams at lbl.gov
Mon Jan 24 09:31:27 CST 2022


Thanks Paul,

How do I get a stack trace? I have been relying on PETSc's, which piggybacks
on timers, so it does not go very deep here.

On Mon, Jan 24, 2022 at 10:16 AM Paul T. Bauman <ptbauman at gmail.com> wrote:

> On Mon, Jan 24, 2022 at 8:53 AM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Mon, Jan 24, 2022 at 9:24 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> What is the fastest way to rebuild hypre? Reconfiguring did not work and
>>> is slow.
>>>
>>> I am printf debugging to find this HSA_STATUS_ERROR_MEMORY_FAULT (no
>>> debuggers other than valgrind on Crusher??!?!)
>>>
>>
> Again, apologies for interjecting, but I wanted to offer a few pointers
> here.
>
> 1. `rocgdb` will be in your PATH when the `rocm` module is loaded. This is
> gdb, but with some extra AMDGPU goodies. AFAIK, you cannot yet step through
> a kernel at the source level (only at the ISA level), but you can query
> device variables in host code, print their values, etc.
> 1a. Note that the HIP runtime can spawn multiple threads. Furthermore, the
> thread you are on when you catch the error is likely (one of) the runtime
> thread(s). You'll need to run `info threads` and then select your host
> thread (`thread <N>`) to get back to it.
> 2. To get an accurate stack trace (i.e., one that points at the line in the
> host code where the error actually happens), I recommend setting the
> following environment variables while debugging; they force serialization
> of asynchronous memory copies and kernel launches:
> AMD_SERIALIZE_KERNEL=3
> AMD_SERIALIZE_COPY=3
>
> Thanks,
>
> Paul
>
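Paul's two suggestions above can be combined into a single launch. The sketch below is a hypothetical Python helper, not part of PETSc or ROCm: it sets the two serialization variables and runs the program under `rocgdb` in batch mode with standard gdb commands (`run`, `info threads`, `thread apply all bt`). Printing a backtrace for every thread sidesteps the issue in 1a, where the debugger may stop on a HIP runtime thread rather than the host thread. The function names, the batch-mode approach and any program path you pass in are assumptions.

```python
"""Hypothetical helper: run a program under rocgdb with HIP serialization.

Assumptions (not from the thread): the helper names, gdb batch mode, and the
particular gdb commands issued after the program stops.
"""
import os
import subprocess

# Force synchronous kernel launches and memory copies so the host-side stack
# at the failure points at the HIP call that actually faulted.
SERIALIZE_ENV = {"AMD_SERIALIZE_KERNEL": "3", "AMD_SERIALIZE_COPY": "3"}

# gdb commands run in order: start the program; once it stops (e.g. on the
# abort after a GPU memory fault), list all threads and print a backtrace
# for each one, since the current thread may be a HIP runtime thread.
GDB_COMMANDS = ["run", "info threads", "thread apply all bt"]


def build_debug_command(app_argv, base_env=None):
    """Return (argv, env) for running app_argv under rocgdb in batch mode."""
    if not app_argv:
        raise ValueError("app_argv must contain at least the program path")
    # Start from the given environment (or the current one) and add the
    # serialization variables on top.
    env = dict(os.environ if base_env is None else base_env)
    env.update(SERIALIZE_ENV)
    argv = ["rocgdb", "-batch"]
    for cmd in GDB_COMMANDS:
        argv += ["-ex", cmd]
    # --args must come last: everything after it is the program and its args.
    argv += ["--args", *app_argv]
    return argv, env


def run_under_rocgdb(app_argv):
    """Launch app_argv under rocgdb; needs rocgdb on PATH (rocm module loaded)."""
    argv, env = build_debug_command(app_argv)
    return subprocess.run(argv, env=env).returncode
```

For example, `run_under_rocgdb(["./my_app", "-my_option"])` (a placeholder program and option) would print every thread's backtrace after the fault. This handles a single process only; an MPI run on Crusher would still need to be wrapped in the site's job launcher.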