[petsc-users] hypre / hip usage

Mark Adams mfadams at lbl.gov
Mon Jan 24 13:42:55 CST 2022


On Sat, Jan 22, 2022 at 11:31 AM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

> Mark
>
> the two options are only there to test the code in CI, and are not needed
> in general
>
>    '--download-hypre-configure-arguments=--enable-unified-memory',
> This is only here to test the unified memory code path
>
>     '--with-hypre-gpuarch=gfx90a',
> This is not needed if rocminfo is in PATH
>
> Our interface code with HYPRE GPU works fine for HIP, it is tested in CI.
>
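
A minimal sketch of what the note above implies for a configure script, written in the Python list style PETSc arch scripts use. The helper name `hypre_hip_configure_options` and its `which` parameter are hypothetical, introduced only for illustration. The list is deliberately incomplete: a real build also needs compiler, MPI, and other site-specific options. The unified-memory argument is left out because it is described above as CI-only, and the explicit arch is added only as a fallback when `rocminfo` is not found in PATH.

```python
import shutil


def hypre_hip_configure_options(gpu_arch=None, which=shutil.which):
    """Return a partial PETSc configure option list for hypre with HIP.

    Hypothetical helper: gpu_arch is used only as a fallback when
    rocminfo cannot be found in PATH.
    """
    options = [
        '--with-hip=1',
        '--download-hypre',
    ]
    # Per the note above, an explicit GPU arch is not needed when
    # rocminfo is in PATH, so only add it when rocminfo is missing.
    if which('rocminfo') is None and gpu_arch is not None:
        options.append('--with-hypre-gpuarch=' + gpu_arch)
    return options


if __name__ == '__main__':
    for opt in hypre_hip_configure_options(gpu_arch='gfx90a'):
        print(opt)
```

Passing `which` as a parameter is only there so the PATH lookup can be exercised without ROCm installed.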

I think there may be a problem on Crusher (he says after trying to debug
this all day). I was not able to get to the error: rocgdb hung, and I did
not manage to get a print statement out of hypre.
It could be that this error happens _at_ the call
to HYPRE_IJMatrixAddToValues in PETSc (i.e., it never gets to hypre code).
Not sure.
Oddly, HYPRE_IJMatrixSetValues worked in snes/ex56.
I didn't figure out a simple way to reproduce this until now:

14:21 adams/aijkokkos-gpu-logging *= crusher:/gpfs/alpine/csc314/scratch/adams/petsc$ make -f gmakefile PETSC_ARCH=arch-olcf-crusher-g test search='ksp_ksp_tutorials-ex55_hypre_device'
Using MAKEFLAGS: -- search=ksp_ksp_tutorials-ex55_hypre_device PETSC_ARCH=arch-olcf-crusher-g
        TEST arch-olcf-crusher-g/tests/counts/ksp_ksp_tutorials-ex55_hypre_device.counts
# retrying ksp_ksp_tutorials-ex55_hypre_device
not ok ksp_ksp_tutorials-ex55_hypre_device # Error code: 134
# :0:rocdevice.cpp            :2589: 360810731350 us: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
# :0:rocdevice.cpp            :2589: 360810732560 us: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
# :0:rocdevice.cpp            :2589: 360810735300 us: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
# :0:rocdevice.cpp            :2589: 360810736352 us: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
# srun: error: crusher002: tasks 0-3: Aborted
# srun: launch/slurm: _step_signal: Terminating StepId=66195.4
 ok ksp_ksp_tutorials-ex55_hypre_device # SKIP Command failed so no diff


# FAILED ksp_ksp_tutorials-ex55_hypre_device
#
# To rerun failed tests:
#     /usr/bin/gmake -f gmakefile test test-fail=1
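
One possible reading of HSA_STATUS_ERROR_INVALID_ISA, not confirmed anywhere in this thread, is that some device code was built for a different GPU architecture than the one on the node. A rough sketch of a check for that, assuming the architecture names appear in `rocminfo` output as `gfx...` tokens. The function names and the `gfx90a` value passed in the example are assumptions for illustration, not part of PETSc or hypre.

```python
import re
import shutil
import subprocess


def gfx_archs(rocminfo_text):
    """Return the set of gfx architecture names found in rocminfo output."""
    return set(re.findall(r'\bgfx[0-9a-f]+\b', rocminfo_text))


def arch_matches(built_for, rocminfo_text):
    """True if the arch the library was built for is reported on this node."""
    return built_for in gfx_archs(rocminfo_text)


if __name__ == '__main__':
    # Only query the node when rocminfo is actually available in PATH.
    if shutil.which('rocminfo') is not None:
        text = subprocess.run(['rocminfo'], capture_output=True,
                              text=True, check=False).stdout
        print('archs on node:', sorted(gfx_archs(text)))
        # 'gfx90a' is an assumed build target, taken from the CI option
        # quoted earlier in the thread.
        print('matches gfx90a:', arch_matches('gfx90a', text))
    else:
        print('rocminfo not found in PATH')
```

If the reported set does not contain the architecture hypre was configured for, a rebuild with the matching arch would be the first thing to try.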




> Assembling with -mat_type hypre does not work for ex19 because ex19 uses
> FDColoring. Just assemble in mpiaij format (look at runex19_hypre_hip in
> src/snes/tutorials/makefile); the interface code will copy the matrix to
> the GPU.
>
> On Fri, Jan 21, 2022 at 7:24 PM Mark Adams <mfadams at lbl.gov>
> wrote:
>
>>
>>
>> On Fri, Jan 21, 2022 at 11:14 AM Jed Brown <jed at jedbrown.org> wrote:
>>
>>> "Paul T. Bauman" <ptbauman at gmail.com> writes:
>>>
>>> > On Fri, Jan 21, 2022 at 8:52 AM Paul T. Bauman <ptbauman at gmail.com> wrote:
>>> >> Yes. The way HYPRE's memory model is set up, ALL GPU allocations are
>>> >> "native" (i.e. [cuda,hip]Malloc) or, if unified memory is enabled,
>>> >> ALL GPU allocations are unified memory (i.e. [cuda,hip]MallocManaged).
>>> >> Regarding HIP, there is an HMM implementation of hipMallocManaged
>>> >> planned, but it is not yet delivered AFAIK (and it will *not* support
>>> >> gfx906, e.g. RVII, FYI), so, today, under the covers, hipMallocManaged
>>> >> is calling hipHostMalloc. So, today, all your unified memory
>>> >> allocations in HYPRE on HIP are doing CPU-pinned memory accesses. And
>>> >> performance is just truly terrible (as you might expect).
>>>
>>> Thanks for this important bit of information.
>>>
>>> And it sounds like when we add support to hand off Kokkos matrices and
>>> vectors (our current support for matrices on ROCm devices uses Kokkos) or
>>> add direct support for hipSparse, we'll avoid touching host memory in
>>> assembly-to-solve with hypre.
>>>
>>
>> It does not look like anyone has made hypre work with HIP. Stefano added
>> a runex19_hypre_hip target 4 months ago and hypre.py has some HIP things.
>>
>> I have a user who would like to try this. No hurry, but can I get an
>> idea of the plan for this?
>>
>> Thanks,
>> Mark
>>
>>
>
>
> --
> Stefano
>