[petsc-users] hypre / hip usage

Mark Adams mfadams at lbl.gov
Sun Jan 23 08:55:18 CST 2022


Stefano and Matt, this segv looks like a Plexism.

+ srun -n1 -N1 --ntasks-per-gpu=1 --gpu-bind=closest ../ex13
-dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid
1,1,1 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
-dm_refine 2 -dm_view -malloc_debug -log_trace -pc_type hypre -dm_vec_type
hip -dm_mat_type hypre
+ tee out_001_kokkos_Crusher_2_8_hypre.txt
 [0] 1.293e-06 Event begin: DMPlexSymmetrize
[0] 8.9463e-05 Event end: DMPlexSymmetrize
   .....
[0] 0.554529 Event end: VecHIPCopyFrom
[0] 0.559891 Event begin: DMCreateInterp
 [0] 0.560603 Event begin: DMPlexInterpFE
   [0] 0.566707 Event begin: MatAssemblyBegin
     [0] 0.566962 Event begin: BuildTwoSidedF
       [0] 0.567068 Event begin: BuildTwoSided
       [0] 0.567119 Event end: BuildTwoSided
     [0] 0.567154 Event end: BuildTwoSidedF
   [0] 0.567162 Event end: MatAssemblyBegin
   [0] 0.567164 Event begin: MatAssemblyEnd
   [0] 0.567356 Event end: MatAssemblyEnd
   [0] 0.572884 Event begin: MatAssemblyBegin
   [0] 0.57289 Event end: MatAssemblyBegin
   [0] 0.572892 Event begin: MatAssemblyEnd
   [0] 0.573978 Event end: MatAssemblyEnd
   [0] 0.574428 Event begin: MatZeroEntries
   [0] 0.579998 Event end: MatZeroEntries
:0:rocdevice.cpp            :2589: 257935591316 us: Device::callbackQueue
aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to
access an inaccessible address. code: 0x2b
srun: error: crusher001: task 0: Aborted
srun: launch/slurm: _step_signal: Terminating StepId=65929.4
+ date
Sun 23 Jan 2022 09:46:55 AM EST
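
For comparison, here is a minimal sketch of the same run following Stefano's
suggestion quoted below: assemble in aij and let the hypre interface copy the
matrix to the GPU. It assumes the advice given for ex19 carries over to ex13,
which has not been tested here. The only change from the failing command is
-dm_mat_type. By default the script just prints the command; set RUN=1 to
launch it.

```shell
#!/bin/sh
# Sketch only: same options as the failing run above, but the matrix is
# assembled as aij instead of hypre (assumption: -pc_type hypre then
# converts/copies the matrix to the GPU, as described for ex19).
cmd="srun -n1 -N1 --ntasks-per-gpu=1 --gpu-bind=closest ../ex13 \
-dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid 1,1,1 \
-dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1 \
-dm_refine 2 -dm_view -malloc_debug -log_trace -pc_type hypre \
-dm_vec_type hip -dm_mat_type aij"

if [ "${RUN:-0}" = "1" ]; then
  # Actually launch the job (requires srun and a built ex13).
  $cmd
else
  # Dry run: show what would be executed.
  echo "$cmd"
fi
```

If this runs cleanly, the fault is likely confined to the -dm_mat_type hypre
assembly path in Plex rather than to the hypre/HIP interface itself.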

On Sun, Jan 23, 2022 at 8:15 AM Mark Adams <mfadams at lbl.gov> wrote:

> Thanks,
> '-mat_type hypre' was failing for me. I could not find a test that used it
> and I was not sure it was considered functional.
> I will look at it again and collect a bug report if needed.
>
> On Sat, Jan 22, 2022 at 11:31 AM Stefano Zampini <stefano.zampini at gmail.com> wrote:
>
>> Mark
>>
>> The two options below are only there to test the code in CI and are not
>> needed in general.
>>
>>    '--download-hypre-configure-arguments=--enable-unified-memory',
>> This is only here to test the unified memory code path
>>
>>     '--with-hypre-gpuarch=gfx90a',
>> This is not needed if rocminfo is in PATH
>>
>> Our interface code with HYPRE GPU works fine for HIP; it is tested in CI.
>> Assembling with -mat_type hypre does not work for ex19 because ex19 uses
>> FDColoring. Just assemble in mpiaij format (look at runex19_hypre_hip in
>> src/snes/tutorials/makefile); the interface code will copy the matrix to
>> the GPU.
>>
>> On Fri, Jan 21, 2022 at 7:24 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Fri, Jan 21, 2022 at 11:14 AM Jed Brown <jed at jedbrown.org> wrote:
>>>
>>>> "Paul T. Bauman" <ptbauman at gmail.com> writes:
>>>>
>>>> > On Fri, Jan 21, 2022 at 8:52 AM Paul T. Bauman <ptbauman at gmail.com> wrote:
>>>> >> Yes. The way HYPRE's memory model is set up is that ALL GPU
>>>> >> allocations are "native" (i.e. [cuda,hip]Malloc) or, if unified
>>>> >> memory is enabled, then ALL GPU allocations are unified memory
>>>> >> (i.e. [cuda,hip]MallocManaged). Regarding HIP, there is an HMM
>>>> >> implementation of hipMallocManaged planned, but it is not yet
>>>> >> delivered AFAIK (and it will *not* support gfx906, e.g. RVII, FYI),
>>>> >> so, today, under the covers, hipMallocManaged is calling
>>>> >> hipHostMalloc. So, today, all your unified memory allocations in
>>>> >> HYPRE on HIP are doing CPU-pinned memory accesses. And performance
>>>> >> is just truly terrible (as you might expect).
>>>>
>>>> Thanks for this important bit of information.
>>>>
>>>> And it sounds like when we add support to hand off Kokkos matrices and
>>>> vectors (our current support for matrices on ROCm devices uses Kokkos) or
>>>> add direct support for hipSparse, we'll avoid touching host memory in
>>>> assembly-to-solve with hypre.
>>>>
>>>
>>> It does not look like anyone has made Hypre work with HIP. Stefano added
>>> a runex19_hypre_hip target 4 months ago and hypre.py has some HIP things.
>>>
>>> I have a user who would like to try this. No hurry, but can I get an
>>> idea of a plan for this?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>
>>
>> --
>> Stefano
>>
>