[petsc-users] hypre / hip usage

Matthew Knepley knepley at gmail.com
Mon Jan 24 08:52:58 CST 2022


On Mon, Jan 24, 2022 at 9:24 AM Mark Adams <mfadams at lbl.gov> wrote:

> What is the fastest way to rebuild hypre? Reconfiguring did not work and
> is slow.
>
> I am printf-debugging to find this HSA_STATUS_ERROR_MEMORY_FAULT (no
> debuggers other than valgrind on Crusher??!?!) and I get to a hypre call:
>
>
> PetscStackCallStandard(HYPRE_IJMatrixAddToValues,(hA->ij,1,&hnc,(HYPRE_BigInt*)(rows+i),(HYPRE_BigInt*)cscr[0],sscr));
>
> This is from DMPlexComputeJacobian_Internal and MatSetClosure.
> HYPRE_IJMatrixAddToValues is successfully called in earlier parts of the
> run.
>
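
As an aid for that kind of printf debugging, a hypothetical helper (not PETSc
source; the argument names follow the snippet above, and it assumes the column
and value arrays are host-accessible) could dump the call arguments together
with the IJ matrix's local row range, so an out-of-range index stands out:

#include <stdio.h>
#include <petscsys.h>
#include <HYPRE_IJ_mv.h>

/* Hypothetical debugging helper (not PETSc source): print what is about to be
   passed to HYPRE_IJMatrixAddToValues() plus the matrix's local row range, so
   an out-of-range index stands out.  If cols or vals are device pointers, the
   host-side prints below will themselves fault, which is useful to know too. */
static void DumpAddToValuesArgs(HYPRE_IJMatrix ij, HYPRE_Int ncols,
                                const HYPRE_BigInt *row,
                                const HYPRE_BigInt *cols,
                                const PetscScalar *vals)
{
  HYPRE_BigInt ilower, iupper, jlower, jupper;

  HYPRE_IJMatrixGetLocalRange(ij, &ilower, &iupper, &jlower, &jupper);
  printf("row %lld, local row range [%lld, %lld], ncols %d\n",
         (long long)row[0], (long long)ilower, (long long)iupper, (int)ncols);
  for (HYPRE_Int k = 0; k < ncols; k++)
    printf("  col %lld  val %g\n", (long long)cols[k],
           (double)PetscRealPart(vals[k]));
}

It would be called, for example, right before the PetscStackCallStandard()
line as DumpAddToValuesArgs(hA->ij, hnc, (HYPRE_BigInt*)(rows+i),
(HYPRE_BigInt*)cscr[0], sscr).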

So MatSetClosure() just calls MatSetValues(), which should catch any
out-of-range index. I guess it is possible that the element matrix storage is
invalid, but that is a hard thing to mess up. Hopefully, the debugging shows
where the SEGV is in Hypre.
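
If the element storage is in doubt, one quick (hypothetical) check is to ask
the HIP runtime what kind of memory a pointer refers to before it reaches
hypre, since HSA_STATUS_ERROR_MEMORY_FAULT usually means an agent touched
memory it cannot access. A minimal sketch, assuming the HIP headers are
available (and noting that newer HIP versions rename the memoryType field to
"type"):

#include <stdio.h>
#include <hip/hip_runtime.h>

/* Hypothetical helper, not part of PETSc or hypre: report whether a pointer
   is a HIP device/host allocation or plain (unregistered) host memory. */
static void WhereIs(const char *name, const void *p)
{
  hipPointerAttribute_t a;

  if (hipPointerGetAttributes(&a, p) != hipSuccess) {
    printf("%s %p: not a HIP allocation (plain host memory?)\n", name, (void *)p);
    return;
  }
  printf("%s %p: %s memory on device %d\n", name, (void *)p,
         a.memoryType == hipMemoryTypeDevice ? "device" : "host/managed",
         a.device);
}

For example, WhereIs("sscr", sscr) right before the hypre call would say
whether the values array lives on the host or the GPU.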

  Thanks,

    Matt


> The args look OK, so I am going into HYPRE_IJMatrixAddToValues.
>
> Thanks,
> Mark
>
>
>
> On Sun, Jan 23, 2022 at 9:55 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Stefano and Matt, this SEGV looks like a Plexism.
>>
>> + srun -n1 -N1 --ntasks-per-gpu=1 --gpu-bind=closest ../ex13
>> -dm_plex_box_faces 2,2,2 -petscpartitioner_simple_process_grid
>> 1,1,1 -dm_plex_box_upper 1,1,1 -petscpartitioner_simple_node_grid 1,1,1
>> -dm_refine 2 -dm_view -malloc_debug -log_trace -pc_type hypre -dm_vec_type
>> hip -dm_mat_type hypre
>> + tee out_001_kokkos_Crusher_2_8_hypre.txt
>>  [0] 1.293e-06 Event begin: DMPlexSymmetrize
>> [0] 8.9463e-05 Event end: DMPlexSymmetrize
>>    .....
>> [0] 0.554529 Event end: VecHIPCopyFrom
>> [0] 0.559891 Event begin: DMCreateInterp
>>  [0] 0.560603 Event begin: DMPlexInterpFE
>>    [0] 0.566707 Event begin: MatAssemblyBegin
>>      [0] 0.566962 Event begin: BuildTwoSidedF
>>        [0] 0.567068 Event begin: BuildTwoSided
>>        [0] 0.567119 Event end: BuildTwoSided
>>      [0] 0.567154 Event end: BuildTwoSidedF
>>    [0] 0.567162 Event end: MatAssemblyBegin
>>    [0] 0.567164 Event begin: MatAssemblyEnd
>>    [0] 0.567356 Event end: MatAssemblyEnd
>>    [0] 0.572884 Event begin: MatAssemblyBegin
>>    [0] 0.57289 Event end: MatAssemblyBegin
>>    [0] 0.572892 Event begin: MatAssemblyEnd
>>    [0] 0.573978 Event end: MatAssemblyEnd
>>    [0] 0.574428 Event begin: MatZeroEntries
>>    [0] 0.579998 Event end: MatZeroEntries
>> :0:rocdevice.cpp            :2589: 257935591316 us: Device::callbackQueue
>> aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to
>> access an inaccessible address. code: 0x2b
>> srun: error: crusher001: task 0: Aborted
>> srun: launch/slurm: _step_signal: Terminating StepId=65929.4
>> + date
>> Sun 23 Jan 2022 09:46:55 AM EST
>>
>> On Sun, Jan 23, 2022 at 8:15 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Thanks,
>>> '-mat_type hypre' was failing for me. I could not find a test that used
>>> it and I was not sure it was considered functional.
>>> I will look at it again and collect a bug report if needed.
>>>
>>> On Sat, Jan 22, 2022 at 11:31 AM Stefano Zampini <
>>> stefano.zampini at gmail.com> wrote:
>>>
>>>> Mark
>>>>
>>>> The two options are only there to test the code in CI and are not
>>>> needed in general:
>>>>
>>>>    '--download-hypre-configure-arguments=--enable-unified-memory',
>>>> This is only here to test the unified memory code path
>>>>
>>>>     '--with-hypre-gpuarch=gfx90a',
>>>> This is not needed if rocminfo is in PATH
>>>>
>>>> Our interface code with HYPRE on GPUs works fine for HIP; it is tested in
>>>> CI.
>>>> Assembling with -mat_type hypre does not work for ex19 because ex19 uses
>>>> FDColoring. Just assemble in mpiaij format (look at runex19_hypre_hip in
>>>> src/snes/tutorials/makefile); the interface code will copy the matrix to
>>>> the GPU.
>>>>
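
In code, that workflow looks roughly like the sketch below (illustrative only,
not taken from ex19 or the makefile target; dm and snes are assumed to be set
up elsewhere by the application):

#include <petscsnes.h>

/* Sketch: keep assembly in plain AIJ and let PCHYPRE copy the assembled
   matrix to the GPU at setup time. */
static PetscErrorCode UseHypreOnAij(SNES snes, DM dm)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMSetMatType(dm, MATAIJ);CHKERRQ(ierr); /* assemble in aij, not -dm_mat_type hypre */
  ierr = SNESSetDM(snes, dm);CHKERRQ(ierr);
  ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);   /* the interface copies the matrix to the GPU */
  PetscFunctionReturn(0);
}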
>>>> On Fri, Jan 21, 2022 at 7:24 PM Mark Adams <mfadams at lbl.gov>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 21, 2022 at 11:14 AM Jed Brown <jed at jedbrown.org> wrote:
>>>>>
>>>>>> "Paul T. Bauman" <ptbauman at gmail.com> writes:
>>>>>>
>>>>>> > On Fri, Jan 21, 2022 at 8:52 AM Paul T. Bauman <ptbauman at gmail.com>
>>>>>> > wrote:
>>>>>> >> Yes. The way HYPRE's memory model is set up is that ALL GPU
>>>>>> >> allocations are "native" (i.e. [cuda,hip]Malloc) or, if unified memory
>>>>>> >> is enabled, then ALL GPU allocations are unified memory
>>>>>> >> (i.e. [cuda,hip]MallocManaged). Regarding HIP, there is an HMM
>>>>>> >> implementation of hipMallocManaged planned, but it is not yet
>>>>>> >> delivered AFAIK (and it will *not* support gfx906, e.g. RVII, FYI),
>>>>>> >> so, today, under the covers, hipMallocManaged is calling
>>>>>> >> hipHostMalloc. So, today, all your unified memory allocations in
>>>>>> >> HYPRE on HIP are doing CPU-pinned memory accesses. And performance is
>>>>>> >> just truly terrible (as you might expect).
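
Purely as an illustration of the two allocation paths described above (this is
not hypre's actual source; the unified-memory switch is shown as a plain
flag):

#include <hip/hip_runtime.h>

/* With unified memory enabled, everything goes through hipMallocManaged(),
   which today falls back to pinned host memory on HIP; otherwise everything
   is a "native" device allocation via hipMalloc(). */
static void *alloc_gpu(size_t nbytes, int unified)
{
  void *p = NULL;

  if (unified) (void)hipMallocManaged(&p, nbytes, hipMemAttachGlobal);
  else         (void)hipMalloc(&p, nbytes);
  return p;
}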
>>>>>>
>>>>>> Thanks for this important bit of information.
>>>>>>
>>>>>> And it sounds like when we add support to hand off Kokkos matrices
>>>>>> and vectors (our current support for matrices on ROCm devices uses Kokkos)
>>>>>> or add direct support for hipSparse, we'll avoid touching host memory in
>>>>>> assembly-to-solve with hypre.
>>>>>>
>>>>>
>>>>> It does not look like anyone has made Hypre work with HIP. Stefano
>>>>> added a runex19_hypre_hip target 4 months ago and hypre.py has some HIP
>>>>> things.
>>>>>
>>>>> I have a user who would like to try this. No hurry, but can I get an
>>>>> idea of a plan for this?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Stefano
>>>>
>>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/