[petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

Matthew Knepley knepley at gmail.com
Wed Jan 4 18:27:21 CST 2023


On Wed, Jan 4, 2023 at 7:22 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> We don't have a machine on which we can test a build configured with both "--with-cuda --with-hip"
>

Yes, but your answer suggested that the structure of the code prevented
this combination.

  Thanks,

     Matt
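
For readers following the thread, here is a minimal sketch of what "known at compile time, either one backend or the other" could look like in C. The DEMO_HAVE_CUDA / DEMO_HAVE_HIP macros and all function names are hypothetical stand-ins for configure-generated macros such as PETSC_HAVE_CUDA and PETSC_HAVE_HIP; this is not PETSc's actual logic, only an illustration of the design under discussion.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for configure-time macros (not PETSc's own names).
   This sketch rejects builds that enable both backends at once. */
#if defined(DEMO_HAVE_CUDA) && defined(DEMO_HAVE_HIP)
#error "this sketch supports at most one device backend per build"
#endif

typedef enum { DEMO_BACKEND_NONE, DEMO_BACKEND_CUDA, DEMO_BACKEND_HIP } DemoBackend;

/* The backend is fixed by the preprocessor, so no runtime query is needed. */
static DemoBackend demo_backend(void)
{
#if defined(DEMO_HAVE_CUDA)
  return DEMO_BACKEND_CUDA;
#elif defined(DEMO_HAVE_HIP)
  return DEMO_BACKEND_HIP;
#else
  return DEMO_BACKEND_NONE;
#endif
}

/* Human-readable name for a backend value. */
static const char *demo_backend_name(DemoBackend b)
{
  assert(b == DEMO_BACKEND_NONE || b == DEMO_BACKEND_CUDA || b == DEMO_BACKEND_HIP);
  switch (b) {
  case DEMO_BACKEND_CUDA: return "cuda";
  case DEMO_BACKEND_HIP:  return "hip";
  default:                return "none";
  }
}
```

With this layout a single "device arrays" constructor can dispatch on demo_backend() without asking the caller which device it is. Supporting both backends in one build would instead require a runtime tag on each pointer or object.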


> --Junchao Zhang
>
>
> On Wed, Jan 4, 2023 at 6:17 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Jan 4, 2023 at 7:09 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>
>>>>>> Oh, is the device backend not known at compile time?
>>>>>>
>>>>> Currently it is known at compile time.
>>>>>
>>>>
>>>> Are you sure? I don't think it is known at compile time.
>>>>
>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP, or neither, but not both.
>>>
>>
>> Where is the logic for that in the code? This seems like a crazy design.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>>> Or multiple backends can be alive at once?
>>>>>>
>>>>>
>>>>> Some PETSc developers (Jed and Barry) want to support this, but we
>>>>> cannot do it yet.
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>
>>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>>>> GPUs, ...
>>>>>>>>
>>>>>>>>
>>>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>>>>>>> in CSR format, they're just raw device pointers in all cases.
>>>>>>>>
>>>>>>> But we need to know what device it is (to dispatch to either
>>>>>>> petsc-CUDA or petsc-HIP backend)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>>>> GPUs, ...
>>>>>>>>>
>>>>>>>>> The real problem, I think, is dealing with multiple MPI ranks.
>>>>>>>>> Providing the split arrays for a PETSc MATMPIAIJ is not easy, so we
>>>>>>>>> discourage users from doing that.
>>>>>>>>>
>>>>>>>>> A workaround is to let PETSc build the matrix and allocate the
>>>>>>>>> memory; you then call MatSeqAIJCUSPARSEGetArray() to get the array and
>>>>>>>>> fill it.
>>>>>>>>>
>>>>>>>>> We recently added routines to support matrix assembly on GPUs; see
>>>>>>>>> whether MatSetValuesCOO
>>>>>>>>> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/>
>>>>>>>>> helps.
>>>>>>>>>
>>>>>>>>> --Junchao Zhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <mlohry at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a
>>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an
>>>>>>>>>> implicit TS context. In the CPU world I call
>>>>>>>>>>
>>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>>>>>>>>> rowidxptr, colidxptr, valptr, Jac);
>>>>>>>>>>
>>>>>>>>>> which as I understand it -- (1) never copies/allocates that
>>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already
>>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures
>>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to
>>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>>>>>>>>>> works great with GAMG.
>>>>>>>>>>
>>>>>>>>>> I have the same CSR representation filled in GPU data allocated
>>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>
>>>>
>>>>
>>>
>>
>>
>
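
As a plain-C illustration of the "non-owning view" behavior described in the original question quoted above, the sketch below wraps caller-owned CSR arrays without copying them, so later writes to the value array are visible through the view. CsrView, csr_view_create, and csr_view_mult are hypothetical names invented for this sketch and are not PETSc API. A real Mat carries additional internal state, which is presumably why the question still calls MatAssemblyBegin/MatAssemblyEnd after modifying values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-owning CSR view: it stores pointers only and never
   allocates, copies, or frees the arrays it refers to. */
typedef struct {
  int           nrows, ncols;
  const int    *rowptr; /* length nrows + 1, rowptr[0] == 0 */
  const int    *colidx; /* length rowptr[nrows] */
  const double *val;    /* length rowptr[nrows] */
} CsrView;

/* Wrap existing arrays; the caller keeps ownership and must outlive the view. */
static CsrView csr_view_create(int nrows, int ncols, const int *rowptr,
                               const int *colidx, const double *val)
{
  assert(nrows >= 0 && ncols >= 0);
  assert(rowptr != NULL && rowptr[0] == 0);
  CsrView A = {nrows, ncols, rowptr, colidx, val};
  return A;
}

/* y = A * x, reading the caller's arrays through the view. */
static void csr_view_mult(const CsrView *A, const double *x, double *y)
{
  for (int i = 0; i < A->nrows; ++i) {
    double sum = 0.0;
    for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; ++k) {
      sum += A->val[k] * x[A->colidx[k]];
    }
    y[i] = sum;
  }
}
```

The same struct would hold device pointers just as well, which is the point raised in the thread. What the struct cannot tell by itself is which device runtime those pointers belong to.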

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
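
The COO assembly route mentioned in the quoted thread (MatSetValuesCOO) splits the work into a one-time pattern setup followed by repeated value updates. The sketch below shows that idea in plain C on the host. DemoCooMat and the demo_* functions are hypothetical and do not reproduce PETSc's implementation. It assumes that entries with repeated (i, j) indices are summed, which is how the PETSc routine is understood to behave.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int i, j, k; } CooEntry; /* k = position in the caller's COO list */

/* Order entries by row, then by column. */
static int coo_entry_cmp(const void *a, const void *b)
{
  const CooEntry *x = a, *y = b;
  if (x->i != y->i) return (x->i > y->i) - (x->i < y->i);
  return (x->j > y->j) - (x->j < y->j);
}

typedef struct {
  int     nrows, nnz, ncoo;
  int    *rowptr; /* nrows + 1 entries */
  int    *colidx; /* nnz entries used (capacity ncoo) */
  double *val;    /* nnz entries used (capacity ncoo) */
  int    *slot;   /* slot[k] = CSR position that COO entry k adds into */
} DemoCooMat;

static void demo_destroy(DemoCooMat *A)
{
  free(A->rowptr); free(A->colidx); free(A->val); free(A->slot);
  A->rowptr = A->colidx = A->slot = NULL;
  A->val = NULL;
}

/* One-time setup: build the CSR pattern and the COO-to-CSR map.
   Returns 0 on success, -1 on allocation failure. */
static int demo_set_pattern_coo(DemoCooMat *A, int nrows, int ncoo,
                                const int *ci, const int *cj)
{
  size_t    cap = (size_t)(ncoo > 0 ? ncoo : 1);
  CooEntry *e   = malloc(cap * sizeof *e);

  A->nrows = nrows; A->ncoo = ncoo; A->nnz = 0;
  A->rowptr = calloc((size_t)nrows + 1, sizeof *A->rowptr);
  A->colidx = malloc(cap * sizeof *A->colidx);
  A->val    = calloc(cap, sizeof *A->val);
  A->slot   = malloc(cap * sizeof *A->slot);
  if (!e || !A->rowptr || !A->colidx || !A->val || !A->slot) {
    free(e);
    demo_destroy(A);
    return -1;
  }
  for (int k = 0; k < ncoo; ++k) e[k] = (CooEntry){ci[k], cj[k], k};
  qsort(e, (size_t)ncoo, sizeof *e, coo_entry_cmp);
  for (int s = 0; s < ncoo; ++s) {
    if (s == 0 || coo_entry_cmp(&e[s], &e[s - 1]) != 0) {
      A->colidx[A->nnz] = e[s].j;  /* first time this (i, j) is seen */
      A->rowptr[e[s].i + 1]++;     /* count unique entries per row */
      A->nnz++;
    }
    A->slot[e[s].k] = A->nnz - 1;  /* duplicates share one slot */
  }
  for (int r = 0; r < nrows; ++r) A->rowptr[r + 1] += A->rowptr[r];
  free(e);
  return 0;
}

/* Repeated step: overwrite all values, summing duplicate (i, j) entries. */
static void demo_set_values_coo(DemoCooMat *A, const double *v)
{
  for (int p = 0; p < A->nnz; ++p) A->val[p] = 0.0;
  for (int k = 0; k < A->ncoo; ++k) A->val[A->slot[k]] += v[k];
}
```

After setup, each Jacobian update is a zero fill plus a scatter-add over a fixed map, with no searching or reallocation. That structure is what makes this style of assembly a natural fit for device-side execution.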