[petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

Mark Lohry mlohry at gmail.com
Thu Jan 5 09:39:27 CST 2023


> A workaround is to let petsc build the matrix and allocate the memory,
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>

Junchao, looking at the code for MatSeqAIJCUSPARSEGetArray(), it seems to
return only a pointer to the value array, not pointers to the column and
row index arrays. Is that right?
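
If the COO route (MatSetPreallocationCOO / MatSetValuesCOO) turns out to be
the way to go, the existing CSR data would need one extra array: a row index
per nonzero instead of row offsets. Below is a minimal host-side sketch of
that expansion, assuming 0-based CSR with int indices. The function name
csr_rowptr_to_coo_rows is made up for illustration and is not a PETSc
routine; on the GPU the same loop would be a small kernel.

```c
#include <assert.h>

/* Expand 0-based CSR row offsets (length nrows + 1) into one row index per
   nonzero (length rowptr[nrows]). Column indices and values keep their CSR
   order, so they need no conversion. */
void csr_rowptr_to_coo_rows(int nrows, const int *rowptr, int *coo_i)
{
  for (int r = 0; r < nrows; ++r) {
    /* Row offsets must be non-decreasing for valid CSR. */
    assert(rowptr[r] <= rowptr[r + 1]);
    for (int k = rowptr[r]; k < rowptr[r + 1]; ++k) coo_i[k] = r;
  }
}
```

The column index and value arrays stay in CSR order, so in principle they
could be handed over unchanged as the COO column indices and values. Whether
device pointers are accepted there is something to confirm against the
MatSetPreallocationCOO and MatSetValuesCOO manual pages for the PETSc
version in use.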

On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
wrote:

> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
> CUPM works with both enabled simultaneously, I don’t think there are any
> direct restrictions for it. Vec at least was fully usable with both cuda
> and hip (though untested) last time I checked.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Jan 5, 2023, at 00:09, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>
> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>>
>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>
>>>> Oh, is the device backend not known at compile time?
>>>>
>>> Currently it is known at compile time.
>>>
>>
>> Are you sure? I don't think it is known at compile time.
>>
> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>
>
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>>> Or multiple backends can be alive at once?
>>>>
>>>
>>> Some PETSc developers (Jed and Barry) want to support this, but we
>>> cannot do so yet.
>>>
>>>
>>>>
>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>
>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>> GPUs, ...
>>>>>>
>>>>>>
>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>>>>> in CSR format, they're just raw device pointers in all cases.
>>>>>>
>>>>> But we need to know what device it is (to dispatch to either
>>>>> petsc-CUDA or petsc-HIP backend)
>>>>>
>>>>>
>>>>>>
>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>> GPUs, ...
>>>>>>>
>>>>>>> The real problem, I think, is dealing with multiple MPI ranks.
>>>>>>> Providing the split arrays for PETSc MATMPIAIJ is not easy, so users
>>>>>>> are discouraged from doing so.
>>>>>>>
>>>>>>> A workaround is to let petsc build the matrix and allocate the
>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill
>>>>>>> it up.
>>>>>>>
>>>>>>> We recently added routines to support matrix assembly on GPUs, see if
>>>>>>>  MatSetValuesCOO
>>>>>>> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/>
>>>>>>>  helps
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>
>>>>>>>> I have a sparse matrix constructed in non-petsc code using a
>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an
>>>>>>>> implicit TS context. In the CPU world I call
>>>>>>>>
>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>>>>>>> rowidxptr, colidxptr, valptr, Jac);
>>>>>>>>
>>>>>>>> which as I understand it -- (1) never copies/allocates that
>>>>>>>> information, and the matrix Jac is just a non-owning view into the already
>>>>>>>> allocated CSR, (2) I can write directly into the original data structures
>>>>>>>> and the Mat just "knows" about it, although it still needs a call to
>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>>>>>>>> works great with GAMG.
>>>>>>>>
>>>>>>>> I have the same CSR representation filled in GPU data allocated
>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>