[petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

Junchao Zhang junchao.zhang at gmail.com
Thu Jan 5 10:06:04 CST 2023


On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry <mlohry at gmail.com> wrote:

>
>> A workaround is to let petsc build the matrix and allocate the memory,
>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>
>
> Junchao, looking at the code for this it seems to only return a pointer to
> the value array, but not pointers to the column and row index arrays, is
> that right?
>
Yes, that is correct.
I am thinking of something like MatSeqAIJGetArrayAndMemType(Mat A, const
PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
which returns (a, i, j) on the device with mtype = PETSC_MEMTYPE_{CUDA, HIP}
if A is a device matrix, and otherwise returns (a, i, j) on the host with
mtype = PETSC_MEMTYPE_HOST.
We currently have similar routines, e.g.
VecGetArrayAndMemType(Vec, PetscScalar**, PetscMemType*), and I am adding
MatDenseGetArrayAndMemType(Mat, PetscScalar**, PetscMemType*).

It looks like you need (a, i, j) for assembly, but the above function only
works for an assembled matrix.


>
>
> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
> wrote:
>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>> CUPM works with both enabled simultaneously; I don't think there are any
>> direct restrictions on it. Vec, at least, was fully usable with both cuda
>> and hip (though untested) last time I checked.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 5, 2023, at 00:09, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>
>>
>>
>>
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>>
>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>
>>>>> Oh, is the device backend not known at compile time?
>>>>>
>>>> Currently it is known at compile time.
>>>>
>>>
>>> Are you sure? I don't think it is known at compile time.
>>>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>> Or multiple backends can be alive at once?
>>>>>
>>>>
>>>> Some petsc developers (Jed and Barry) want to support this, but we do
>>>> not support it yet.
>>>>
>>>>
>>>>>
>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>
>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>>> GPUs, ...
>>>>>>>
>>>>>>>
>>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>>>>>> in CSR format, they're just raw device pointers in all cases.
>>>>>>>
>>>>>> But we need to know what device it is (to dispatch to either
>>>>>> petsc-CUDA or petsc-HIP backend)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>
>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>>>>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>>> GPUs, ...
>>>>>>>>
>>>>>>>> The real problem, I think, is dealing with multiple MPI ranks.
>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy, so we
>>>>>>>> discourage users from doing it.
>>>>>>>>
>>>>>>>> A workaround is to let petsc build the matrix and allocate the
>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill
>>>>>>>> it up.
>>>>>>>>
>>>>>>>> We recently added routines to support matrix assembly on GPUs, see
>>>>>>>> if MatSetValuesCOO
>>>>>>>> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/>
>>>>>>>>  helps
>>>>>>>>
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a
>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an
>>>>>>>>> implicit TS context. In the CPU world I call
>>>>>>>>>
>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>>>>>>>> rowidxptr, colidxptr, valptr, &Jac);
>>>>>>>>>
>>>>>>>>> which as I understand it -- (1) never copies/allocates that
>>>>>>>>> information, and the matrix Jac is just a non-owning view into the already
>>>>>>>>> allocated CSR, (2) I can write directly into the original data structures
>>>>>>>>> and the Mat just "knows" about it, although it still needs a call to
>>>>>>>>> MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this
>>>>>>>>> works great with GAMG.
>>>>>>>>>
>>>>>>>>> I have the same CSR representation filled in GPU data allocated
>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>