[petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

Junchao Zhang junchao.zhang at gmail.com
Thu Jan 5 10:38:18 CST 2023


On Thu, Jan 5, 2023 at 10:24 AM Mark Lohry <mlohry at gmail.com> wrote:

>
> I am thinking something like MatSeqAIJGetArrayAndMemType
>>
>
> Isn't the "MemType" of the matrix an invariant on creation? e.g. a user
> shouldn't care what memtype a pointer is for, just that if a device matrix
> was created it returns device pointers, if a host matrix was created it
> returns host pointers.
>
> Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType
> <https://petsc.org/release/docs/manualpages/Mat/MatSeqAIJGetCSRAndMemType/>,
> isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will
> cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return
> me those raw device pointers?
>
Yeah, I forgot I added it :). On "a user shouldn't care what memtype a
pointer is": yes, if you can; otherwise you can use mtype to differentiate
your code paths.
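For completeness, a rough sketch of the sequence discussed above (names like `n` and `nnz` are placeholders, error paths abbreviated; assumes a CUDA-enabled PETSc build):

```c
#include <petscmat.h>

/* Sketch: create a SeqAIJCUSPARSE matrix, assemble it, then fetch the
   raw CSR pointers, which live on the device for this matrix type. */
Mat             A;
const PetscInt *ia, *ja;
PetscScalar    *va;
PetscMemType    mtype;

PetscCall(MatCreateSeqAIJCUSPARSE(PETSC_COMM_SELF, n, n, 0, nnz, &A));
/* ... fill with MatSetValues(), then MatAssemblyBegin/End() ... */
PetscCall(MatSeqAIJGetCSRAndMemType(A, &ia, &ja, &va, &mtype));
if (PetscMemTypeDevice(mtype)) {
  /* ia, ja, va are device (cudaMalloc'd) pointers here */
}
```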


>
>
>
>
> On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry <mlohry at gmail.com> wrote:
>>
>>>
>>>> A workaround is to let petsc build the matrix and allocate the memory,
>>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>>>>
>>>
>>> Junchao, looking at the code for this it seems to only return a pointer
>>> to the value array, but not pointers to the column and row index arrays, is
>>> that right?
>>>
>> Yes, that is correct.
>> I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const
>> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
>> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if
>> A is a device matrix, otherwise (a,i, j) on host and mtype =
>> PETSC_MEMTYPE_HOST.
>> We currently have similar things like
>> VecGetArrayAndMemType(Vec,PetscScalar**,PetscMemType*), and I am adding
>> MatDenseGetArrayAndMemType(Mat,PetscScalar**,PetscMemType*).
>>
>> It looks like you need (a, i, j) for assembly, but the above function
>> only works for an assembled matrix.
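The existing Vec counterpart mentioned above can be used along these lines (a sketch; `v` is assumed to be an assembled Vec, error checking abbreviated):

```c
#include <petscvec.h>

/* Sketch: fetch the raw array together with its memtype, then branch
   on where the data actually lives. */
PetscScalar *x;
PetscMemType mtype;

PetscCall(VecGetArrayAndMemType(v, &x, &mtype));
if (PetscMemTypeHost(mtype)) {
  /* x is a host pointer */
} else {
  /* x is a device pointer (PETSC_MEMTYPE_CUDA or PETSC_MEMTYPE_HIP) */
}
PetscCall(VecRestoreArrayAndMemType(v, &x));
```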
>>
>>
>>>
>>>
>>> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <jacob.fai at gmail.com>
>>> wrote:
>>>
>>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or neither, but not
>>>> both.
>>>>
>>>>
>>>> CUPM works with both enabled simultaneously, I don’t think there are
>>>> any direct restrictions for it. Vec at least was fully usable with both
>>>> cuda and hip (though untested) last time I checked.
>>>>
>>>> Best regards,
>>>>
>>>> Jacob Faibussowitsch
>>>> (Jacob Fai - booss - oh - vitch)
>>>>
>>>> On Jan 5, 2023, at 00:09, Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>
>>>>>>> Oh, is the device backend not known at compile time?
>>>>>>>
>>>>>> Currently it is known at compile time.
>>>>>>
>>>>>
>>>>> Are you sure? I don't think it is known at compile time.
>>>>>
>>>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or neither, but not
>>>> both.
>>>>
>>>>
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>      Matt
>>>>>
>>>>>
>>>>>> Or multiple backends can be alive at once?
>>>>>>>
>>>>>>
>>>>>> Some PETSc developers (Jed and Barry) want to support this, but we
>>>>>> do not support it yet.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>>>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
>>>>>>>>>> GPUs, ...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Wouldn't one function suffice? Assuming these are contiguous
>>>>>>>>> arrays in CSR format, they're just raw device pointers in all cases.
>>>>>>>>>
>>>>>>>> But we need to know what device it is (to dispatch to either
>>>>>>>> petsc-CUDA or petsc-HIP backend)
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <
>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays()
>>>>>>>>>> for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but
>>>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on
>>>>>>>>>> AMD GPUs, ...
>>>>>>>>>>
>>>>>>>>>> The real problem, I think, is dealing with multiple MPI ranks.
>>>>>>>>>> Providing the split arrays for a PETSc MATMPIAIJ matrix is not easy,
>>>>>>>>>> so users are discouraged from doing so.
>>>>>>>>>>
>>>>>>>>>> A workaround is to let petsc build the matrix and allocate the
>>>>>>>>>> memory, then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill
>>>>>>>>>> it up.
>>>>>>>>>>
>>>>>>>>>> We recently added routines to support matrix assembly on GPUs;
>>>>>>>>>> see if MatSetValuesCOO
>>>>>>>>>> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/>
>>>>>>>>>> helps.
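A hedged sketch of that COO path (the `coo_i`/`coo_j`/`vals`/`ncoo` names are assumptions; on a CUSPARSE/HIPSPARSE matrix, `vals` may be a device pointer):

```c
#include <petscmat.h>

/* Sketch: describe the nonzero pattern once with COO index arrays,
   then push new values each time they change. */
PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
/* repeat whenever the numerical values are updated: */
PetscCall(MatSetValuesCOO(A, vals, INSERT_VALUES));
```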
>>>>>>>>>>
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <mlohry at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a
>>>>>>>>>>> standard CSR representation where I compute the Jacobian to be used in an
>>>>>>>>>>> implicit TS context. In the CPU world I call
>>>>>>>>>>>
>>>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>>>>>>>>>> rowidxptr, colidxptr, valptr, &Jac);
>>>>>>>>>>>
>>>>>>>>>>> which, as I understand it, (1) never copies or allocates that
>>>>>>>>>>> information, so the matrix Jac is just a non-owning view into the already
>>>>>>>>>>> allocated CSR, and (2) lets me write directly into the original data
>>>>>>>>>>> structures with the Mat just "knowing" about it, although it still needs a
>>>>>>>>>>> call to MatAssemblyBegin/MatAssemblyEnd after the values are modified. So
>>>>>>>>>>> far this works great with GAMG.
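The CPU-side pattern described above, sketched in full (all array names and sizes are assumed to come from the caller's own CSR assembly; error checking abbreviated):

```c
#include <petscmat.h>

/* Sketch: wrap existing host CSR arrays in a non-owning Mat.
   A SeqAIJ matrix lives on a single rank, hence PETSC_COMM_SELF. */
Mat Jac;
PetscCall(MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, nrows, ncols,
                                    rowidxptr, colidxptr, valptr, &Jac));
/* ... later, after rewriting the entries of valptr in place ... */
PetscCall(MatAssemblyBegin(Jac, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(Jac, MAT_FINAL_ASSEMBLY));
```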
>>>>>>>>>>>
>>>>>>>>>>> I have the same CSR representation filled in GPU data allocated
>>>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary copies?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>