[petsc-users] building kokkos matrices on the device

Junchao Zhang junchao.zhang at gmail.com
Wed Feb 26 11:15:27 CST 2025


That is a good idea.  Perhaps a new MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm
comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat
*mat) would be better, since garray[] is only meaningful for MATAIJ and
its subclasses.
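
For concreteness, a sketch of how such a constructor might be declared
and called (hypothetical, it does not exist in petsc yet; the semantics
would mirror the Kokkos-specific proposal further down this thread):

  PetscErrorCode MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm comm, PetscInt M,
    PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat);

  Mat       A, B, C;  /* A: diag block, B: offdiag block, both sequential */
  PetscInt *garray;   /* PetscMalloc1()'d local-to-global column map for B */
  /* ... build A, B and garray, e.g. with the user's own device code ... */
  PetscCall(MatCreateMPIXAIJWithSeqXAIJ(PETSC_COMM_WORLD, PETSC_DECIDE,
    PETSC_DECIDE, A, B, garray, &C));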

--Junchao Zhang


On Wed, Feb 26, 2025 at 11:02 AM Barry Smith <bsmith at petsc.dev> wrote:

>
>     The new function doesn't seem to have anything to do with Kokkos, so
> why have any new functions? Just have MatCreateMPIAIJWithSeqAIJ() work
> properly when the two matrices are Kokkos (or CUDA or HIP). Or, if you
> want to eliminate the global reduction, maybe make your new function
> MatCreateMPIWithSeq() and have it work for any type of submatrix, and
> eventually we could deprecate MatCreateMPIAIJWithSeqAIJ().
>
>    Barry
>
>
>
>
> On Feb 26, 2025, at 11:54 AM, Steven Dargaville <
> dargaville.steven at gmail.com> wrote:
>
> I think that sounds great, I'm happy to put together an MR (likely next
> week) for review.
>
> On Wed, 26 Feb 2025 at 16:11, Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> This function MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B,
>> const PetscInt garray[], Mat *mat) is rarely used. To compute the
>> global matrix's row/col sizes M, N, it has to do an MPI_Allreduce(). I think
>> that is a waste, as the caller usually knows M, N already. So I think we can
>> move away from it and have a new one:
>>
>> MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt
>> N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>> * M, N are the global row/col sizes of mat
>> * A, B are MATSEQAIJKOKKOS
>> * M, N can be PETSC_DECIDE; if so, petsc will compute mat's M, N from
>> A, i.e., M = sum of A's M, N = sum of A's N
>> * if garray is NULL, B uses global column indices (and B's N should be
>> equal to the output mat's N)
>> * if garray is not NULL, B uses local column indices; garray[] must have
>> been allocated by PetscMalloc() and, after the call, garray is owned by mat
>> (the user should not free garray afterwards).
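>>
>> A minimal usage sketch (hypothetical, since the API is only proposed
>> here): the caller builds the diag/offdiag blocks on the device, then
>>
>>   Mat       A, B, C;  /* A, B: MATSEQAIJKOKKOS built on the device */
>>   PetscInt *garray;   /* PetscMalloc()'d; ownership passes to mat */
>>   /* ... fill A, B and garray ... */
>>   PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(PETSC_COMM_WORLD,
>>             PETSC_DECIDE, PETSC_DECIDE, A, B, garray, &C));
>>   /* do not PetscFree(garray) here; C owns it now */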
>>
>> What do you think? If you agree, could you contribute an MR?
>>
>> BTW, I think we need to create a new header, petscmat_kokkos.hpp to
>> declare
>>   PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
>> KokkosCsrMatrix csr, Mat *A)
>> but
>> PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm,
>> PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>> can be in petscmat.h as it uses only C types.
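>>
>> That is, the split would look roughly like this (a sketch of the
>> proposed declarations; neither exists yet):
>>
>>   /* petscmat_kokkos.hpp -- C++ only, since KokkosCsrMatrix is a C++ type */
>>   PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm, KokkosCsrMatrix, Mat *);
>>
>>   /* petscmat.h -- uses only C types */
>>   PETSC_EXTERN PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm, PetscInt, PetscInt, Mat, Mat, const PetscInt[], Mat *);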
>>
>> Barry, what do you think of the two new APIs?
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Feb 26, 2025 at 6:26 AM Steven Dargaville <
>> dargaville.steven at gmail.com> wrote:
>>
>>> Those two constructors would definitely meet my needs, thanks!
>>>
>>> Also, I should note that the comment about garray and B in
>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices is correct if garray is
>>> passed in as NULL; it's just that if you pass in a completed garray, it
>>> doesn't bother creating one or changing the column indices of B. So I would
>>> suggest the comment be: "if garray is NULL, the offdiag matrix B should
>>> have global col ids; if garray is not NULL, the offdiag matrix B should have
>>> local col ids".
>>>
>>> On Wed, 26 Feb 2025 at 03:35, Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> Mat_SeqAIJKokkos is private because it is in a private header
>>>> src/mat/impls/aij/seq/kokkos/aijkok.hpp
>>>>
>>>> Your observation about the garray in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
>>>> might be right.  The comment
>>>>
>>>> - B - the offdiag matrix using global col ids
>>>>
>>>> is out of date. Perhaps it should be "the offdiag matrix uses local
>>>> column indices and garray contains the local-to-global mapping".  But I
>>>> need to double-check it.
>>>>
>>>> Since you use Kokkos, I think we could provide these two constructors
>>>> for MATSEQAIJKOKKOS and MATMPIAIJKOKKOS respectively
>>>>
>>>>    - MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
>>>>    KokkosCsrMatrix csr, Mat *A)
>>>>
>>>>
>>>>    - MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, Mat A, Mat
>>>>    B, PetscInt *garray, Mat *mat)
>>>>
>>>>          // To mimic the existing MatCreateMPIAIJWithSeqAIJ(MPI_Comm
>>>> comm, Mat A, Mat B, const PetscInt garray[], Mat *mat);
>>>>          // A and B are MATSEQAIJKOKKOS matrices and use local column
>>>> indices
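>>>>
>>>> A sketch of how the first one might be used with device-built CSR data
>>>> (hypothetical API; I am assuming KokkosCsrMatrix is petsc's alias for
>>>> KokkosSparse::CrsMatrix over PetscScalar/PetscInt):
>>>>
>>>>   /* i_d, j_d, a_d: device views of row offsets, column indices and
>>>>      values, filled by the user's own kernels; nrows, ncols, nnz match */
>>>>   KokkosCsrMatrix csr("csr", nrows, ncols, nnz, a_d, i_d, j_d);
>>>>   Mat A;
>>>>   PetscCall(MatCreateSeqAIJKokkosWithKokkosCsrMatrix(PETSC_COMM_SELF, csr, &A));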
>>>>
>>>> Do they meet your needs?
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Tue, Feb 25, 2025 at 5:35 PM Steven Dargaville <
>>>> dargaville.steven at gmail.com> wrote:
>>>>
>>>>> Thanks for the response!
>>>>>
>>>>> Although MatSetValuesCOO happens on the device if the input coo_v
>>>>> pointer is device memory, I believe MatSetPreallocationCOO requires host
>>>>> pointers for coo_i and coo_j, and the preallocation (and construction of
>>>>> the COO structures) happens on the host and is then copied onto the device?
>>>>> I need to be able to create a matrix object with minimal work on the host
>>>>> (like many of the routines in aijkok.kokkos.cxx do internally). I
>>>>> originally used the COO interface to build the matrices I need, but that
>>>>> was around 5x slower than constructing the aij structures myself on the
>>>>> device and then just directly using the MatSetSeqAIJKokkosWithCSRMatrix
>>>>> type methods.
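>>>>>
>>>>> For reference, the COO path I benchmarked looks roughly like this
>>>>> (standard petsc API; the variable names are placeholders):
>>>>>
>>>>>   Mat A;
>>>>>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>>>>>   PetscCall(MatSetSizes(A, m, n, PETSC_DECIDE, PETSC_DECIDE));
>>>>>   PetscCall(MatSetType(A, MATAIJKOKKOS));
>>>>>   /* coo_i/coo_j are host arrays; building the COO structures is the
>>>>>      host-side work I am trying to avoid */
>>>>>   PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
>>>>>   /* coo_v can be device memory, so this step does run on the GPU */
>>>>>   PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));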
>>>>>
>>>>> The reason I thought MatSetSeqAIJKokkosWithCSRMatrix could easily be
>>>>> made public is that the Mat_SeqAIJKokkos constructors are already publicly
>>>>> accessible? In particular, one of those constructors takes in pointers to
>>>>> the Kokkos dual views which store a, i, j, and hence one can build a
>>>>> sequential matrix with nothing (or very little) occurring on the host. The
>>>>> only change I can see that would be necessary for
>>>>> MatSetSeqAIJKokkosWithCSRMatrix (or MatCreateSeqAIJKokkosWithCSRMatrix) to
>>>>> be public is to change the PETSC_INTERN to PETSC_EXTERN?
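>>>>>
>>>>> In other words, the change I have in mind is this one-liner in
>>>>> aijkok.hpp (sketched from memory, so the exact signature may differ):
>>>>>
>>>>>   /* before */
>>>>>   PETSC_INTERN PetscErrorCode MatCreateSeqAIJKokkosWithCSRMatrix(MPI_Comm, Mat_SeqAIJKokkos *, Mat *);
>>>>>   /* after */
>>>>>   PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithCSRMatrix(MPI_Comm, Mat_SeqAIJKokkos *, Mat *);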
>>>>>
>>>>> For MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, I believe all
>>>>> that is required is declaring the method in the .hpp, as it's already
>>>>> defined as static in mpiaijkok.kokkos.cxx. In particular, the comments
>>>>> above MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices suggest that the
>>>>> off-diagonal block B needs to be built with global column ids, with
>>>>> mpiaij->garray constructed on the host along with the rewriting of the
>>>>> global column indices in B. This happens in MatSetUpMultiply_MPIAIJ, but
>>>>> checking the code there shows that if you pass in a non-null garray to
>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, the construction and
>>>>> compactification are skipped, meaning B can be built with local column ids
>>>>> as long as garray is provided on the host (which I also build on the device
>>>>> and then just copy to the host). Again, this is what some of the internal
>>>>> Kokkos routines rely on, like the matrix product.
>>>>>
>>>>> I am happy to try doing this and submitting a request to the petsc
>>>>> gitlab if this seems sensible; I just wanted to double-check that I wasn't
>>>>> missing something important.
>>>>> Thanks
>>>>> Steven
>>>>>
>>>>> On Tue, 25 Feb 2025 at 22:16, Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, Steven,
>>>>>>   MatSetSeqAIJKokkosWithCSRMatrix(Mat A, Mat_SeqAIJKokkos *akok) uses
>>>>>> a private data type, Mat_SeqAIJKokkos, so it cannot be directly made
>>>>>> public.
>>>>>>   If you already use COO, then why not directly make the matrix of
>>>>>> type MATAIJKOKKOS and call MatSetPreallocationCOO() and MatSetValuesCOO()?
>>>>>>   So I am confused by your needs.
>>>>>>
>>>>>> Thanks!
>>>>>> --Junchao Zhang
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2025 at 3:39 PM Steven Dargaville <
>>>>>> dargaville.steven at gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I'm just wondering if there is any possibility of making
>>>>>>>   MatSetSeqAIJKokkosWithCSRMatrix or MatCreateSeqAIJKokkosWithCSRMatrix
>>>>>>>     (in src/mat/impls/aij/seq/kokkos/aijkok.kokkos.cxx), and
>>>>>>>   MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices
>>>>>>>     (in src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx)
>>>>>>>
>>>>>>> publicly accessible outside of petsc, or if there is an interface I
>>>>>>> have missed for creating Kokkos matrices entirely on the device?
>>>>>>> MatCreateSeqAIJKokkosWithCSRMatrix, for example, is marked as PETSC_INTERN,
>>>>>>> so I can't link to it.
>>>>>>>
>>>>>>> I've currently just copied the code inside those methods so that
>>>>>>> I can build without any preallocation on the host (e.g., through the COO
>>>>>>> interface), and it works really well.
>>>>>>>
>>>>>>> Thanks for your help
>>>>>>> Steven
>>>>>>>
>>>>>>
>