[petsc-users] building kokkos matrices on the device

Steven Dargaville dargaville.steven at gmail.com
Wed Feb 26 10:54:32 CST 2025


I think that sounds great; I'm happy to put together an MR (likely next
week) for review.

On Wed, 26 Feb 2025 at 16:11, Junchao Zhang <junchao.zhang at gmail.com> wrote:

> This function MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B,
> const PetscInt garray[], Mat *mat) is rarely used. To compute the global
> matrix's row/col sizes M, N, it has to do an MPI_Allreduce(). I think that
> is wasteful, as the caller usually knows M and N already. So I think we can
> move away from it and add a new one:
>
> MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt
> N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
> * M, N are the global row/col sizes of mat
> * A, B are MATSEQAIJKOKKOS matrices
> * M, N could be PETSC_DECIDE; if so, PETSc will compute mat's M, N from A,
> i.e., M = the sum of A's row sizes over ranks, N = the sum of A's column
> sizes over ranks
> * if garray is NULL, B uses global column indices (and B's N should be
> equal to the output mat's N)
> * if garray is not NULL, B uses local column indices; garray[] must have
> been allocated with PetscMalloc() and after the call garray is owned by mat
> (the user should not free it afterwards); see the usage sketch below
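>
> For illustration, a minimal sketch of how the proposed constructor might be
> called (the names A, B, garray and C are placeholders; A and B are assumed
> to have been assembled on the device already):
>
>   /* A: diagonal block, B: off-diagonal block, both MATSEQAIJKOKKOS        */
>   /* garray: PetscMalloc'd host array mapping B's local col ids to global   */
>   Mat C;
>   PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(PETSC_COMM_WORLD,
>             PETSC_DECIDE, PETSC_DECIDE, A, B, garray, &C));
>   /* C now owns garray, so it must not be freed by the caller */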
>
> What do you think? If you agree, could you contribute an MR?
>
> BTW, I think we need to create a new header, petscmat_kokkos.hpp, to declare
>   PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
> KokkosCsrMatrix csr, Mat *A)
> since it uses a C++ type, but
> PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm,
> PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
> can stay in petscmat.h as it uses only C types.
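>
> A rough sketch of that split (file contents are illustrative only, assuming
> KokkosCsrMatrix is the CrsMatrix typedef PETSc already uses internally):
>
>   /* petscmat_kokkos.hpp -- C++ only, needs Kokkos Kernels types */
>   #include <petscmat.h>
>   #include <KokkosSparse_CrsMatrix.hpp>
>   PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm, KokkosCsrMatrix, Mat *);
>
>   /* petscmat.h -- plain C declaration */
>   PETSC_EXTERN PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm, PetscInt, PetscInt, Mat, Mat, const PetscInt[], Mat *);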
>
> Barry, what do you think of the two new APIs?
>
> --Junchao Zhang
>
>
> On Wed, Feb 26, 2025 at 6:26 AM Steven Dargaville <
> dargaville.steven at gmail.com> wrote:
>
>> Those two constructors would definitely meet my needs, thanks!
>>
>> Also, I should note that the comment about garray and B in
>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices is correct if garray is
>> passed in as NULL; it's just that if you pass in an already-built garray it
>> doesn't bother creating one or changing the column indices of B. So I would
>> suggest the comment be: "if garray is NULL, the offdiag matrix B should
>> have global col ids; if garray is not NULL, the offdiag matrix B should
>> have local col ids"
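>>
>> As a concrete illustration (made-up numbers): if the off-process entries on
>> this rank touch global columns 3, 7 and 9, then
>>
>>   garray == NULL    : B stores its entries under column ids 3, 7, 9 and
>>                       B's N equals the output mat's global N
>>   garray = {3, 7, 9}: the same entries are stored under local column ids
>>                       0, 1, 2 and B's N is 3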
>>
>> On Wed, 26 Feb 2025 at 03:35, Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> Mat_SeqAIJKokkos is private because it is in a private header
>>> src/mat/impls/aij/seq/kokkos/aijkok.hpp
>>>
>>> Your observation about the garray in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
>>> might be right.  The comment
>>>
>>> - B - the offdiag matrix using global col ids
>>>
>>> is out of date. Perhaps it should be "the offdiag matrix uses local
>>> column indices and garray contains the local to global mapping".  But I
>>> need to double check it.
>>>
>>> Since you use Kokkos, I think we could provide these two constructors
>>> for MATSEQAIJKOKKOS and MATMPIAIJKOKKOS respectively
>>>
>>>    - MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
>>>    KokkosCsrMatrix csr, Mat *A)
>>>
>>>
>>>    - MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, Mat A, Mat B,
>>>    PetscInt *garray, Mat *mat)
>>>
>>>          // To mimic the existing MatCreateMPIAIJWithSeqAIJ(MPI_Comm
>>> comm, Mat A, Mat B, const PetscInt garray[], Mat *mat);
>>>          // A and B are MATSEQAIJKOKKOS matrices and use local column
>>> indices
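>>>
>>> As a sketch of how the first one might be used (view names and sizes here
>>> are only placeholders; a_d, i_d, j_d are assumed to be device views holding
>>> the CSR values, row offsets and column indices):
>>>
>>>   KokkosCsrMatrix csr("csr", nrows, ncols, nnz, a_d, i_d, j_d);
>>>   Mat A;
>>>   PetscCall(MatCreateSeqAIJKokkosWithKokkosCsrMatrix(PETSC_COMM_SELF, csr, &A));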
>>>
>>> Do they meet your needs?
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Tue, Feb 25, 2025 at 5:35 PM Steven Dargaville <
>>> dargaville.steven at gmail.com> wrote:
>>>
>>>> Thanks for the response!
>>>>
>>>> Although MatSetValuesCOO happens on the device if the input coo_v
>>>> pointer is device memory, I believe MatSetPreallocationCOO requires host
>>>> pointers for coo_i and coo_j, and the preallocation (and construction of
>>>> the COO structures) happens on the host and is then copied onto the device?
>>>> I need to be able to create a matrix object with minimal work on the host
>>>> (like many of the routines in aijkok.kokkos.cxx do internally). I
>>>> originally used the COO interface to build the matrices I need, but that
>>>> was around 5x slower than constructing the AIJ structures myself on the
>>>> device and then directly using MatSetSeqAIJKokkosWithCSRMatrix-type
>>>> methods.
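>>>>
>>>> For reference, the COO path I mean is roughly this (a sketch; array names
>>>> and sizes are illustrative):
>>>>
>>>>   Mat A;
>>>>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>>>>   PetscCall(MatSetSizes(A, m, n, PETSC_DECIDE, PETSC_DECIDE));
>>>>   PetscCall(MatSetType(A, MATAIJKOKKOS));
>>>>   /* coo_i/coo_j are host arrays; the preallocation analysis runs on the host */
>>>>   PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
>>>>   /* coo_v may be a device pointer, so value insertion can run on the device */
>>>>   PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));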
>>>>
>>>> The reason I thought MatSetSeqAIJKokkosWithCSRMatrix could easily be
>>>> made public is that the Mat_SeqAIJKokkos constructors are already publicly
>>>> accessible? In particular, one of those constructors takes in pointers to
>>>> the Kokkos dual views which store a, i, j, and hence one can build a
>>>> sequential matrix with nothing (or very little) occurring on the host. The
>>>> only change I can see that would be necessary for
>>>> MatSetSeqAIJKokkosWithCSRMatrix (or MatCreateSeqAIJKokkosWithCSRMatrix) to
>>>> be made public is changing the PETSC_INTERN to PETSC_EXTERN?
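>>>>
>>>> That is, something along these lines in the header (the exact signature is
>>>> my guess from the Set variant):
>>>>
>>>>   /* before: visible only inside the PETSc library */
>>>>   PETSC_INTERN PetscErrorCode MatCreateSeqAIJKokkosWithCSRMatrix(MPI_Comm, Mat_SeqAIJKokkos *, Mat *);
>>>>   /* after: exported so user code can link against it */
>>>>   PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithCSRMatrix(MPI_Comm, Mat_SeqAIJKokkos *, Mat *);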
>>>>
>>>> For MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, I believe all that
>>>> is required is declaring the method in the .hpp, as it's already defined as
>>>> static in mpiaijkok.kokkos.cxx. In particular, the comments
>>>> above MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices suggest that the
>>>> off-diagonal block B needs to be built with global column ids, with
>>>> mpiaij->garray constructed on the host along with the rewriting of the
>>>> global column indices in B. This happens in MatSetUpMultiply_MPIAIJ, but
>>>> checking the code there shows that if you pass in a non-NULL garray to
>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, the construction and
>>>> compactification are skipped, meaning B can be built with local column ids
>>>> as long as garray is provided on the host (which I also build on the device
>>>> and then just copy to the host). Again, this is what some of the internal
>>>> Kokkos routines rely on, such as the matrix-matrix product.
>>>>
>>>> I am happy to try doing this and submitting a merge request to the PETSc
>>>> GitLab if this seems sensible; I just wanted to double check that I wasn't
>>>> missing something important?
>>>> Thanks
>>>> Steven
>>>>
>>>> On Tue, 25 Feb 2025 at 22:16, Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Steven,
>>>>>   MatSetSeqAIJKokkosWithCSRMatrix(Mat A, Mat_SeqAIJKokkos *akok) uses
>>>>> a private data type Mat_SeqAIJKokkos, so it cannot be directly made
>>>>> public.
>>>>>   If you already use COO, then why not directly make the matrix of
>>>>> type MATAIJKOKKOS and call MatSetPreallocationCOO() and MatSetValuesCOO()?
>>>>>   So I am confused about your needs.
>>>>>
>>>>> Thanks!
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2025 at 3:39 PM Steven Dargaville <
>>>>> dargaville.steven at gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I'm just wondering if there is any possibility of making:
>>>>>> MatSetSeqAIJKokkosWithCSRMatrix or MatCreateSeqAIJKokkosWithCSRMatrix
>>>>>> in src/mat/impls/aij/seq/kokkos/aijkok.kokkos.cxx
>>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in
>>>>>> src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx
>>>>>>
>>>>>> publicly accessible outside of PETSc, or if there is an interface I
>>>>>> have missed for creating Kokkos matrices entirely on the device?
>>>>>> MatCreateSeqAIJKokkosWithCSRMatrix, for example, is marked as PETSC_INTERN
>>>>>> so I can't link to it.
>>>>>>
>>>>>> I've currently just copied the code inside those methods so that I
>>>>>> can build without any preallocation on the host (e.g., via the COO
>>>>>> interface), and it works really well.
>>>>>>
>>>>>> Thanks for your help
>>>>>> Steven
>>>>>>
>>>>>