[petsc-users] building kokkos matrices on the device
Junchao Zhang
junchao.zhang at gmail.com
Wed Feb 26 12:00:23 CST 2025
On Wed, Feb 26, 2025 at 11:27 AM Steven Dargaville <
dargaville.steven at gmail.com> wrote:
> Ok so just to double check the things I should do:
>
> 1. Create a new header for MatCreateSeqAIJKokkosWithCSRMatrix (and declare
> it PETSC_EXTERN) so users can call the existing method and build a
> seqaijkokkos matrix with no host involvement.
>
No. We already have a private MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm
comm, KokkosCsrMatrix csr, Mat *A); you just need to make it public in a
new header, petscmat_kokkos.hpp.
BTW, I am also thinking of MatCreateSeqAIJKokkosWithKokkosViews(MPI_Comm comm,
PetscInt m, PetscInt n, Kokkos::View<PetscInt*> i, Kokkos::View<PetscInt*>
j, Kokkos::View<PetscScalar*> a, Mat *mat), mirroring the existing
MatCreateSeqAIJWithArrays(MPI_Comm comm, PetscInt m, PetscInt n,
PetscInt i[], PetscInt j[], PetscScalar a[], Mat *mat).
The benefit is that we would not need to include <KokkosSparse_CrsMatrix.hpp>
in petscmat_kokkos.hpp, keeping PETSc and Kokkos as decoupled as possible.
But either is fine with me. A rough sketch of the header is below.
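Something like this (sketch only; the names are the ones proposed in this
thread, while the exact includes and layout are my guesses):

    /* petscmat_kokkos.hpp -- sketch, not the actual header */
    #pragma once
    #include <petscmat.h>
    #include <Kokkos_Core.hpp>

    /* variant 1: takes a KokkosSparse CSR matrix; this is the one that
       would also pull in <KokkosSparse_CrsMatrix.hpp> for KokkosCsrMatrix */
    PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(
      MPI_Comm comm, KokkosCsrMatrix csr, Mat *A);

    /* variant 2: takes raw Kokkos views of the CSR arrays, so plain
       Kokkos core headers suffice */
    PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosViews(
      MPI_Comm comm, PetscInt m, PetscInt n, Kokkos::View<PetscInt *> i,
      Kokkos::View<PetscInt *> j, Kokkos::View<PetscScalar *> a, Mat *mat);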
>
> 2. Modify MatCreateMPIAIJWithSeqAIJ (or equivalent) so it does the
> same thing as MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in the case
> that A and B are seqaijkokkos matrices.
>
Keep the existing MatCreateMPIAIJWithSeqAIJ() but deprecate it in favor
of a new MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm comm, PetscInt M, PetscInt N,
Mat A, Mat B, const PetscInt garray[], Mat *mat). The new function should
handle the case where A and B are MATSEQAIJKOKKOS.
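To illustrate the difference (untested sketch; Ad and Ao are placeholder
names for the diagonal and off-diagonal sequential blocks):

    /* old: M, N are computed internally with an MPI_Allreduce() */
    PetscCall(MatCreateMPIAIJWithSeqAIJ(comm, Ad, Ao, garray, &mat));

    /* proposed: the caller supplies M, N (or PETSC_DECIDE) */
    PetscCall(MatCreateMPIXAIJWithSeqXAIJ(comm, M, N, Ad, Ao, garray, &mat));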
>
> 3. Potentially remove MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices
> given it would be redundant?
>
Yes, remove it and use the new API in the places that currently call it.
>
> On Wed, 26 Feb 2025 at 17:15, Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> That is a good idea. Perhaps a new MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm
>> comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat
>> *mat) since garray[] is only meaningful for MATAIJ and subclasses.
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Feb 26, 2025 at 11:02 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>    The new function doesn't seem to have anything to do with Kokkos, so
>>> why have any new functions? Just have MatCreateMPIAIJWithSeqAIJ() work
>>> properly when the two matrices are Kokkos (or CUDA or HIP). Or, if you
>>> want to eliminate the global reduction, maybe make your new function
>>> MatCreateMPIWithSeq() and have it work for any type of submatrix, and
>>> eventually we could deprecate MatCreateMPIAIJWithSeqAIJ().
>>>
>>>    Barry
>>>
>>>
>>>
>>>
>>> On Feb 26, 2025, at 11:54 AM, Steven Dargaville <
>>> dargaville.steven at gmail.com> wrote:
>>>
>>> I think that sounds great; I'm happy to put together an MR (likely next
>>> week) for review.
>>>
>>> On Wed, 26 Feb 2025 at 16:11, Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> This function MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B,
>>>> const PetscInt garray[], Mat *mat) is rarely used. To compute the
>>>> global matrix's row/col sizes M, N, it has to do an MPI_Allreduce(). I
>>>> think that is a waste, as the caller usually knows M and N already. So I
>>>> think we can move away from it and have a new one:
>>>>
>>>> MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M,
>>>> PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>>>> * M, N are the global row/col sizes of mat
>>>> * A, B are MATSEQAIJKOKKOS
>>>> * M, N could be PETSC_DECIDE; if so, PETSc will compute mat's M, N from
>>>> A, i.e., M = the sum of A's rows and N = the sum of A's cols across ranks
>>>> * if garray is NULL, B uses global column indices (and B's N should be
>>>> equal to the output mat's N)
>>>> * if garray is not NULL, B uses local column indices; garray[] must have
>>>> been allocated with PetscMalloc(), and after the call garray is owned by
>>>> mat (the user should not free it afterwards). A usage sketch follows.
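>>>> Roughly, usage would look like (untested sketch; nB stands for B's
>>>> local column count):
>>>>
>>>>     Mat A, B, C;      /* A, B are MATSEQAIJKOKKOS, built on the device */
>>>>     PetscInt *garray; /* local-to-global map for B's columns */
>>>>     PetscCall(PetscMalloc1(nB, &garray));
>>>>     /* ... fill garray on the host (or build on device and copy over) ... */
>>>>     PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(comm, PETSC_DECIDE,
>>>>       PETSC_DECIDE, A, B, garray, &C));
>>>>     /* garray is now owned by C: do not PetscFree() it */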
>>>>
>>>> What do you think? If you agree, could you contribute an MR?
>>>>
>>>> BTW, I think we need to create a new header, petscmat_kokkos.hpp, to
>>>> declare
>>>> PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm
>>>> comm, KokkosCsrMatrix csr, Mat *A),
>>>> but
>>>> PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm,
>>>> PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>>>> can live in petscmat.h, as it uses only C types.
>>>>
>>>> Barry, what do you think of the two new APIs?
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Wed, Feb 26, 2025 at 6:26 AM Steven Dargaville <
>>>> dargaville.steven at gmail.com> wrote:
>>>>
>>>>> Those two constructors would definitely meet my needs, thanks!
>>>>>
>>>>> Also, I should note that the comment about garray and B in
>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices is correct if garray
>>>>> is passed in as NULL; if you pass in a completed garray, it doesn't
>>>>> bother creating one or rewriting the column indices of B. So I would
>>>>> suggest the comment read: "if garray is NULL, the offdiag matrix B
>>>>> should have global col ids; if garray is not NULL, the offdiag matrix B
>>>>> should have local col ids".
>>>>>
>>>>> On Wed, 26 Feb 2025 at 03:35, Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Mat_SeqAIJKokkos is private because it is in a private header
>>>>>> src/mat/impls/aij/seq/kokkos/aijkok.hpp
>>>>>>
>>>>>> Your observation about the garray in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
>>>>>> might be right. The comment
>>>>>>
>>>>>> - B - the offdiag matrix using global col ids
>>>>>>
>>>>>> is out of date. Perhaps it should be "the offdiag matrix uses local
>>>>>> column indices and garray contains the local-to-global mapping", but I
>>>>>> need to double-check.
>>>>>>
>>>>>> Since you use Kokkos, I think we could provide these two constructors
>>>>>> for MATSEQAIJKOKKOS and MATMPIAIJKOKKOS respectively
>>>>>>
>>>>>> - MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
>>>>>> KokkosCsrMatrix csr, Mat *A)
>>>>>>
>>>>>> - MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, Mat A, Mat B,
>>>>>> PetscInt *garray, Mat *mat)
>>>>>> // to mimic the existing MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm,
>>>>>> Mat A, Mat B, const PetscInt garray[], Mat *mat);
>>>>>> // A and B are MATSEQAIJKOKKOS matrices and use local column indices
>>>>>>
>>>>>> Do they meet your needs?
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2025 at 5:35 PM Steven Dargaville <
>>>>>> dargaville.steven at gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the response!
>>>>>>>
>>>>>>> Although MatSetValuesCOO happens on the device if the input coo_v
>>>>>>> pointer is device memory, I believe MatSetPreallocationCOO requires host
>>>>>>> pointers for coo_i and coo_j, and the preallocation (and construction of
>>>>>>> the COO structures) happens on the host and is then copied onto the device?
>>>>>>> I need to be able to create a matrix object with minimal work on the host
>>>>>>> (like many of the routines in aijkok.kokkos.cxx do internally). I
>>>>>>> originally used the COO interface to build the matrices I need (roughly
>>>>>>> as sketched below), but that was around 5x slower than constructing the
>>>>>>> aij structures myself on the device and then just directly using the
>>>>>>> MatSetSeqAIJKokkosWithCSRMatrix-type methods.
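>>>>>>> The COO path was roughly (sketch; m, n, M, N, ncoo and the coo
>>>>>>> arrays are placeholders):
>>>>>>>
>>>>>>>     Mat A;
>>>>>>>     PetscCall(MatCreate(comm, &A));
>>>>>>>     PetscCall(MatSetSizes(A, m, n, M, N));
>>>>>>>     PetscCall(MatSetType(A, MATAIJKOKKOS));
>>>>>>>     /* coo_i/coo_j must be host pointers (as far as I can tell) */
>>>>>>>     PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
>>>>>>>     /* coo_v can be device memory */
>>>>>>>     PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));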
>>>>>>>
>>>>>>> The reason I thought MatSetSeqAIJKokkosWithCSRMatrix could easily be
>>>>>>> made public is that the Mat_SeqAIJKokkos constructors are already publicly
>>>>>>> accessible? In particular, one of those constructors takes pointers to
>>>>>>> the Kokkos dual views that store a, i, j, and hence one can build a
>>>>>>> sequential matrix with nothing (or very little) occurring on the host. The
>>>>>>> only change I can see being necessary for
>>>>>>> MatSetSeqAIJKokkosWithCSRMatrix (or MatCreateSeqAIJKokkosWithCSRMatrix) to
>>>>>>> become public is changing the PETSC_INTERN to PETSC_EXTERN?
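>>>>>>> i.e., just something like (illustrative, reusing the signature from
>>>>>>> your earlier email):
>>>>>>>
>>>>>>>     -PETSC_INTERN PetscErrorCode MatSetSeqAIJKokkosWithCSRMatrix(Mat, Mat_SeqAIJKokkos *);
>>>>>>>     +PETSC_EXTERN PetscErrorCode MatSetSeqAIJKokkosWithCSRMatrix(Mat, Mat_SeqAIJKokkos *);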
>>>>>>>
>>>>>>> For MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, I believe all
>>>>>>> that is required is declaring the method in the .hpp, as it's already
>>>>>>> defined as static in mpiaijkok.kokkos.cxx. In particular, the comments
>>>>>>> above MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices suggest that the
>>>>>>> off-diagonal block B needs to be built with global column ids, with
>>>>>>> mpiaij->garray constructed on the host along with the rewriting of the
>>>>>>> global column indices in B. This happens in MatSetUpMultiply_MPIAIJ, but
>>>>>>> checking the code there shows that if you pass in a non-null garray to
>>>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, the construction and
>>>>>>> compactification are skipped, meaning B can be built with local column ids
>>>>>>> as long as garray is provided on the host (which I also build on the device
>>>>>>> and then just copy to the host). Again, this is what some of the internal
>>>>>>> Kokkos routines, like the matrix product, rely on.
>>>>>>>
>>>>>>> I am happy to try doing this and submitting a merge request to the PETSc
>>>>>>> GitLab if this seems sensible; I just wanted to double-check that I wasn't
>>>>>>> missing something important.
>>>>>>> Thanks
>>>>>>> Steven
>>>>>>>
>>>>>>> On Tue, 25 Feb 2025 at 22:16, Junchao Zhang <junchao.zhang at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi, Steven,
>>>>>>>> MatSetSeqAIJKokkosWithCSRMatrix(Mat A, Mat_SeqAIJKokkos *akok) uses
>>>>>>>> a private data type, Mat_SeqAIJKokkos, so it cannot be directly made
>>>>>>>> public.
>>>>>>>> If you already use COO, then why not directly make the matrix of
>>>>>>>> type MATAIJKOKKOS and call MatSetPreallocationCOO() and MatSetValuesCOO()?
>>>>>>>> So I am confused about your needs.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 25, 2025 at 3:39 PM Steven Dargaville <
>>>>>>>> dargaville.steven at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> I'm just wondering if there is any possibility of making
>>>>>>>>>   - MatSetSeqAIJKokkosWithCSRMatrix or MatCreateSeqAIJKokkosWithCSRMatrix
>>>>>>>>>     in src/mat/impls/aij/seq/kokkos/aijkok.kokkos.cxx
>>>>>>>>>   - MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices
>>>>>>>>>     in src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx
>>>>>>>>> publicly accessible outside of PETSc, or if there is an interface
>>>>>>>>> I have missed for creating Kokkos matrices entirely on the device?
>>>>>>>>> MatCreateSeqAIJKokkosWithCSRMatrix, for example, is marked PETSC_INTERN,
>>>>>>>>> so I can't link to it.
>>>>>>>>>
>>>>>>>>> I've currently just copied the code inside of those methods so
>>>>>>>>> that I can build without any preallocation on the host (e.g., through the
>>>>>>>>> COO interface) and it works really well.
>>>>>>>>>
>>>>>>>>> Thanks for your help
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>
>>>