[petsc-users] building kokkos matrices on the device
Steven Dargaville
dargaville.steven at gmail.com
Wed Feb 26 11:26:54 CST 2025
Ok so just to double check the things I should do:
1. Create a new header for MatCreateSeqAIJKokkosWithCSRMatrix (and declare
it PETSC_EXTERN) so users can call the existing method and build a
seqaijkokkos matrix with no host involvement.
2. Modify MatCreateMPIAIJWithSeqAIJ (or equivalent) so it does the same
thing as MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in the case that A
and B are seqaijkokkos matrices (a usage sketch of both steps follows the list).
3. Potentially remove MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices given
it would be redundant?
On Wed, 26 Feb 2025 at 17:15, Junchao Zhang <junchao.zhang at gmail.com> wrote:
> That is a good idea. Perhaps a new MatCreateMPIXAIJWithSeqXAIJ(MPI_Comm
> comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat
> *mat) since garray[] is only meaningful for MATAIJ and subclasses.
>
> --Junchao Zhang
>
>
> On Wed, Feb 26, 2025 at 11:02 AM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>    The new function doesn't seem to have anything to do with Kokkos, so
>> why have any new functions? Just have MatCreateMPIAIJWithSeqAIJ() work
>> properly when the two matrices are Kokkos (or CUDA or HIP). Or, if you
>> want to eliminate the global reduction, maybe make your new function
>> MatCreateMPIWithSeq() and have it work for any type of submatrix, and
>> eventually we could deprecate MatCreateMPIAIJWithSeqAIJ().
>>
>>   Barry
>>
>>
>>
>>
>> On Feb 26, 2025, at 11:54 AM, Steven Dargaville <
>> dargaville.steven at gmail.com> wrote:
>>
>> I think that sounds great; I'm happy to put together an MR (likely next
>> week) for review.
>>
>> On Wed, 26 Feb 2025 at 16:11, Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> This function MatCreateMPIAIJWithSeqAIJ(MPI_Comm comm, Mat A, Mat B,
>>> const PetscInt garray[], Mat *mat) is rarely used. To compute the
>>> global matrix's row/col size M, N, it has to do an MPI_Allreduce(). I think
>>> it is a waste, as the caller usually knows M, N already. So I think we can
>>> move away from it and have a new one (a usage sketch follows the list below):
>>>
>>> MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M,
>>> PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>>> * M, N are global row/col size of mat
>>> * A, B are MATSEQAIJKOKKOS
>>> * M, N could be PETSC_DECIDE; if so, petsc will compute mat's M, N from
>>> A, i.e., M = sum of A's M, N = sum of A's N
>>> * if garray is NULL, B uses global column indices (and B's N should be
>>> equal to the output mat's N)
>>> * if garray is not NULL, B uses local column indices; garray[] was
>>> allocated by PetscMalloc() and after the call, garray will be owned by mat
>>> (user should not free garray afterwards).
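>>>
>>> A minimal usage sketch of the two garray modes (hypothetical: Aloc, Bglob,
>>> Bloc, mat, nB and the sizes are assumed caller data):
>>>
>>> /* garray == NULL: B already carries global column ids, so B's N must equal mat's N */
>>> PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(comm, PETSC_DECIDE, N, Aloc, Bglob, NULL, &mat));
>>>
>>> /* garray != NULL: B carries local column ids; garray is PetscMalloc'd by the
>>>    caller, ownership passes to mat, and the caller must not free it afterwards */
>>> PetscInt *garray;
>>> PetscCall(PetscMalloc1(nB, &garray));
>>> /* ... fill garray[j] = global column id of B's local column j ... */
>>> PetscCall(MatCreateMPIAIJKokkosWithSeqAIJKokkos(comm, M, N, Aloc, Bloc, garray, &mat));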
>>>
>>> What do you think? If you agree, could you contribute an MR?
>>>
>>> BTW, I think we need to create a new header, petscmat_kokkos.hpp, to declare
>>>   PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm, KokkosCsrMatrix csr, Mat *A)
>>> since it takes a C++ KokkosCsrMatrix, but
>>>   PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat)
>>> can be in petscmat.h as it uses only C types.
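>>>
>>> For instance (a sketch only: everything other than the two declarations above
>>> is an assumption, and KokkosCsrMatrix is assumed to be PETSc's existing alias
>>> for the Kokkos CrsMatrix type):
>>>
>>> /* petscmat_kokkos.hpp -- needs the C++ Kokkos type, so it lives in its own header */
>>> #include <petscmat.h>
>>> #include <KokkosSparse_CrsMatrix.hpp>
>>> PETSC_EXTERN PetscErrorCode MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm, KokkosCsrMatrix csr, Mat *A);
>>>
>>> /* petscmat.h -- plain C types only */
>>> PETSC_EXTERN PetscErrorCode MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, PetscInt M, PetscInt N, Mat A, Mat B, const PetscInt garray[], Mat *mat);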
>>>
>>> Barry, what do you think of the two new APIs?
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Feb 26, 2025 at 6:26 AM Steven Dargaville <
>>> dargaville.steven at gmail.com> wrote:
>>>
>>>> Those two constructors would definitely meet my needs, thanks!
>>>>
>>>> Also, I should note that the comment about garray and B in
>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices is correct when garray
>>>> is passed in as NULL; it's just that if you pass in a completed garray, it
>>>> doesn't bother creating one or rewriting the column indices of B. So I would
>>>> suggest the comment be: "if garray is NULL, the offdiag matrix B should
>>>> have global col ids; if garray is not NULL, the offdiag matrix B should have
>>>> local col ids".
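>>>>
>>>> As a concrete illustration of the two conventions (a made-up example for one
>>>> rank whose off-process entries touch global columns 7 and 12):
>>>>
>>>> /* garray == NULL : B stores the global ids directly, e.g. j = {7, 12, 7, ...} */
>>>> /* garray != NULL : garray = {7, 12} and B stores local ids, e.g. j = {0, 1, 0, ...},
>>>>    so that global id = garray[local id] */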
>>>>
>>>> On Wed, 26 Feb 2025 at 03:35, Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> Mat_SeqAIJKokkos is private because it is in a private header
>>>>> src/mat/impls/aij/seq/kokkos/aijkok.hpp
>>>>>
>>>>> Your observation about the garray in MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
>>>>> might be right. The comment
>>>>>
>>>>> - B - the offdiag matrix using global col ids
>>>>>
>>>>> is out of date. Perhaps it should be "the offdiag matrix uses local
>>>>> column indices and garray contains the local to global mapping". But I
>>>>> need to double check it.
>>>>>
>>>>> Since you use Kokkos, I think we could provide these two constructors
>>>>> for MATSEQAIJKOKKOS and MATMPIAIJKOKKOS respectively
>>>>>
>>>>> - MatCreateSeqAIJKokkosWithKokkosCsrMatrix(MPI_Comm comm,
>>>>> KokkosCsrMatrix csr, Mat *A)
>>>>>
>>>>> - MatCreateMPIAIJKokkosWithSeqAIJKokkos(MPI_Comm comm, Mat A, Mat
>>>>> B, PetscInt *garray, Mat *mat)
>>>>>
>>>>> // To mimic the existing MatCreateMPIAIJWithSeqAIJ(MPI_Comm
>>>>> comm, Mat A, Mat B, const PetscInt garray[], Mat *mat);
>>>>> // A and B are MATSEQAIJKOKKOS matrices and use local column
>>>>> indices
>>>>>
>>>>> Do they meet your needs?
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2025 at 5:35 PM Steven Dargaville <
>>>>> dargaville.steven at gmail.com> wrote:
>>>>>
>>>>>> Thanks for the response!
>>>>>>
>>>>>> Although MatSetValuesCOO happens on the device if the input coo_v
>>>>>> pointer is device memory, I believe MatSetPreallocationCOO requires host
>>>>>> pointers for coo_i and coo_j, and the preallocation (and construction of
>>>>>> the COO structures) happens on the host and is then copied onto the device?
>>>>>> I need to be able to create a matrix object with minimal work on the host
>>>>>> (like many of the routines in aijkok.kokkos.cxx do internally). I
>>>>>> originally used the COO interface to build the matrices I need, but that
>>>>>> was around 5x slower than constructing the aij structures myself on the
>>>>>> device and then just directly using the MatSetSeqAIJKokkosWithCSRMatrix
>>>>>> type methods.
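>>>>>>
>>>>>> For reference, the COO path I mean is roughly this (MatSetPreallocationCOO
>>>>>> and MatSetValuesCOO are the real PETSc calls; comm, the sizes and the
>>>>>> coo_i/coo_j/coo_v arrays are assumed caller data):
>>>>>>
>>>>>> Mat A;
>>>>>> PetscCall(MatCreate(comm, &A));
>>>>>> PetscCall(MatSetSizes(A, m, n, M, N));
>>>>>> PetscCall(MatSetType(A, MATAIJKOKKOS));
>>>>>> /* coo_i/coo_j are host arrays, and the COO analysis/preallocation runs on the host */
>>>>>> PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
>>>>>> /* coo_v may point to device memory, so the value insertion itself stays on the device */
>>>>>> PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));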
>>>>>>
>>>>>> The reason I thought MatSetSeqAIJKokkosWithCSRMatrix could easily be
>>>>>> made public is that the Mat_SeqAIJKokkos constructors are already publicly
>>>>>> accessible? In particular, one of those constructors takes in pointers to
>>>>>> the Kokkos dual views which store a, i, j, and hence one can build a
>>>>>> sequential matrix with nothing (or very little) occurring on the host. The
>>>>>> only change I can see that would be necessary for
>>>>>> MatSetSeqAIJKokkosWithCSRMatrix (or MatCreateSeqAIJKokkosWithCSRMatrix) to
>>>>>> be public is to change the PETSC_INTERN to PETSC_EXTERN?
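>>>>>>
>>>>>> i.e. something like this in src/mat/impls/aij/seq/kokkos/aijkok.hpp (a sketch
>>>>>> only; the declaration is written from the signature quoted below and has not
>>>>>> been checked against the actual source):
>>>>>>
>>>>>> -PETSC_INTERN PetscErrorCode MatSetSeqAIJKokkosWithCSRMatrix(Mat, Mat_SeqAIJKokkos *);
>>>>>> +PETSC_EXTERN PetscErrorCode MatSetSeqAIJKokkosWithCSRMatrix(Mat, Mat_SeqAIJKokkos *);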
>>>>>>
>>>>>> For MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, I believe all
>>>>>> that is required is declaring the method in the .hpp, as it's already
>>>>>> defined as static in mpiaijkok.kokkos.cxx. In particular, the comments
>>>>>> above MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices suggest that the
>>>>>> off-diagonal block B needs to be built with global column ids, with
>>>>>> mpiaij->garray constructed on the host along with the rewriting of the
>>>>>> global column indices in B. This happens in MatSetUpMultiply_MPIAIJ, but
>>>>>> checking the code there shows that if you pass in a non-null garray to
>>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices, the construction and
>>>>>> compactification are skipped, meaning B can be built with local column ids as
>>>>>> long as garray is provided on the host (which I also build on the device
>>>>>> and then just copy to the host). Again this is what some of the internal
>>>>>> Kokkos routines rely on, like the matrix-product.
>>>>>>
>>>>>> I am happy to try doing this and submitting a request to the petsc
>>>>>> gitlab if this seems sensible; I just wanted to double-check that I wasn't
>>>>>> missing something important.
>>>>>> Thanks
>>>>>> Steven
>>>>>>
>>>>>> On Tue, 25 Feb 2025 at 22:16, Junchao Zhang <junchao.zhang at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, Steven,
>>>>>>> MatSetSeqAIJKokkosWithCSRMatrix(Mat A, Mat_SeqAIJKokkos *akok) uses
>>>>>>> a private data type Mat_SeqAIJKokkos, so it cannot be directly made
>>>>>>> public.
>>>>>>> If you already use COO, then why not directly make the matrix of
>>>>>>> type MATAIJKOKKOS and call MatSetPreallocationCOO() and MatSetValuesCOO()?
>>>>>>> So I am confused by your needs.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 25, 2025 at 3:39 PM Steven Dargaville <
>>>>>>> dargaville.steven at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I'm just wondering if there is any possibility of making:
>>>>>>>> MatSetSeqAIJKokkosWithCSRMatrix or MatCreateSeqAIJKokkosWithCSRMatrix in
>>>>>>>> src/mat/impls/aij/seq/kokkos/aijkok.kokkos.cxx, and
>>>>>>>> MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices in
>>>>>>>> src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx
>>>>>>>>
>>>>>>>> publicly accessible outside of petsc, or if there is an interface I
>>>>>>>> have missed for creating Kokkos matrices entirely on the device?
>>>>>>>> MatCreateSeqAIJKokkosWithCSRMatrix for example is marked as PETSC_INTERN so
>>>>>>>> I can't link to it.
>>>>>>>>
>>>>>>>> I've currently just copied the code inside of those methods so that
>>>>>>>> I can build without any preallocation on the host (e.g., through the COO
>>>>>>>> interface) and it works really well.
>>>>>>>>
>>>>>>>> Thanks for your help
>>>>>>>> Steven
>>>>>>>>
>>>>>>>
>>