[petsc-users] kokkos, matcopy, matdiagonalscale on gpu
Junchao Zhang
junchao.zhang at gmail.com
Tue Apr 8 00:04:30 CDT 2025
Hi, Steven
Thank you for the bug report and test example. You were right: the
MatCopy(A, B, ...) implementation was wrong when B was on the device.
I have a fix at https://gitlab.com/petsc/petsc/-/merge_requests/8288 and
will add your test to the MR tomorrow.
Thanks!
--Junchao Zhang
On Mon, Apr 7, 2025 at 4:57 PM Steven Dargaville <
dargaville.steven at gmail.com> wrote:
> Hi
>
> I have some code that computes a MatMatMult, then modifies one of the
> matrices and calls MatMatMult again with MAT_REUSE_MATRIX. This gives the
> correct results with mat_type aij or aijkokkos, whether run on a CPU or a
> GPU.
>
> In some cases one of the matrices is diagonal, so I have been modifying my
> code to call MatDiagonalScale instead, and I have seen some peculiar
> behaviour. If the mat type is aij, everything works correctly on the CPU
> and GPU. If the mat type is aijkokkos, everything works correctly on the
> CPU, but when run on a machine with a GPU the results differ.
>
> I have some example code below that shows this: it prints out four
> matrices; the first two should match and the last two should match. To see
> the failure, run with "-mat_type aijkokkos" on a machine with a GPU (again,
> this gives the correct results with aijkokkos run on a CPU). I get the
> output:
>
> Mat Object: 1 MPI process
> type: seqaijkokkos
> row 0: (0, 4.) (1, 6.)
> row 1: (0, 10.) (1, 14.)
> Mat Object: 1 MPI process
> type: seqaijkokkos
> row 0: (0, 4.) (1, 6.)
> row 1: (0, 10.) (1, 14.)
> Mat Object: 1 MPI process
> type: seqaijkokkos
> row 0: (0, 6.) (1, 9.)
> row 1: (0, 15.) (1, 21.)
> Mat Object: 1 MPI process
> type: seqaijkokkos
> row 0: (0, 12.) (1, 18.)
> row 1: (0, 30.) (1, 42.)
>
> I have narrowed this failure down to a MatCopy call, where the values of
> result_diag should be overwritten with A before calling MatDiagonalScale.
> The results with aijkokkos on a GPU suggest the values of result_diag are
> not being changed. If instead of using MatCopy I destroy result_diag and
> call MatDuplicate, the results are correct. You can trigger the correct
> behaviour with "-mat_type aijkokkos -correct".
>
> To me this looks like the values are out of sync between the device and
> host, but I would have thought a call to MatCopy would be safe. Any
> thoughts on what I might be doing wrong?
>
> Thanks for your help!
> Steven
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> static char help[] = "Test matmat products with matdiagonal on gpus \n\n";
>
> #include <iostream>
> #include <petscmat.h>
>
>
> int main(int argc, char **args)
> {
> const PetscInt inds[] = {0, 1};
> PetscScalar avals[] = {2, 3, 5, 7};
> Mat A, B_diag, B_aij_diag, result, result_diag;
> Vec diag;
>
> PetscCall(PetscInitialize(&argc, &args, NULL, help));
>
> // Create matrix to start
> PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &A));
> PetscCall(MatSetUp(A));
> PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES));
> PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
> PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>
> // Create a matdiagonal matrix
> // Will be the matching vec type as A
> PetscCall(MatCreateVecs(A, &diag, NULL));
> PetscCall(VecSet(diag, 2.0));
> PetscCall(MatCreateDiagonal(diag, &B_diag));
>
> // Create the same matrix as the matdiagonal but in aij format
> PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &B_aij_diag));
> PetscCall(MatSetUp(B_aij_diag));
> PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
> PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY));
> PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY));
> PetscCall(VecDestroy(&diag));
>
> // ~~~~~~~~~~~~~
> // Do an initial matmatmult
> // A * B_aij_diag
> // and then
> // A * B_diag but just using MatDiagonalScale
> // ~~~~~~~~~~~~~
>
> // aij * aij
> PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result));
> PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));
>
> // aij * diagonal
> PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
> PetscCall(MatDiagonalScale(result_diag, NULL, diag));
> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
> PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));
>
> // ~~~~~~~~~~~~~
> // Now let's modify the diagonal and do it again with "reuse"
> // ~~~~~~~~~~~~~
> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
> PetscCall(VecSet(diag, 3.0));
> PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
>
> // aij * aij
> PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result));
> PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));
>
> PetscBool correct = PETSC_FALSE;
> PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL));
>
> if (!correct)
> {
> // This gives the wrong results below when run on gpu
> // Results suggest this isn't copied
> PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));
> }
> else
> {
> // This gives the correct results below
> PetscCall(MatDestroy(&result_diag));
> PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
> }
>
> // aij * diagonal
> PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
> PetscCall(MatDiagonalScale(result_diag, NULL, diag));
> PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
> PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));
>
> PetscCall(PetscFinalize());
> return 0;
> }
>