[petsc-users] kokkos, matcopy, matdiagonalscale on gpu

Steven Dargaville dargaville.steven at gmail.com
Mon Apr 7 16:57:03 CDT 2025


Hi

I have some code that computes a MatMatMult, then modifies one of the
matrices and calls MatMatMult again with MAT_REUSE_MATRIX. This gives the
correct results with mat type aij or aijkokkos, run on either a CPU or a
GPU.
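
In outline, the reuse pattern is (a minimal sketch; A, B and C stand in
for my actual matrices):

  PetscCall(MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C));
  // ... modify the values of B ...
  PetscCall(MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C));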

In some cases one of the matrices is diagonal, so I have been modifying
my code to call MatDiagonalScale instead, and I have seen some peculiar
behaviour. With mat type aij, everything works correctly on both the CPU
and the GPU. With mat type aijkokkos, everything works correctly on the
CPU, but on a machine with a GPU the results differ.
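
That is, instead of multiplying by an explicit diagonal matrix, I copy A
and scale its columns (a minimal sketch; diag is the Vec holding the
diagonal entries):

  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result));
  PetscCall(MatDiagonalScale(result, NULL, diag)); // result = A * diag(diag)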

The example code below demonstrates this: it prints four matrices, where
the first two should match and the last two should match. To see the
failure, run with "-mat_type aijkokkos" on a machine with a GPU (again,
this gives the correct results with aijkokkos on a CPU). I get the
output:

Mat Object: 1 MPI process
  type: seqaijkokkos
row 0: (0, 4.)  (1, 6.)
row 1: (0, 10.)  (1, 14.)
Mat Object: 1 MPI process
  type: seqaijkokkos
row 0: (0, 4.)  (1, 6.)
row 1: (0, 10.)  (1, 14.)
Mat Object: 1 MPI process
  type: seqaijkokkos
row 0: (0, 6.)  (1, 9.)
row 1: (0, 15.)  (1, 21.)
Mat Object: 1 MPI process
  type: seqaijkokkos
row 0: (0, 12.)  (1, 18.)
row 1: (0, 30.)  (1, 42.)

I have narrowed this failure down to a MatCopy call, where the values of
result_diag should be overwritten with those of A before calling
MatDiagonalScale. The results with aijkokkos on a GPU suggest the values
of result_diag are never changed. If, instead of using MatCopy, I destroy
result_diag and call MatDuplicate, the results are correct. You can
trigger the correct behaviour with "-mat_type aijkokkos -correct".
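
In short, the failing path on the GPU is

  PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));

while destroying the matrix and duplicating it instead gives the right
answer:

  PetscCall(MatDestroy(&result_diag));
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));

(both paths appear in the full reproducer below).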

To me this looks like the values are out of sync between the device and
the host, but I would have thought a call to MatCopy would be safe. Any
thoughts on what I might be doing wrong?

Thanks for your help!
Steven

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

static char help[] = "Test matmat products with matdiagonal on gpus \n\n";

#include <petscmat.h>


int main(int argc, char **args)
{
  const PetscInt    inds[]  = {0, 1};
  PetscScalar       avals[] = {2, 3, 5, 7};
  Mat               A, B_diag, B_aij_diag, result, result_diag;
  Vec               diag;

  PetscCall(PetscInitialize(&argc, &args, NULL, help));

  // Create matrix to start
  PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &A));
  PetscCall(MatSetUp(A));
  PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  // Create a matdiagonal matrix
  // Will be the matching vec type as A
  PetscCall(MatCreateVecs(A, &diag, NULL));
  PetscCall(VecSet(diag, 2.0));
  PetscCall(MatCreateDiagonal(diag, &B_diag));

  // Create the same matrix as the matdiagonal but in aij format
  PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &B_aij_diag));
  PetscCall(MatSetUp(B_aij_diag));
  PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY));
  PetscCall(VecDestroy(&diag));

  // ~~~~~~~~~~~~~
  // Do an initial matmatmult
  // A * B_aij_diag
  // and then
  // A * B_diag but just using MatDiagonalScale
  // ~~~~~~~~~~~~~

  // aij * aij
  PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result));
  PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));

  // aij * diagonal
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(MatDiagonalScale(result_diag, NULL, diag));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
  PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));

  // ~~~~~~~~~~~~~
  // Now let's modify the diagonal and do it again with "reuse"
  // ~~~~~~~~~~~~~
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(VecSet(diag, 3.0));
  PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));

  // aij * aij
  PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result));
  PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));

  PetscBool correct = PETSC_FALSE;
  PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL));

  if (!correct)
  {
      // This gives the wrong results below when run on gpu
      // Results suggest this isn't copied
      PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));
  }
  else
  {
      // This gives the correct results below
      PetscCall(MatDestroy(&result_diag));
      PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
  }

  // aij * diagonal
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(MatDiagonalScale(result_diag, NULL, diag));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
  PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(PetscFinalize());
  return 0;
}