[petsc-users] kokkos, matcopy, matdiagonalscale on gpu
Steven Dargaville
dargaville.steven at gmail.com
Mon Apr 7 16:57:03 CDT 2025
Hi
I have some code that computes a MatMatMult, then modifies one of the
matrices and calls MatMatMult again with MAT_REUSE_MATRIX. This gives
the correct results with mat_type aij or aijkokkos, run on either a CPU
or a GPU.
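For context, the pattern is roughly the following (a condensed sketch of
the full reproducer at the end of this mail; A, B and C stand in for my
actual matrices):

  Mat A, B, C;
  // ... assemble A and B ...
  // First product performs the symbolic work and allocates C
  PetscCall(MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C));
  // ... change the values of B, keeping its nonzero pattern ...
  // Second product reuses the symbolic data already stored in C
  PetscCall(MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C));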
In some cases one of the matrices is diagonal, so I have been modifying my
code to call MatDiagonalScale instead, and I have seen some peculiar
behaviour. If the mat type is aij, everything works correctly on the CPU
and the GPU. If the mat type is aijkokkos, everything works correctly on
the CPU, but the results differ when run on a machine with a GPU.
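The substitution I am making relies on right-multiplying by diag(d) being
the same as column-scaling; a minimal sketch, assuming d is a Vec holding
the diagonal and result is a copy of A:

  // result = A * diag(d), computed in place; NULL means no left (row) scaling
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result));
  PetscCall(MatDiagonalScale(result, NULL, d));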
I have some example code below that shows this; it prints out four
matrices, where the first two should match and the last two should match.
To see the failure, run with "-mat_type aijkokkos" on a machine with a GPU
(again, this gives the correct results with aijkokkos run on a CPU). I get
the output:
Mat Object: 1 MPI process
type: seqaijkokkos
row 0: (0, 4.) (1, 6.)
row 1: (0, 10.) (1, 14.)
Mat Object: 1 MPI process
type: seqaijkokkos
row 0: (0, 4.) (1, 6.)
row 1: (0, 10.) (1, 14.)
Mat Object: 1 MPI process
type: seqaijkokkos
row 0: (0, 6.) (1, 9.)
row 1: (0, 15.) (1, 21.)
Mat Object: 1 MPI process
type: seqaijkokkos
row 0: (0, 12.) (1, 18.)
row 1: (0, 30.) (1, 42.)
I have narrowed down this failure to a MatCopy call, where the values of
result_diag should be overwritten with A before calling MatDiagonalScale.
The results with aijkokkos on a GPU suggest the values of result_diag are
never overwritten: the fourth matrix above is the second matrix scaled by
3 again (e.g. 12 = 4 * 3), rather than A scaled by 3, which would match
the third matrix. If instead of using MatCopy I destroy result_diag and
call MatDuplicate, the results are correct. You can trigger the correct
behaviour with "-mat_type aijkokkos -correct".
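Condensed from the reproducer, the failing path versus the working path is:

  // Fails on a GPU with aijkokkos: result_diag appears to keep its old values
  PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));

versus

  // Works everywhere: rebuild result_diag from scratch
  PetscCall(MatDestroy(&result_diag));
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));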
To me this looks like the values are out of sync between the device and
the host, but I would have thought a call to MatCopy would be safe. Any
thoughts on what I might be doing wrong?
Thanks for your help!
Steven
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
static char help[] = "Test matmat products with matdiagonal on gpus \n\n";
#include <petscmat.h>
int main(int argc, char **args)
{
  const PetscInt inds[]  = {0, 1};
  PetscScalar    avals[] = {2, 3, 5, 7};
  Mat            A, B_diag, B_aij_diag, result, result_diag;
  Vec            diag;

  PetscCall(PetscInitialize(&argc, &args, NULL, help));

  // Create the 2x2 matrix A to start
  PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &A));
  PetscCall(MatSetUp(A));
  PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  // Create a matdiagonal matrix
  // Will be the matching vec type as A
  PetscCall(MatCreateVecs(A, &diag, NULL));
  PetscCall(VecSet(diag, 2.0));
  PetscCall(MatCreateDiagonal(diag, &B_diag));

  // Create the same matrix as the matdiagonal but in aij format
  PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &B_aij_diag));
  PetscCall(MatSetUp(B_aij_diag));
  PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
  PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY));
  PetscCall(VecDestroy(&diag));

  // ~~~~~~~~~~~~~
  // Do an initial matmatmult
  //   A * B_aij_diag
  // and then
  //   A * B_diag, but just using MatDiagonalScale
  // ~~~~~~~~~~~~~
  // aij * aij
  PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result));
  PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));

  // aij * diagonal
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(MatDiagonalScale(result_diag, NULL, diag));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
  PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));

  // ~~~~~~~~~~~~~
  // Now let's modify the diagonal and do it again with "reuse"
  // ~~~~~~~~~~~~~
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(VecSet(diag, 3.0));
  PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));

  // aij * aij
  PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result));
  PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));

  PetscBool correct = PETSC_FALSE;
  PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL));
  if (!correct) {
    // This gives the wrong results below when run on gpu
    // Results suggest this isn't copied
    PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));
  } else {
    // This gives the correct results below
    PetscCall(MatDestroy(&result_diag));
    PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
  }

  // aij * diagonal
  PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
  PetscCall(MatDiagonalScale(result_diag, NULL, diag));
  PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
  PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));

  // Clean up
  PetscCall(MatDestroy(&A));
  PetscCall(MatDestroy(&B_diag));
  PetscCall(MatDestroy(&B_aij_diag));
  PetscCall(MatDestroy(&result));
  PetscCall(MatDestroy(&result_diag));
  PetscCall(PetscFinalize());
  return 0;
}