[petsc-users] kokkos, matcopy, matdiagonalscale on gpu

Steven Dargaville dargaville.steven at gmail.com
Tue Apr 8 13:26:47 CDT 2025


Fantastic, that seems to have fixed it. Thanks for your help!

On Tue, 8 Apr 2025 at 06:04, Junchao Zhang <junchao.zhang at gmail.com> wrote:

> Hi, Steven
>   Thank you for the bug report and test example.  You were right. The
> MatCopy(A, B,..) implementation was wrong when B was on the device.
> I have a fix at https://gitlab.com/petsc/petsc/-/merge_requests/8288,
> and will add your test to the MR tomorrow.
>
>   Thanks!
> --Junchao Zhang
>
>
> On Mon, Apr 7, 2025 at 4:57 PM Steven Dargaville <
> dargaville.steven at gmail.com> wrote:
>
>> Hi
>>
>> I have some code that computes a MatMatMult, then modifies one of the
>> matrices and calls MatMatMult again with MAT_REUSE_MATRIX. This gives
>> the correct results with mat_type aij or aijkokkos, run on either a CPU
>> or a GPU.
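>>
>> In sketch form (the matrix names here are just placeholders), that reuse
>> pattern is roughly:
>>
>>   Mat C;
>>   // first product allocates and fills C
>>   PetscCall(MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C));
>>   // ... modify the entries of B, keeping its nonzero pattern ...
>>   // recompute the product into the existing C
>>   PetscCall(MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C));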
>>
>> In some cases one of the matrices is diagonal, so I have been modifying
>> my code to call MatDiagonalScale instead, and I have seen some peculiar
>> behaviour. If the mat type is aij, everything works correctly on the CPU
>> and the GPU. If the mat type is aijkokkos, everything works correctly on
>> the CPU, but when run on a machine with a GPU the results differ.
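>>
>> In sketch form, the substitution I mean is roughly the following (C and d
>> are placeholders):
>>
>>   // right-scale a copy of A by the diagonal vector d,
>>   // i.e. compute A * diag(d) without an explicit MatMatMult
>>   PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &C));
>>   PetscCall(MatDiagonalScale(C, NULL, d));  // NULL left vector = no row scaling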
>>
>> I have some example code below that shows this; it prints out four
>> matrices, of which the first two should match and the last two should
>> match. To see the failure, run with "-mat_type aijkokkos" on a machine
>> with a GPU (again, this gives the correct results with aijkokkos run on a
>> CPU). I get the output:
>>
>> Mat Object: 1 MPI process
>>   type: seqaijkokkos
>> row 0: (0, 4.)  (1, 6.)
>> row 1: (0, 10.)  (1, 14.)
>> Mat Object: 1 MPI process
>>   type: seqaijkokkos
>> row 0: (0, 4.)  (1, 6.)
>> row 1: (0, 10.)  (1, 14.)
>> Mat Object: 1 MPI process
>>   type: seqaijkokkos
>> row 0: (0, 6.)  (1, 9.)
>> row 1: (0, 15.)  (1, 21.)
>> Mat Object: 1 MPI process
>>   type: seqaijkokkos
>> row 0: (0, 12.)  (1, 18.)
>> row 1: (0, 30.)  (1, 42.)
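>>
>> For reference, A = [2 3; 5 7], so the last matrix above should be
>> A * diag(3, 3) = [6 9; 15 21] and match the third one. Instead it is
>> [12 18; 30 42] = (A * diag(2, 2)) * diag(3, 3), i.e. the old contents of
>> result_diag scaled again, rather than a fresh copy of A.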
>>
>> I have narrowed down this failure to a MatCopy call, where the values of
>> result_diag should be overwritten with A before calling MatDiagonalScale.
>> The results with aijkokkos on a GPU suggest the values of result_diag are
>> not being changed. If, instead of using MatCopy, I destroy result_diag
>> and call MatDuplicate, the results are correct. You can trigger the
>> correct behaviour with "-mat_type aijkokkos -correct".
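>>
>> In other words, the failing path in the code below is essentially
>>
>>   PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));    // stale values on the GPU
>>
>> while the working path is
>>
>>   PetscCall(MatDestroy(&result_diag));
>>   PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));   // gives correct results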
>>
>> To me this looks like the values are out of sync between the device and
>> the host, but I would have thought a call to MatCopy would be safe. Any
>> thoughts on what I might be doing wrong?
>>
>> Thanks for your help!
>> Steven
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> static char help[] = "Test matmat products with matdiagonal on gpus \n\n";
>>
>> #include <iostream>
>> #include <petscmat.h>
>>
>>
>> int main(int argc, char **args)
>> {
>>   const PetscInt    inds[]  = {0, 1};
>>   PetscScalar       avals[] = {2, 3, 5, 7};
>>   Mat               A, B_diag, B_aij_diag, result, result_diag;
>>   Vec               diag;
>>
>>   PetscCall(PetscInitialize(&argc, &args, NULL, help));
>>
>>   // Create matrix to start
>>   PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &A));
>>   PetscCall(MatSetUp(A));
>>   PetscCall(MatSetValues(A, 2, inds, 2, inds, avals, INSERT_VALUES));
>>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>>
>>   // Create a matdiagonal matrix
>>   // Will be the matching vec type as A
>>   PetscCall(MatCreateVecs(A, &diag, NULL));
>>   PetscCall(VecSet(diag, 2.0));
>>   PetscCall(MatCreateDiagonal(diag, &B_diag));
>>
>>   // Create the same matrix as the matdiagonal but in aij format
>>   PetscCall(MatCreateFromOptions(PETSC_COMM_WORLD, NULL, 1, 2, 2, 2, 2, &B_aij_diag));
>>   PetscCall(MatSetUp(B_aij_diag));
>>   PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
>>   PetscCall(MatAssemblyBegin(B_aij_diag, MAT_FINAL_ASSEMBLY));
>>   PetscCall(MatAssemblyEnd(B_aij_diag, MAT_FINAL_ASSEMBLY));
>>   PetscCall(VecDestroy(&diag));
>>
>>   // ~~~~~~~~~~~~~
>>   // Do an initial matmatmult
>>   // A * B_aij_diag
>>   // and then
>>   // A * B_diag but just using MatDiagonalScale
>>   // ~~~~~~~~~~~~~
>>
>>   // aij * aij
>>   PetscCall(MatMatMult(A, B_aij_diag, MAT_INITIAL_MATRIX, 1.5, &result));
>>   PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));
>>
>>   // aij * diagonal
>>   PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
>>   PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
>>   PetscCall(MatDiagonalScale(result_diag, NULL, diag));
>>   PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
>>   PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));
>>
>>   // ~~~~~~~~~~~~~
>>   // Now let's modify the diagonal and do it again with "reuse"
>>   // ~~~~~~~~~~~~~
>>   PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
>>   PetscCall(VecSet(diag, 3.0));
>>   PetscCall(MatDiagonalSet(B_aij_diag, diag, INSERT_VALUES));
>>   PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
>>
>>   // aij * aij
>>   PetscCall(MatMatMult(A, B_aij_diag, MAT_REUSE_MATRIX, 1.5, &result));
>>   PetscCall(MatView(result, PETSC_VIEWER_STDOUT_WORLD));
>>
>>   PetscBool correct = PETSC_FALSE;
>>   PetscCall(PetscOptionsGetBool(NULL, NULL, "-correct", &correct, NULL));
>>
>>   if (!correct)
>>   {
>>       // This gives the wrong results below when run on gpu
>>       // Results suggest this isn't copied
>>       PetscCall(MatCopy(A, result_diag, SAME_NONZERO_PATTERN));
>>   }
>>   else
>>   {
>>       // This gives the correct results below
>>       PetscCall(MatDestroy(&result_diag));
>>       PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &result_diag));
>>   }
>>
>>   // aij * diagonal
>>   PetscCall(MatDiagonalGetDiagonal(B_diag, &diag));
>>   PetscCall(MatDiagonalScale(result_diag, NULL, diag));
>>   PetscCall(MatDiagonalRestoreDiagonal(B_diag, &diag));
>>   PetscCall(MatView(result_diag, PETSC_VIEWER_STDOUT_WORLD));
>>
>>   PetscCall(PetscFinalize());
>>   return 0;
>> }
>>
>