[petsc-users] MatMatMul inefficient
Pierre Jolivet
pierre at joliv.et
Wed Feb 15 11:15:47 CST 2023
Thank you for the reproducer.
I didn’t realize your test case was _this_ small.
Still, you are not setting the MatType of Q, and since PETSc defaults to AIJ, Q ends up being treated as sparse as well.
So instead of computing C = A*B with a sparse A and a dense B, it performs a sparse-sparse product, which is much costlier.
If you add call MatSetType(Q,MATDENSE,ierr) before the MatLoad(), you will then get:
Running with 1 processors
AQ time using MatMatMul 1.0620000000471919E-003
AQ time using 6 MatMul 1.4270000001488370E-003
Not an ideal efficiency (still greater than 1 though, so we are in the clear), but things will get better if you increase the size of either A or Q.
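In context, the fix could look like the sketch below. This is a hypothetical fragment, not code from the reproducer: the viewer variable and communicator are assumed, and the essential point is only that MatSetType() must be called before MatLoad() so the file is read into a dense matrix.

```fortran
! Assumed setup: viewer already opened on the matrix file for Q.
! Setting the type to MATDENSE *before* MatLoad makes MatMatMult
! dispatch to the sparse-dense (AIJ x Dense) kernel instead of a
! sparse-sparse product.
call MatCreate(PETSC_COMM_WORLD, Q, ierr)
call MatSetType(Q, MATDENSE, ierr)
call MatLoad(Q, viewer, ierr)
```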
Thanks,
Pierre
> On 15 Feb 2023, at 4:34 PM, Guido Margherita <margherita.guido at epfl.ch> wrote:
>
> Hi,
>
> You can find the reproducer at this link https://github.com/margheguido/Miniapp_MatMatMul , including the matrices I used.
> I have trouble understanding what is different in my case from the one you referenced me to.
>
> Thank you so much,
> Margherita
>
>> On 13 Feb 2023, at 3:51 PM, Pierre Jolivet <pierre at joliv.et> wrote:
>>
>> Could you please share a reproducer?
>> What you are seeing is not typical of the performance of such a kernel, from both a theoretical and a practical (see fig. 2 of https://joliv.et/article.pdf) point of view.
>>
>> Thanks,
>> Pierre
>>
>>> On 13 Feb 2023, at 3:38 PM, Guido Margherita via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>
>>> A is a sparse MATSEQAIJ, Q is dense.
>>>
>>> Thanks,
>>> Margherita
>>>
>>>> On 13 Feb 2023, at 3:27 PM, knepley at gmail.com wrote:
>>>>
>>>> On Mon, Feb 13, 2023 at 9:21 AM Guido Margherita via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>> Hi all,
>>>>
>>>> I realised that performing a matrix-matrix multiplication using the function MatMatMult is not at all computationally efficient compared to performing N matrix-vector multiplications with MatMult, where N is the number of columns of the second matrix in the product.
>>>> When I multiply a matrix A (46816 x 46816) by a matrix Q (46816 x 6), the MatMatMult call is indeed about 6 times more expensive than 6 calls to MatMult when run sequentially (0.04056 s vs 0.0062 s). When the same code is run in parallel the gap grows even more, becoming 10 times more expensive.
>>>> Is there an explanation for it?
>>>>
>>>> So we can reproduce this, what kind of matrix is A? I am assuming that Q is dense.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>> t1 = MPI_Wtime()
>>>> call MatMatMult(A, Q, MAT_INITIAL_MATRIX, PETSC_DEFAULT_REAL, AQ, ierr)
>>>> t2 = MPI_Wtime()
>>>> t_MatMatMul = t2 - t1
>>>>
>>>> t_MatMul = 0.0
>>>> do j = 0, m-1
>>>>    call MatGetColumnVector(Q, q_vec, j, ierr)
>>>>
>>>>    t1 = MPI_Wtime()
>>>>    call MatMult(A, q_vec, aq_vec, ierr)
>>>>    t2 = MPI_Wtime()
>>>>
>>>>    t_MatMul = t_MatMul + t2 - t1
>>>> end do
>>>>
>>>> Thank you,
>>>> Margherita Guido
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>
>