[petsc-dev] Broken MatMatMult_MPIAIJ_MPIDense

Stefano Zampini stefano.zampini at gmail.com
Sun Sep 22 13:16:18 CDT 2019


It has always been a valid use case to pass an already allocated DENSE matrix
with MAT_REUSE_MATRIX.
Right above MatMatMultNumeric_MPIDense there is a quite explicit comment
that no longer appears to be valid:
/*
  This is a "dummy function" that handles the case where matrix C was
  created as a dense matrix directly by the user and passed to MatMatMult()
  with the MAT_REUSE_MATRIX option

  It is the same as MatMatMultSymbolic_MPIAIJ_MPIDense() except does not
  create C
*/
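
For reference, here is a minimal sketch of that use case (the size queries and
creation calls below are just one way to set it up, not SLEPc's actual code):

  #include <petscmat.h>

  /* A: MPIAIJ, B: MPIDense; C is allocated by the user instead of by
     MatMatMult() with MAT_INITIAL_MATRIX */
  PetscErrorCode UserReuseProduct(Mat A, Mat B, Mat *C)
  {
    PetscErrorCode ierr;
    PetscInt       m, n, M, N;

    PetscFunctionBegin;
    ierr = MatGetLocalSize(A, &m, NULL);CHKERRQ(ierr);
    ierr = MatGetLocalSize(B, NULL, &n);CHKERRQ(ierr);
    ierr = MatGetSize(A, &M, NULL);CHKERRQ(ierr);
    ierr = MatGetSize(B, NULL, &N);CHKERRQ(ierr);
    /* the user creates the correctly sized dense C ... */
    ierr = MatCreateDense(PetscObjectComm((PetscObject)A), m, n, M, N, NULL, C);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(*C, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(*C, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    /* ... and passes it directly with MAT_REUSE_MATRIX, without any prior
       MAT_INITIAL_MATRIX call */
    ierr = MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, C);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }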


On Sun, Sep 22, 2019 at 8:11 PM Smith, Barry F. via petsc-dev <
petsc-dev at mcs.anl.gov> wrote:

>
>    Jose,
>
>      Thanks for the pointer.
>
>      Will this change dramatically affect the organization of SLEPc? As
> noted in my previous email, eventually we need to switch to a new API where
> REUSE with a different matrix is even more problematic.
>
>       If you folks have use cases that fundamentally require reusing a
> previous matrix instead of destroying it and having a new one created, we
> will need to think about additional features in the API that would allow
> this reuse of an array. But it seems to me that destroying the old matrix
> and using the initial call to create the matrix should be OK and should
> require only relatively minor changes to your codes?
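>
>       In code, the suggested pattern would be roughly (a minimal sketch;
> the variable names are assumed):
>
>          ierr = MatDestroy(&C);CHKERRQ(ierr); /* discard the previously created product */
>          ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr); /* let MatMatMult() create C anew */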
>
>   Barry
>
>
>
>
> > On Sep 22, 2019, at 11:55 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
> >
> > The man page of MatMatMult says:
> > "In the special case where matrix B (and hence C) are dense you can
> create the correctly sized matrix C yourself and then call this routine
> with MAT_REUSE_MATRIX, rather than first having MatMatMult() create it for
> you."
> >
> > If you are going to change the usage, don't forget to remove this
> sentence. This use case is what we use in SLEPc and is now causing trouble.
> > Jose
> >
> >
> >
> >> On 22 Sep 2019, at 18:49, Pierre Jolivet via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
> >>
> >>
> >>> On 22 Sep 2019, at 6:33 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >>>
> >>>
> >>> Ok. So we definitely need better error checking and to clean up the
> code, comments, and docs.
> >>>
> >>> As the approaches for these computations of products get more
> complicated, it becomes a bit harder to support the use of a raw product
> matrix, so I don't think we want to add all the code needed to call the
> symbolic part (after the fact) when the matrix is raw.
> >>
> >> To the best of my knowledge, there is only a single method (not taking
> MR 2069 into account) that uses an MPIDense B and for which these approaches
> are necessary, so it’s not like there are a hundred code paths to fix,
> but I understand your point.
> >>
> >>> Would it make things terribly difficult for you not to be able to
> use a raw matrix?
> >>
> >> Definitely not, but that would require some more memory + one copy
> after the MatMatMult (depending on the size of your block Krylov space,
> this can be quite large, and it defeats the purpose of MR 2032 of being
> more memory efficient).
> >> (BTW, I now remember that I’ve been using this “feature” since our SC16
> paper on block Krylov methods.)
> >>
> >>> I suspect that the dense case was just lucky that using a raw matrix
> worked.
> >>
> >> I don’t think so; this is clearly the intent of
> MatMatMultNumeric_MPIDense (vs. MatMatMultNumeric_MPIAIJ_MPIDense).
> >>
> >>> The removal of the de facto support for REUSE on the raw matrix should
> be added to the changes document.
> >>>
> >>> Sorry for the difficulties. We have trouble testing all the
> combinations of possible usage; even a coverage tool would not have
> indicated a problem like the lack of LDA support.
> >>
> >> No problem!
> >>
> >> Thank you,
> >> Pierre
> >>
> >>> Hong,
> >>>
> >>>   Can you take a look at these things on Monday and maybe get a cleanup
> into an MR so it gets into the release?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> Barry
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> On Sep 22, 2019, at 11:12 AM, Pierre Jolivet <
> pierre.jolivet at enseeiht.fr> wrote:
> >>>>
> >>>>
> >>>>> On 22 Sep 2019, at 6:03 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Sep 22, 2019, at 10:14 AM, Pierre Jolivet via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
> >>>>>>
> >>>>>> FWIW, I’ve fixed MatMatMult and MatTransposeMatMult here
> https://gitlab.com/petsc/petsc/commit/93d7d1d6d29b0d66b5629a261178b832a925de80
> (with MAT_INITIAL_MATRIX).
> >>>>>> I believe there is something not right in your MR (2032) with
> MAT_REUSE_MATRIX (without having called MAT_INITIAL_MATRIX first), cf.
> https://gitlab.com/petsc/petsc/merge_requests/2069#note_220269898.
> >>>>>> Of course, I’d love to be proved wrong!
> >>>>>
> >>>>> I don't understand the context.
> >>>>>
> >>>>> MAT_REUSE_MATRIX requires that the C matrix has come from a previous
> call with MAT_INITIAL_MATRIX; you cannot just put any matrix in the C
> location.
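> >>>>>
> >>>>> In code, the intended pattern is roughly (a minimal sketch):
> >>>>>
> >>>>>    ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr); /* C is created here */
> >>>>>    /* ... update the values (not the structure) of A and/or B ... */
> >>>>>    ierr = MatMatMult(A, B, MAT_REUSE_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);   /* the same C is reused */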
> >>>>
> >>>> 1) This was not the case before the MR; I’ve used that “feature” (which
> may be specific to MatMatMult_MPIAIJ_MPIDense) for as long as I can
> remember.
> >>>> 2) If it is not the case anymore, I think it should be mentioned
> somewhere (and not only in the git log, because I don’t think all users
> will go through that)
> >>>> 3) This comment should be removed from the code as well:
> https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line398
> >>>>
> >>>>> This is documented in the manual page. We should have better error
> checking that this is the case, so the code doesn't crash on a memory access
> but instead produces a very useful error message if the matrix was not
> obtained with MAT_INITIAL_MATRIX.
> >>>>>
> >>>>> Is this the issue or do I not understand?
> >>>>
> >>>> This is exactly the issue.
> >>>>
> >>>>> Barry
> >>>>>
> >>>>> BTW: yes, MAT_REUSE_MATRIX has different meanings for different
> matrix operations in terms of where the matrix came from. This is supposed
> to all be documented in each method's manual page, but some entries may be
> missing or incomplete, and error checking is probably not complete for all
> cases. Perhaps the code should be changed to have multiple different names
> for each reuse case, for clarity to users?
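> >>>>>
> >>>>> For instance (hypothetical names replacing the current single
> >>>>> MAT_REUSE_MATRIX, purely to illustrate the idea):
> >>>>>
> >>>>>    typedef enum {MAT_INITIAL_MATRIX, MAT_REUSE_FROM_INITIAL, MAT_REUSE_USER_ALLOCATED, MAT_INPLACE_MATRIX} MatReuse;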
> >>>>
> >>>> Definitely, cf. above.
> >>>>
> >>>> Thanks,
> >>>> Pierre
> >>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Pierre
> >>>>>>
> >>>>>>> On 22 Sep 2019, at 5:04 PM, Zhang, Hong <hzhang at mcs.anl.gov>
> wrote:
> >>>>>>>
> >>>>>>> I'll check it tomorrow.
> >>>>>>> Hong
> >>>>>>>
> >>>>>>> On Sun, Sep 22, 2019 at 1:04 AM Pierre Jolivet via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
> >>>>>>> Jed,
> >>>>>>> I’m not sure how easy it is to put more than a few lines of code
> on GitLab, so I’ll just send the (tiny) source here, as a follow-up to our
> discussion:
> https://gitlab.com/petsc/petsc/merge_requests/2069#note_220229648.
> >>>>>>> Please find attached a .cpp showing the brokenness of C=A*B with A
> of type MPIAIJ and B of type MPIDense when the LDA of B is not equal to its
> number of local rows.
> >>>>>>> It computes [[1,1];[1,1]] * [[0,1,2,3];[0,1,2,3]].
> >>>>>>> C should be equal to 2*B, but it’s not, unless lda = m (= 1).
> >>>>>>> Mat Object: 2 MPI processes
> >>>>>>> type: mpidense
> >>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
> >>>>>>> 0.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+00 3.0000000000000000e+00
> >>>>>>>
> >>>>>>> If you change Bm here
> https://www.mcs.anl.gov/petsc/petsc-dev/src/mat/impls/aij/mpi/mpimatmatmult.c.html#line549
> to the LDA of B, you’ll get the correct result.
> >>>>>>> Mat Object: 2 MPI processes
> >>>>>>> type: mpidense
> >>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
> >>>>>>> 0.0000000000000000e+00 2.0000000000000000e+00 4.0000000000000000e+00 6.0000000000000000e+00
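> >>>>>>>
> >>>>>>> I.e., something along the lines of (a sketch of the fix, untested):
> >>>>>>>
> >>>>>>>    ierr = MatDenseGetLDA(B, &Bm);CHKERRQ(ierr); /* stride by B's true leading dimension, not by its local row count */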
> >>>>>>>
> >>>>>>> Unfortunately, w.r.t. MR 2069, I still don’t get the same results
> with a plain view (LDA > m, KO) and with a view + duplicate (LDA = m, OK).
> >>>>>>> So there might be something else to fix (or this might not even be
> a correct fix), but the only reproducer I have right now is the full solver.
> >>>>>>>
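> >>>>>>> For the record, since attachments get scrubbed from the archive, here
> >>>>>>> is a rough reconstruction of the reproducer (the exact creation calls
> >>>>>>> are assumptions):
> >>>>>>>
> >>>>>>>   #include <petscmat.h>
> >>>>>>>
> >>>>>>>   int main(int argc, char **argv)
> >>>>>>>   {
> >>>>>>>     Mat            A, B, C;
> >>>>>>>     PetscScalar    data[8] = {0}; /* room for lda * N = 2 * 4 entries */
> >>>>>>>     PetscScalar    ones[2] = {1.0, 1.0};
> >>>>>>>     PetscInt       cols[2] = {0, 1}, row, j;
> >>>>>>>     PetscErrorCode ierr;
> >>>>>>>
> >>>>>>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
> >>>>>>>     /* run with 2 processes: A = [[1,1];[1,1]], one row per rank */
> >>>>>>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD, 1, 1, 2, 2, 1, NULL, 1, NULL, &A);CHKERRQ(ierr);
> >>>>>>>     ierr = MatGetOwnershipRange(A, &row, NULL);CHKERRQ(ierr);
> >>>>>>>     ierr = MatSetValues(A, 1, &row, 2, cols, ones, INSERT_VALUES);CHKERRQ(ierr);
> >>>>>>>     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>>>>>     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>>>>>     /* B = [[0,1,2,3];[0,1,2,3]], one local row per rank, backed by a
> >>>>>>>        user array with lda = 2 > m = 1, which is the case that breaks */
> >>>>>>>     for (j = 0; j < 4; ++j) data[2 * j] = (PetscScalar)j;
> >>>>>>>     ierr = MatCreateDense(PETSC_COMM_WORLD, 1, PETSC_DECIDE, 2, 4, data, &B);CHKERRQ(ierr);
> >>>>>>>     ierr = MatDenseSetLDA(B, 2);CHKERRQ(ierr);
> >>>>>>>     ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>>>>>     ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>>>>>     ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
> >>>>>>>     ierr = MatView(C, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* should print 2*B */
> >>>>>>>     ierr = MatDestroy(&C);CHKERRQ(ierr);
> >>>>>>>     ierr = MatDestroy(&B);CHKERRQ(ierr);
> >>>>>>>     ierr = MatDestroy(&A);CHKERRQ(ierr);
> >>>>>>>     ierr = PetscFinalize();
> >>>>>>>     return ierr;
> >>>>>>>   }
> >>>>>>>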
> >>>>>>> Thanks,
> >>>>>>> Pierre
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>

-- 
Stefano

