[petsc-users] zero pattern of result of matmatmult

Wed Sep 18 03:00:40 CDT 2013

On Tue, 2013-09-17 at 07:24 -0700, Jed Brown wrote:
> Frederik Treue <frtr at fysik.dtu.dk> writes:
> 
> > On Tue, 2013-09-17 at 08:15 -0500, Barry Smith wrote:
> >>     Are you calling MatSetValuesStencil() for each single entry or once per row or block row? Row or block row will be faster than for each entry.
> > once per entry - this I should definitely improve. What's the logic
> > behind row or block rows being faster? The fewer calls to
> > MatSetValuesStencil, the better?
> 
> It shares index translation and the call stack.

Just to be sure: if I could build the matrix in one go (per MPI
process), i.e. give a list of stencils for all the points handled by
the process and then make a single MatSetValuesStencil call, would
this always be faster? Assuming there are no memory concerns.
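
For reference, the once-per-row variant Barry suggests can be sketched
roughly as below. This is only an illustration under stated assumptions:
GridIndex and box_stencil_columns are hypothetical names, with GridIndex
standing in for the i/j fields of PETSc's MatStencil so that the fragment
compiles without PETSc. Boundary handling and the computation of the
values are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the i/j fields of PETSc's MatStencil,
   used here only so the sketch compiles without PETSc. */
typedef struct { int i, j; } GridIndex;

enum { BOX_POINTS = 9 };

/* Fill the 9 column indices of a 2D box stencil centred on (i, j).
   Entries are ordered over the 3x3 neighbourhood with i varying
   fastest: (i-1,j-1), (i,j-1), (i+1,j-1), (i-1,j), ... */
void box_stencil_columns(int i, int j, GridIndex cols[BOX_POINTS])
{
    assert(cols != NULL);
    int n = 0;
    for (int dj = -1; dj <= 1; ++dj) {
        for (int di = -1; di <= 1; ++di) {
            cols[n].i = i + di;
            cols[n].j = j + dj;
            ++n;
        }
    }
}

/* With PETSc, the row would then be inserted with a single call,
 * roughly:
 *
 *   MatStencil  row, cols[9];
 *   PetscScalar vals[9];
 *   ... fill row, cols and vals for grid point (i, j) ...
 *   MatSetValuesStencil(A, 1, &row, 9, cols, vals, INSERT_VALUES);
 *
 * instead of nine calls that each pass one row and one column.
 */
```

The point of the sketch is only that all nine columns of a row are
gathered first and handed over together, so the index translation and
call overhead Jed mentions are paid once per row rather than once per
entry.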

> > PS. FYI: My stencil is essentially a standard 2D, 9-point box stencil,
> > but the values in all rows are different (due, among other things, to
> > the geometric tensor).
> 
> That's normal.  Is the geometry changing?

Both: the geometry changes, and my operator depends explicitly on the
fields, which obviously change as well.
> 
> >       ##########################################################
> >       #                                                        #
> >       #                          WARNING!!!                    #
> >       #                                                        #
> >       #   This code was compiled with a debugging option,      #
> >       #   To get timing results run ./configure                #
> >       #   using --with-debugging=no, the performance will      #
> >       #   be generally two or three times faster.              #
> >       #                                                        #
> >       ##########################################################
> 
> We weren't joking.
> 
> You should configure an optimized build using a different value of
> PETSC_ARCH so that you can easily switch back and forth between the
> debug and optimized versions.  Everything will get faster, but
> especially insertion.

Ah, so I can't assume that because version 1 of my code is X times
faster than version 2 when measured with a debug build of PETSc, the
same will hold for the production build of PETSc... Thanks for the
heads up :)

/Frederik

PS. Barry, the constant parts of my operator require no calculation at
all, but thanks for the info anyway.