[petsc-users] Jacobian for DMComposite, containing DMRedundant

zakaryah . zakaryah at gmail.com
Tue Aug 20 10:55:42 CDT 2019


Thanks Barry - adding -matstash_legacy seemed to fix everything.  Does it
look like I also stumbled onto the pathological case?

On Tue, Aug 20, 2019 at 2:07 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Before the latest release we changed the way we handle communicating
> matrix entries that need to be sent between processes; normally the new
> approach is much faster, but another user reported a pathological case in
> which PetscSortIntWithArrayPair() was producing a huge number of nested
> calls. That user provided a fix (a better sort) that resolved his problem.
>
>  You can either run with -matstash_legacy (you may need a prefix on the
> option if your matrix has a prefix) to get the old behavior, or
>
>  you can switch to the master branch of the PETSc repository to use the
> fix.  Note that we have switched to GitLab for our repository; you can get
> the master branch with  git clone https://gitlab.com/petsc/petsc
>
>   Please let us know if this does not resolve your problem.
>
>   Barry
>
>
>
>
>
> > On Aug 20, 2019, at 12:42 AM, zakaryah . via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > I'm working on solving a nonlinear system with a DMComposite, which is
> composed of a DMRedundant and a DMDA.  The DMRedundant has a single dof,
> while the DMDA can be large (n grid points, where n is a few million).  The
> Jacobian is then (n+1)x(n+1).  I set it by calling MatGetLocalSubMatrix 4
> times, to get the submatrices J11 (1 x 1), J12 (1 x n), J21 (n x 1), and
> J22 (n x n).  In the past, setting the values for these matrices seemed to
> work fine - the Jacobian was correct, the code was valgrind quiet, etc.
> Since then, something has changed - I went from v. 3.11.0 to 3.11.3, my
> sysadmin may have fiddled with MPI, and I'm experimenting with different
> problem sizes and numbers of processors, etc.
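For context, here is a minimal sketch of that kind of assembly routine; the
function name, the use of DMCompositeGetLocalISs, and the omitted
value-setting are illustrative assumptions, not the poster's actual code:

    #include <petscsnes.h>
    #include <petscdmcomposite.h>

    /* Illustrative Jacobian routine for a DMComposite of (DMRedundant, DMDA). */
    PetscErrorCode FormJacobian(SNES snes,Vec X,Mat J,Mat P,void *ctx)
    {
      DM             pack;
      IS            *is;                 /* is[0]: redundant block, is[1]: DMDA block */
      Mat            J11,J12,J21,J22;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = SNESGetDM(snes,&pack);CHKERRQ(ierr);
      ierr = DMCompositeGetLocalISs(pack,&is);CHKERRQ(ierr);

      ierr = MatGetLocalSubMatrix(P,is[0],is[0],&J11);CHKERRQ(ierr);  /* 1 x 1 */
      ierr = MatGetLocalSubMatrix(P,is[0],is[1],&J12);CHKERRQ(ierr);  /* 1 x n */
      ierr = MatGetLocalSubMatrix(P,is[1],is[0],&J21);CHKERRQ(ierr);  /* n x 1 */
      ierr = MatGetLocalSubMatrix(P,is[1],is[1],&J22);CHKERRQ(ierr);  /* n x n */

      /* ... fill each block with MatSetValuesLocal(), using block-local indices ... */

      ierr = MatRestoreLocalSubMatrix(P,is[0],is[0],&J11);CHKERRQ(ierr);
      ierr = MatRestoreLocalSubMatrix(P,is[0],is[1],&J12);CHKERRQ(ierr);
      ierr = MatRestoreLocalSubMatrix(P,is[1],is[0],&J21);CHKERRQ(ierr);
      ierr = MatRestoreLocalSubMatrix(P,is[1],is[1],&J22);CHKERRQ(ierr);

      ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = ISDestroy(&is[0]);CHKERRQ(ierr);
      ierr = ISDestroy(&is[1]);CHKERRQ(ierr);
      ierr = PetscFree(is);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }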
> >
> > Now I am getting segfaults, and the valgrind trace has not been terribly
> illuminating.  Both valgrind and the debugger indicate that the segfault
> originates in PetscSortIntWithArrayPair_Private, with the backtrace
> showing nested calls to this function running *thousands* deep...  I'm not
> sure, but I suspect that the call to PetscSortIntWithArrayPair_Private
> originates in MatAssemblyBegin.
> >
> > I've created a working example which is minimal for my purposes
> (although probably not fun for anyone else to inspect), and I've narrowed
> the issue down to the code which sets the values of J12, the submatrix with
> a single row (the DMRedundant dof) and many columns (n).  If I don't set
> any values in that submatrix, and change the function so that this Jacobian
> is correct, then the code goes back to being valgrind quiet, completing
> correctly (for the modified problem), and not throwing a segfault.  If I
> run in serial, it also works.
> >
> > Searching the archive, I found an old thread about segfaults in
> MatAssemblyBegin when a very large number of values were set between
> flushes, and Matt said that even though there should not be a segfault, it
> is a good idea in practice to frequently flush the assembly within a loop.
> >
> > I guess my question is whether you think the segfault I am getting is
> due to setting too many matrix values, and how I should go about flushing
> the assembly.  Since I add values to J12 using MatSetValuesLocal, do I need
> to call MatRestoreLocalSubMatrix to restore J12, then MatAssemblyBegin and
> MatAssemblyEnd with MAT_FLUSH_ASSEMBLY on the full Jacobian, then call
> MatGetLocalSubMatrix to get J12 again, all inside the loop?
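Here is a sketch of the pattern being asked about, i.e. restoring J12,
flushing the full matrix P, and re-acquiring J12 inside a loop; the helper
name, index sets, chunking parameters, and the inserted value 1.0 are
placeholders rather than anything from the poster's code:

    #include <petscmat.h>

    /* nchunks must be identical on every rank, because the flush is collective on P. */
    static PetscErrorCode SetJ12Chunked(Mat P,IS isRed,IS isDA,PetscInt nlocal,
                                        PetscInt nchunks,PetscInt chunksize)
    {
      Mat            J12;
      PetscInt       chunk,i,row = 0;    /* block-local row of the DMRedundant dof */
      PetscScalar    v = 1.0;            /* placeholder entry */
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      for (chunk = 0; chunk < nchunks; chunk++) {
        ierr = MatGetLocalSubMatrix(P,isRed,isDA,&J12);CHKERRQ(ierr);
        for (i = chunk*chunksize; i < PetscMin((chunk+1)*chunksize,nlocal); i++) {
          ierr = MatSetValuesLocal(J12,1,&row,1,&i,&v,INSERT_VALUES);CHKERRQ(ierr);
        }
        ierr = MatRestoreLocalSubMatrix(P,isRed,isDA,&J12);CHKERRQ(ierr);
        ierr = MatAssemblyBegin(P,MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
        ierr = MatAssemblyEnd(P,MAT_FLUSH_ASSEMBLY);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }

Whether such chunking is actually necessary is exactly the question above;
the -matstash_legacy option (or the improved sort in master) may make it moot.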
> >
> > One last point - when I call MatSetValuesLocal on J12, I do that on all
> processors, with each processor accessing only the columns that correspond
> to the grid points it owns.  Is there anything wrong with doing that?  Do I
> need to set values only on the processor that owns the DMRedundant?  If I
> call MatGetOwnershipRange on J12, it appears that all processes own the
> matrix's single row.
> >
> > Thanks for any help you can share.
> >
> >
> >
>
>

