[petsc-users] Jacobian for DMComposite, containing DMRedundant

zakaryah . zakaryah at gmail.com
Tue Aug 20 00:42:09 CDT 2019


I'm working on solving a nonlinear system with a DMComposite, which is
composed of a DMRedundant and a DMDA.  The DMRedundant has a single dof,
while the DMDA can be large (n grid points, where n is a few million).  The
Jacobian is then (n+1)x(n+1).  I set it by calling MatGetLocalSubMatrix 4
times, to get the submatrices J11 (1 x 1), J12 (1 x n), J21 (n x 1), and
J22 (n x n).  In the past, setting the values for these matrices seemed to
work fine - the Jacobian was correct, the code was valgrind quiet, etc.
Since then, something has changed - I went from v. 3.11.0 to 3.11.3, my
sysadmin may have fiddled with MPI, I'm experimenting with different
problem sizes and number of processors, etc.

Now I am getting segfaults, and the valgrind trace has not been terribly
illuminating.  Both valgrind and the debugger indicate the segfault
originating in PetscSortIntWithArrayPair_Private, with the back trace
showing nested calls to this function running *thousands* deep...  I'm not
sure but I suspect that the trace to PetscSortIntWithArrayPair_Private
originates in MatAssemblyBegin.

I've created a working example which is minimal for my purposes (although
probably not fun for anyone else to inspect), and I've narrowed the issue
down to the code which sets the values of J12, the submatrix with a single
row (the DMRedundant dof) and many columns (n).  If I don't set any values
in that submatrix, and change the function so that this Jacobian is
correct, then the code goes back to being valgrind quiet, completing
correctly (for the modified problem), and not throwing a segfault.  If I
run in serial, it also works.

Searching the archive, I found an old thread about segfaults in
MatAssemblyBegin when a very large number of values were set between
flushes, and Matt said that even though there should not be a segfault, it
is a good idea in practice to frequently flush the assembly within a loop.

I guess my question is whether you think the segfault I am getting is due
to setting too many matrix values, and how I should go about flushing the
assembly.  Since I add values to J12 using MatSetValuesLocal, do I need to
call MatRestoreLocalSubmatrix to restore J12, MatAssemblyBegin and End with
MAT_FLUSH_ASSEMBLY on the full Jacobian, then call MatGetLocalSubmatrix to
get J12 again, all inside the loop?

One last point - when I call MatSetValuesLocal on J12, I do that on all
processors, with each processor accessing only the columns which correspond
to the grid points it owns.  Is there anything wrong with doing that?  Do I
need to only set values on the processor which owns the DMRedundant?  If I
call MatGetOwnershipRange on J12, it appears that all processes own the
matrix's single row.

Thanks for any help you can share.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190820/2dbc0678/attachment-0001.html>


More information about the petsc-users mailing list