[petsc-users] Jacobian for DMComposite, containing DMRedundant

Smith, Barry F. bsmith at mcs.anl.gov
Tue Aug 20 01:07:48 CDT 2019


  Before the latest release we changed the way we handle communicating matrix entries that need to be sent between processes; normally the new approach is much faster, but another user reported a pathological case where PetscSortIntWithArrayPair() produced a huge number of nested calls. That user actually provided a fix (a better sort) that resolved his problem.

 You can either run with -matstash_legacy (the option may need a prefix if your matrix has an options prefix) to get the old behavior, or

 You can switch to the master branch of the PETSc repository to pick up the fix.  Note we have switched to GitLab for our repository; you can get the master branch with  git clone https://gitlab.com/petsc/petsc
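
  For example, assuming your executable is named ./app and, hypothetically, that the matrix was created with the options prefix "sub_", the first alternative would be run as

    ./app -matstash_legacy            (matrix has no options prefix)
    ./app -sub_matstash_legacy        (matrix created with options prefix "sub_")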

  Please let us know if this does not resolve your problem.

  Barry


  


> On Aug 20, 2019, at 12:42 AM, zakaryah . via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> I'm working on solving a nonlinear system with a DMComposite, which is composed of a DMRedundant and a DMDA.  The DMRedundant has a single dof, while the DMDA can be large (n grid points, where n is a few million).  The Jacobian is then (n+1)x(n+1).  I set it by calling MatGetLocalSubMatrix 4 times, to get the submatrices J11 (1 x 1), J12 (1 x n), J21 (n x 1), and J22 (n x n).  In the past, setting the values for these matrices seemed to work fine - the Jacobian was correct, the code was valgrind quiet, etc.  Since then, something has changed - I went from v. 3.11.0 to 3.11.3, my sysadmin may have fiddled with MPI, I'm experimenting with different problem sizes and number of processors, etc.
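> 
> For reference, a minimal sketch of that layout (illustrative only; error checking, #include <petsc.h>, and PetscInitialize are omitted, and the sizes and variable names are placeholders):
> 
>   DM           red, da, pack;
>   Mat          J, J11, J12, J21, J22;
>   IS          *is;                    /* one local IS per sub-DM of the composite */
>   PetscInt     n = 1000000;           /* DMDA size; the DMRedundant has a single dof */
> 
>   DMRedundantCreate(PETSC_COMM_WORLD, 0, 1, &red);   /* 1 dof, stored on rank 0 */
>   DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, n, 1, 1, NULL, &da);
>   DMSetUp(da);
>   DMCompositeCreate(PETSC_COMM_WORLD, &pack);
>   DMCompositeAddDM(pack, red);
>   DMCompositeAddDM(pack, da);
>   DMSetUp(pack);
> 
>   DMCreateMatrix(pack, &J);            /* (n+1) x (n+1) Jacobian with local-to-global maps */
>   DMCompositeGetLocalISs(pack, &is);   /* is[0] -> DMRedundant block, is[1] -> DMDA block */
>   MatGetLocalSubMatrix(J, is[0], is[0], &J11);   /* 1 x 1 */
>   MatGetLocalSubMatrix(J, is[0], is[1], &J12);   /* 1 x n */
>   MatGetLocalSubMatrix(J, is[1], is[0], &J21);   /* n x 1 */
>   MatGetLocalSubMatrix(J, is[1], is[1], &J22);   /* n x n */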
> 
> Now I am getting segfaults, and the valgrind trace has not been terribly illuminating.  Both valgrind and the debugger indicate the segfault originating in PetscSortIntWithArrayPair_Private, with the back trace showing nested calls to this function running *thousands* deep...  I'm not sure but I suspect that the trace to PetscSortIntWithArrayPair_Private originates in MatAssemblyBegin.
> 
> I've created a working example which is minimal for my purposes (although probably not fun for anyone else to inspect), and I've narrowed the issue down to the code which sets the values of J12, the submatrix with a single row (the DMRedundant dof) and many columns (n).  If I don't set any values in that submatrix, and change the function so that this Jacobian is correct, then the code goes back to being valgrind quiet, completing correctly (for the modified problem), and not throwing a segfault.  If I run in serial, it also works.
> 
> Searching the archive, I found an old thread about segfaults in MatAssemblyBegin when a very large number of values were set between flushes, and Matt said that even though there should not be a segfault, it is a good idea in practice to frequently flush the assembly within a loop.
> 
> I guess my question is whether you think the segfault I am getting is due to setting too many matrix values, and how I should go about flushing the assembly.  Since I add values to J12 using MatSetValuesLocal, do I need to call MatRestoreLocalSubMatrix to restore J12, MatAssemblyBegin and End with MAT_FLUSH_ASSEMBLY on the full Jacobian, then call MatGetLocalSubMatrix to get J12 again, all inside the loop?
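> 
> In other words, something like the following sketch inside my Jacobian routine (nloc, chunk, cols, and vals stand in for my actual data, and J and is[] are the composite Jacobian and local ISs from above)?
> 
>   PetscInt     row = 0;                /* local index of the DMRedundant row in J12 */
>   PetscInt     nloc, chunk = 10000;    /* nloc = number of locally owned columns */
>   PetscInt    *cols;                   /* local column indices, filled elsewhere */
>   PetscScalar *vals;                   /* corresponding values, filled elsewhere */
> 
>   for (PetscInt start = 0; start < nloc; start += chunk) {
>     PetscInt m = PetscMin(chunk, nloc - start);
>     MatSetValuesLocal(J12, 1, &row, m, &cols[start], &vals[start], INSERT_VALUES);
>     /* flush: restore the submatrix, flush-assemble the full Jacobian, re-acquire */
>     MatRestoreLocalSubMatrix(J, is[0], is[1], &J12);
>     MatAssemblyBegin(J, MAT_FLUSH_ASSEMBLY);
>     MatAssemblyEnd(J, MAT_FLUSH_ASSEMBLY);
>     MatGetLocalSubMatrix(J, is[0], is[1], &J12);
>   }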
> 
> One last point - when I call MatSetValuesLocal on J12, I do that on all processors, with each processor accessing only the columns which correspond to the grid points it owns.  Is there anything wrong with doing that?  Do I need to only set values on the processor which owns the DMRedundant?  If I call MatGetOwnershipRange on J12, it appears that all processes own the matrix's single row.
> 
> Thanks for any help you can share.
> 
> 
> 


