[petsc-dev] Memory corruption on 'master' (barry/improve-memory-logging)

Barry Smith bsmith at mcs.anl.gov
Tue Aug 20 17:11:33 CDT 2013


  So the problem doesn't appear in any test, you have to write a silly example to demonstrate it?

  How about instead removing TSSetSNES, SNESSetKSP, KSPSetPC and any other weird shit that I put in that shouldn't be there away?

   I think it is important that we maintain internally information about relationships of objects. I would rather fix the problem rather than remove the problem by removing the information.  

   Note that each object can have at most one parent at a time. We could even restrict it so that an object can only have one parent ever. 

   I think we can manage everything by simply having each object maintain a list of children and updating that list as children are destroyed and removing the parent reference when the parent is destroyed. I will prepare a branch to add this functionality.  Use of memory would only ever be assigned to one parent so it would not be double counted.

   Barry
 

On Aug 20, 2013, at 2:58 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Barry, back in May, we discussed in the commit comments how
> 'barry/improve-memory-logging' would leave dangling references.
> 
> https://bitbucket.org/petsc/petsc/commits/27b6d19d01a4b6521ae8cc73b0b6b8ae1e4130c4?at=barry/improve-memory-logging
> 
>  [The reference there to an issue should be to #43, not #33.
>   https://bitbucket.org/petsc/petsc/issue/44/parent-object-keeps-linked-list-of]
> 
> I now see that you merged to 'next' in
> 
> commit 58851192db7fb3a0d13a9d343e7fb3758861e5c5
> Merge: 5004532 8990d58
> Author: Barry Smith <bsmith at mcs.anl.gov>
> Date:   Sat Aug 17 16:58:27 2013 -0500
> 
>    Merge branch 'barry/improve-memory-logging' into next
> 
> 
> and merged to 'master' 45 minutes later (no nightly tests)
> 
> commit 9b9212943afab704d2d52b88268833e129aca788
> Merge: ac1d125 8990d58
> Author: Barry Smith <bsmith at mcs.anl.gov>
> Date:   Sat Aug 17 17:45:43 2013 -0500
> 
>    Merge branch 'barry/improve-memory-logging'
> 
> 
> However, this is *totally* broken for the reasons discussed in the
> commit comments above.  I encountered this with GAMG Classical, but it
> can occur in many places.  Here is a patch that demonstrates the
> breakage.
> 
> diff --git i/src/ksp/ksp/examples/tutorials/ex2.c w/src/ksp/ksp/examples/tutorials/ex2.c
> index 49200de..5f588c7 100644
> --- i/src/ksp/ksp/examples/tutorials/ex2.c
> +++ w/src/ksp/ksp/examples/tutorials/ex2.c
> @@ -163,6 +163,15 @@ int main(int argc,char **args)
>   */
>   ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
> 
> +  {
> +    KSP eksp;
> +    PC pc;
> +    ierr = KSPCreate(PETSC_COMM_WORLD,&eksp);CHKERRQ(ierr);
> +    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
> +    ierr = KSPSetPC(eksp,pc);CHKERRQ(ierr);
> +    ierr = KSPDestroy(&eksp);CHKERRQ(ierr);
> +  }
> +
>   /*
>      Set operators. Here the matrix that defines the linear system
>      also serves as the preconditioning matrix.
> 
> 
> This part actually completes successfully and all is well as far as
> reference counting is concerned, but ((PetscObject)pc)->parent now
> points at the old memory for eksp, leading to SEGV the next time
> PetscLogObjectMemory or PetscLogObjectParent is called.
> 
> ==10082== Invalid read of size 8
> ==10082==    at 0x4E8CB27: PetscLogObjectMemory (plog.c:36)
> ==10082==    by 0x5B92EA1: PCCreate_ICC (icc.c:183)
> ==10082==    by 0x5CB48CE: PCSetType (pcset.c:88)
> ==10082==    by 0x5CB58B8: PCSetFromOptions (pcset.c:168)
> ==10082==    by 0x5D89630: KSPSetFromOptions (itcl.c:357)
> ==10082==    by 0x4032F8: main (ex2.c:202)
> ==10082==  Address 0xc736400 is 48 bytes inside a block of size 1,000 free'd
> ==10082==    at 0x4C288AC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==10082==    by 0x4EDD7D3: PetscFreeAlign (mal.c:70)
> ==10082==    by 0x5D9B8BE: KSPDestroy (itfunc.c:795)
> ==10082==    by 0x4030EB: main (ex2.c:172)
> 
> 
> PetscErrorCode PetscLogObjectMemory(PetscObject p,PetscLogDouble m)
> {
>  p->mem += m;
>  p       =  p->parent;
>  while (p) {
>    /* Create all ancestors with the memory */
>    p->memchildren += m;                    /* <---- Line 36 */
>    p               =  p->parent;
>  }
>  return 0;
> }
> 
> 
> I still think this is really complicated to do right and will inevitably
> lead to subtle non-local bugs.  In the general case, objects can have
> multiple parents and multiple children.  How would we avoid
> over-counting?  How do we make the addition or removal of parent-child
> relationships commutative in the sense that the memory logging would be
> the same?
> 
> 
> PetscErrorCode PetscLogObjectParent(PetscObject p,PetscObject c)
> {
>  PetscObject pp = p;
>  if (!c || !p) return 0;
>  while (!c->parent && pp) {
>    /* if not credited elsewhere credit all childs memory to all new ancestors */
>    pp->memchildren += c->mem + c->memchildren;
>    pp               = pp->parent;
>  }
>  c->parent   = p;
>  c->parentid = p->id;
>  return 0;
> }
> 
> 
> 
> Can we revert this parent traversal on 'master' for now?  (I don't like
> that the old code stores a pointer to the parent at all, but it was
> benign since it was never used.)




More information about the petsc-dev mailing list