[petsc-dev] Memory corruption on 'master' (barry/improve-memory-logging)
Barry Smith
bsmith at mcs.anl.gov
Tue Aug 20 17:11:33 CDT 2013
So the problem doesn't appear in any test, you have to write a silly example to demonstrate it?
How about instead removing TSSetSNES, SNESSetKSP, KSPSetPC and any other weird shit that I put in that shouldn't be there away?
I think it is important that we maintain internally information about relationships of objects. I would rather fix the problem rather than remove the problem by removing the information.
Note that each object can have at most one parent at a time. We could even restrict it so that an object can only have one parent ever.
I think we can manage everything by simply having each object maintain a list of children and updating that list as children are destroyed and removing the parent reference when the parent is destroyed. I will prepare a branch to add this functionality. Use of memory would only ever be assigned to one parent so it would not be double counted.
Barry
On Aug 20, 2013, at 2:58 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> Barry, back in May, we discussed in the commit comments how
> 'barry/improve-memory-logging' would leave dangling references.
>
> https://bitbucket.org/petsc/petsc/commits/27b6d19d01a4b6521ae8cc73b0b6b8ae1e4130c4?at=barry/improve-memory-logging
>
> [The reference there to an issue should be to #43, not #33.
> https://bitbucket.org/petsc/petsc/issue/44/parent-object-keeps-linked-list-of]
>
> I now see that you merged to 'next' in
>
> commit 58851192db7fb3a0d13a9d343e7fb3758861e5c5
> Merge: 5004532 8990d58
> Author: Barry Smith <bsmith at mcs.anl.gov>
> Date: Sat Aug 17 16:58:27 2013 -0500
>
> Merge branch 'barry/improve-memory-logging' into next
>
>
> and merged to 'master' 45 minutes later (no nightly tests)
>
> commit 9b9212943afab704d2d52b88268833e129aca788
> Merge: ac1d125 8990d58
> Author: Barry Smith <bsmith at mcs.anl.gov>
> Date: Sat Aug 17 17:45:43 2013 -0500
>
> Merge branch 'barry/improve-memory-logging'
>
>
> However, this is *totally* broken for the reasons discussed in the
> commit comments above. I encountered this with GAMG Classical, but it
> can occur in many places. Here is a patch that demonstrates the
> breakage.
>
> diff --git i/src/ksp/ksp/examples/tutorials/ex2.c w/src/ksp/ksp/examples/tutorials/ex2.c
> index 49200de..5f588c7 100644
> --- i/src/ksp/ksp/examples/tutorials/ex2.c
> +++ w/src/ksp/ksp/examples/tutorials/ex2.c
> @@ -163,6 +163,15 @@ int main(int argc,char **args)
> */
> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>
> + {
> + KSP eksp;
> + PC pc;
> + ierr = KSPCreate(PETSC_COMM_WORLD,&eksp);CHKERRQ(ierr);
> + ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
> + ierr = KSPSetPC(eksp,pc);CHKERRQ(ierr);
> + ierr = KSPDestroy(&eksp);CHKERRQ(ierr);
> + }
> +
> /*
> Set operators. Here the matrix that defines the linear system
> also serves as the preconditioning matrix.
>
>
> This part actually completes successfully and all is well as far as
> reference counting is concerned, but ((PetscObject)pc)->parent now
> points at the old memory for eksp, leading to SEGV the next time
> PetscLogObjectMemory or PetscLogObjectParent is called.
>
> ==10082== Invalid read of size 8
> ==10082== at 0x4E8CB27: PetscLogObjectMemory (plog.c:36)
> ==10082== by 0x5B92EA1: PCCreate_ICC (icc.c:183)
> ==10082== by 0x5CB48CE: PCSetType (pcset.c:88)
> ==10082== by 0x5CB58B8: PCSetFromOptions (pcset.c:168)
> ==10082== by 0x5D89630: KSPSetFromOptions (itcl.c:357)
> ==10082== by 0x4032F8: main (ex2.c:202)
> ==10082== Address 0xc736400 is 48 bytes inside a block of size 1,000 free'd
> ==10082== at 0x4C288AC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==10082== by 0x4EDD7D3: PetscFreeAlign (mal.c:70)
> ==10082== by 0x5D9B8BE: KSPDestroy (itfunc.c:795)
> ==10082== by 0x4030EB: main (ex2.c:172)
>
>
> PetscErrorCode PetscLogObjectMemory(PetscObject p,PetscLogDouble m)
> {
> p->mem += m;
> p = p->parent;
> while (p) {
> /* Create all ancestors with the memory */
> p->memchildren += m; /* <---- Line 36 */
> p = p->parent;
> }
> return 0;
> }
>
>
> I still think this is really complicated to do right and will inevitably
> lead to subtle non-local bugs. In the general case, objects can have
> multiple parents and multiple children. How would we avoid
> over-counting? How do we make the addition or removal of parent-child
> relationships commutative in the sense that the memory logging would be
> the same?
>
>
> PetscErrorCode PetscLogObjectParent(PetscObject p,PetscObject c)
> {
> PetscObject pp = p;
> if (!c || !p) return 0;
> while (!c->parent && pp) {
> /* if not credited elsewhere credit all childs memory to all new ancestors */
> pp->memchildren += c->mem + c->memchildren;
> pp = pp->parent;
> }
> c->parent = p;
> c->parentid = p->id;
> return 0;
> }
>
>
>
> Can we revert this parent traversal on 'master' for now? (I don't like
> that the old code stores a pointer to the parent at all, but it was
> benign since it was never used.)
More information about the petsc-dev
mailing list