[petsc-dev] problem with your DMCountNonCyclicReferences code?
Tobin Isaac
tisaac at uchicago.edu
Wed Mar 16 01:21:50 CDT 2016
On Tue, Mar 15, 2016 at 11:47:53PM -0500, Barry Smith wrote:
>
> This is a really nasty problem. The example as previously written was completely reasonable, so your fix is a total hack :-). All the circular reference counting in PETSc is problematic because it is so dependent on exactly the details of how each particular object and its relationships are handled.
I agree that the need to call VecSetDM() in that case is bad, and it
stems from assuming that the recycled vectors reference the dm: if
we're going to count circular references, we should actually count
them instead of assuming they exist.
Where I added DMDestroy() in the Coarsen() routine, however, was in
line with the kind of code we typically expect from users.
>
> Do we really need to even allow these nasty circular relationships to exist? What would we lose if we, for example, removed the two-way relationships between the DMs and the Vecs? Just a little efficiency in not needing to create new Vecs because we can recycle them? But at the cost of very difficult-to-debug code that "should just work"? Similarly, the nasty circular dependencies with dm->coarseMesh are there for "efficiency"; is there a way to keep the efficiency but not the tricky dependencies?
I introduced dm->fineMesh, and I'll consider removing it, but having
both dm->coarseMesh and dm->fineMesh references is about more than
just efficiency. Particularly with the inverted multigrid that
everyone's working on, there are workflows where it is more natural
for the user to just maintain a handle on the coarsest mesh, not the
finest mesh.
>
> I accept your "fix", thanks for figuring it out so quickly! but don't like it :-).
>
> Barry
>
>
>
> > On Mar 15, 2016, at 11:30 PM, Tobin Isaac <tisaac at uchicago.edu> wrote:
> >
> >
> > I pushed a fix. There's a long explanation in the commit message:
> > while this could be called user error, the cycle counting isn't very
> > robust and should probably be changed.
> >
> > Toby
> >
> > On Tue, Mar 15, 2016 at 09:54:53PM -0500, Barry Smith wrote:
> >>
> >> Dang, dang, dang, I can't believe I fell for that git trapdoor. Ok pushed now.
> >>
> >> Barry
> >>
> >>> On Mar 15, 2016, at 9:46 PM, Tobin Isaac <tisaac at uchicago.edu> wrote:
> >>>
> >>>
> >>> Barry, please check in ex65.c
> >>>
> >>> On Sun, Mar 13, 2016 at 04:20:06PM -0500, Barry Smith wrote:
> >>>>
> >>>> Toby,
> >>>>
> >>>> I'm trying to put together a very simple but complete DMSHELL example for popov at uni-mainz.de and am having some trouble, which I think might point to a bug or logic error in the code you wrote for maintaining dm->coarseMesh and dm->fineMesh and stuff.
> >>>>
> >>>> $ petscmpiexec -valgrind -n 1 ./ex65 -pc_type mg -pc_mg_levels 2
> >>>> ==80209== Invalid read of size 8
> >>>> ==80209== at 0x100A9E2D5: DMCountNonCyclicReferences (dm.c:500)
> >>>> ==80209== by 0x100A8F70A: DMDestroy (dm.c:573)
> >>>> ==80209== by 0x101221BBE: KSPDestroy (itfunc.c:985)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209== Address 0x10398fd68 is 5,864 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 8
> >>>> ==80209== at 0x100A9E2D5: DMCountNonCyclicReferences (dm.c:500)
> >>>> ==80209== by 0x100A8F70A: DMDestroy (dm.c:573)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209== Address 0x10398fd68 is 5,864 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 8
> >>>> ==80209== at 0x100A9E2D5: DMCountNonCyclicReferences (dm.c:500)
> >>>> ==80209== by 0x100A8F70A: DMDestroy (dm.c:573)
> >>>> ==80209== by 0x100001CBC: main (in ./ex65)
> >>>> ==80209== Address 0x10398fd68 is 5,864 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 8
> >>>> ==80209== at 0x100A914C4: DMDestroy (dm.c:696)
> >>>> ==80209== by 0x100001CBC: main (in ./ex65)
> >>>> ==80209== Address 0x10398fd68 is 5,864 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 4
> >>>> ==80209== at 0x1002319B4: PetscCheckPointer (checkptr.c:106)
> >>>> ==80209== by 0x100A8F5C6: DMDestroy (dm.c:570)
> >>>> ==80209== by 0x100A9156F: DMDestroy (dm.c:699)
> >>>> ==80209== by 0x100001CBC: main (in ./ex65)
> >>>> ==80209== Address 0x10398ece0 is 1,632 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 4
> >>>> ==80209== at 0x100A8F630: DMDestroy (dm.c:570)
> >>>> ==80209== by 0x100A9156F: DMDestroy (dm.c:699)
> >>>> ==80209== by 0x100001CBC: main (in ./ex65)
> >>>> ==80209== Address 0x10398ece0 is 1,632 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> ==80209== Invalid read of size 4
> >>>> ==80209== at 0x100A8F641: DMDestroy (dm.c:570)
> >>>> ==80209== by 0x100A9156F: DMDestroy (dm.c:699)
> >>>> ==80209== by 0x100001CBC: main (in ./ex65)
> >>>> ==80209== Address 0x10398ece0 is 1,632 bytes inside a block of size 6,196 free'd
> >>>> ==80209== at 0x10001595D: free (vg_replace_malloc.c:480)
> >>>> ==80209== by 0x1000FE393: PetscFreeAlign (mal.c:72)
> >>>> ==80209== by 0x100100D1E: PetscTrFreeDefault (mtr.c:315)
> >>>> ==80209== by 0x100A91C5A: DMDestroy (dm.c:716)
> >>>> ==80209== by 0x1010E2478: PCDestroy (precon.c:123)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x1010BCBFC: PCDestroy_MG (mg.c:302)
> >>>> ==80209== by 0x1010E23F7: PCDestroy (precon.c:122)
> >>>> ==80209== by 0x101221C3A: KSPDestroy (itfunc.c:986)
> >>>> ==80209== by 0x100001C4C: main (in ./ex65)
> >>>> ==80209==
> >>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> >>>> [0]PETSC ERROR: Invalid argument
> >>>> [0]PETSC ERROR: Wrong type of object: Parameter # 1
> >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> >>>> [0]PETSC ERROR: Petsc Development GIT revision: pre-tsfc-829-g3974c78 GIT Date: 2016-03-11 17:51:48 -0600
> >>>> [0]PETSC ERROR: ./ex65 on a arch-basic named Barrys-MacBook-Pro.local by barrysmith Sun Mar 13 16:13:10 2016
> >>>> [0]PETSC ERROR: Configure options --with-mpi-dir=/Users/barrysmith/PetscLibraries PETSC_ARCH=arch-basic
> >>>> [0]PETSC ERROR: #1 DMDestroy() line 570 in /Users/barrysmith/Src/petsc/src/dm/interface/dm.c
> >>>> [0]PETSC ERROR: #2 DMDestroy() line 699 in /Users/barrysmith/Src/petsc/src/dm/interface/dm.c
> >>>> [0]PETSC ERROR: #3 main() line 67 in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex65.c
> >>>> [0]PETSC ERROR: PETSc Option Table entries:
> >>>> [0]PETSC ERROR: -malloc_test
> >>>> [0]PETSC ERROR: -pc_mg_levels 2
> >>>> [0]PETSC ERROR: -pc_type mg
> >>>> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> >>>>
> >>>> The code is in the branch barry/add-dmshellcreaterestriction, in src/ksp/ksp/examples/tutorials/ex65.c, which creates a DMSHELL that just uses an inner 1d DMDA to create the objects. The code is virtually identical to ex25.c, which just uses the 1d DMDA directly but does not crash. It seems to me that having the DM objects be shells instead of DMDAs should make absolutely no difference in your logic for tracking dm->coarseMesh etc., but somehow something is fishy!!!! I could have a mistake in my example code but I do not think so.
> >>>>
> >>>> Could you please take a look at the problem? Feel free to add fixes directly to the branch.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Barry
> >>>>
> >>>>
> >>
>