[petsc-dev] Memory corruption on 'master' (barry/improve-memory-logging)

Tue Aug 20 17:55:38 CDT 2013

  Someone asks: how much memory is that linear solver using? Your answer is "I don't know", and you never will know.

  Even if we allow multiple parents, I think it is all manageable. Note that keeping track of parents and children is orthogonal to the reference counting we do now. So tracking parents and children is possible and straightforward.

   Tell you what. Remove the commit and delete the branch completely.

   I will make a new branch to track parent/children that is safe. Then we can debate it.

   Barry

Note that this happened because pull requests fester in pull request hell for months; I had completely forgotten the issues with this one. We need a better way to manage pull request hell.

On Aug 20, 2013, at 5:31 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Barry Smith <bsmith at mcs.anl.gov> writes:
> 
>>  So the problem doesn't appear in any test, you have to write a silly
>>  example to demonstrate it?
> 
> No, I wrote that snippet to have a _reduced_ test case, but it's not a
> hypothetical because existing tests were failing.  KSP tutorials
> runex55_Classical crashes reliably on my machine for this reason.  In
> some cases, the old memory doesn't get overwritten and the next parent
> traversal succeeds, so running in Valgrind would be a quicker way to
> find other examples.  The pattern is used in other places.
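
The failure mode described above can be sketched with a toy model. This is not PETSc code and not the reduced test case from the thread: `Obj`, `obj_destroy`, `parent_depth` and `demo_stale_parent` are hypothetical names, and an `alive` flag stands in for freeing memory so that the stale parent can be detected safely instead of triggering undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted object (hypothetical, not PetscObject). */
typedef struct Obj {
  int         refct;   /* reference count */
  int         alive;   /* set to 0 on "destruction"; real code would free() here */
  struct Obj *parent;  /* raw back-pointer that holds no reference */
} Obj;

static void obj_init(Obj *o) { o->refct = 1; o->alive = 1; o->parent = NULL; }
static void obj_reference(Obj *o) { o->refct++; }
static void obj_destroy(Obj *o) { if (--o->refct == 0) o->alive = 0; }

/* Walk up the parent chain. Returns the number of ancestors,
   or -1 if a destroyed (stale) parent is encountered. */
static int parent_depth(const Obj *o)
{
  int depth = 0;
  for (const Obj *p = o->parent; p; p = p->parent) {
    if (!p->alive) return -1;
    depth++;
  }
  return depth;
}

int demo_stale_parent(void)
{
  Obj inner, pc;
  obj_init(&inner);
  obj_init(&pc);                  /* the creator holds the first reference */

  obj_reference(&pc);             /* inner shares pc ...                   */
  pc.parent = &inner;             /* ... and records itself as its parent  */
  assert(parent_depth(&pc) == 1);

  obj_destroy(&inner);            /* inner is destroyed ...                */
  obj_destroy(&pc);               /* ... and drops its reference on pc,    */
                                  /* but pc.parent is never reset          */
  assert(pc.alive);               /* pc survives through the creator's ref */
  return parent_depth(&pc);       /* -1: the traversal hits a dead parent  */
}
```

In real code the dead parent's memory would have been freed, so the traversal reads whatever now occupies that address. That would be consistent with the crash being reliable in some runs and absent in others.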
> 
>>  How about instead removing TSSetSNES, SNESSetKSP, KSPSetPC and any
>>  other weird shit that I put in that shouldn't be there?
> 
> No, we use these to share an inner component without disrupting the
> outer context.  For example, we make a KSP for eigenvalue estimation and
> use the inner PC.  My eigen-analysis plugin makes a SLEPc EPS and uses
> the PC from the primary solve.  In TS, we create an inner TS as a
> starting method for DAE when using a method with explicit first stage,
> or with methods that always require a starting method.
> 
>> 
>>   I think it is important that we maintain internally information
>>   about relationships of objects. I would rather fix the problem
>>   than remove it by removing the information.
>> 
>>   Note that each object can have at most one parent at a time. 
> 
> This is not true because we reuse "child" components, see above.  The
> parent-child relationship is at best a DAG, not a tree, and I'm only
> hopeful, but not confident, that it doesn't have cycles.
> 
> If you want to have a strong parent-child relationship, then it should
> be *enforced* by abandoning reference counting and instead giving each
> object a single exclusive "owner".  (I'm not actually serious; it's not
> feasible for objects to have unique "parents" in general.)
> 
>>   We could even restrict it so that an object can only have one
>>   parent ever.
>> 
>>   I think we can manage everything by simply having each object
>>   maintain a list of children and updating that list as children are
>>   destroyed and removing the parent reference when the parent is
>>   destroyed. I will prepare a branch to add this functionality.  Use
>>   of memory would only ever be assigned to one parent so it would not
>>   be double counted.
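
A minimal sketch of the bookkeeping proposed above, under stated assumptions: it is a plain-C toy, not the branch that was later written, and `Node`, `node_set_parent`, `node_destroy`, `node_total_mem` and the fixed `MAX_CHILDREN` bound are all hypothetical. Each object keeps a child list and at most one parent, links are cleared on destruction, and memory is summed down the resulting tree so nothing is counted twice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8  /* arbitrary bound to keep the sketch allocation-free */

typedef struct Node {
  size_t       mem;                     /* bytes owned directly by this object */
  struct Node *parent;                  /* at most one parent at a time        */
  struct Node *children[MAX_CHILDREN];
  int          nchildren;
} Node;

static void node_init(Node *n, size_t mem) { n->mem = mem; n->parent = NULL; n->nchildren = 0; }

/* Remove c from its current parent's child list, if it has a parent. */
static void node_unlink(Node *c)
{
  Node *p = c->parent;
  if (!p) return;
  for (int i = 0; i < p->nchildren; i++) {
    if (p->children[i] == c) {
      p->children[i] = p->children[--p->nchildren];
      break;
    }
  }
  c->parent = NULL;
}

/* Make p the single parent of c. Returns 0 on success, or -1 if the link
   would create a cycle or p's child list is full. */
static int node_set_parent(Node *p, Node *c)
{
  for (const Node *a = p; a; a = a->parent) if (a == c) return -1;
  if (c->parent == p) return 0;
  if (p->nchildren == MAX_CHILDREN) return -1;
  node_unlink(c);                       /* leave the previous parent first */
  p->children[p->nchildren++] = c;
  c->parent = p;
  return 0;
}

/* On destruction: orphan all children, then leave the parent's list. */
static void node_destroy(Node *n)
{
  while (n->nchildren > 0) n->children[--n->nchildren]->parent = NULL;
  node_unlink(n);
}

/* Memory of n plus everything logged beneath it. */
static size_t node_total_mem(const Node *n)
{
  size_t total = n->mem;
  for (int i = 0; i < n->nchildren; i++) total += node_total_mem(n->children[i]);
  return total;
}

size_t demo_shared_child(void)
{
  Node ksp, est, pc;                    /* names are illustrative only */
  node_init(&ksp, 100); node_init(&est, 10); node_init(&pc, 1000);

  node_set_parent(&ksp, &pc);           /* pc is logged under ksp              */
  node_set_parent(&est, &pc);           /* sharing moves it: now under est only */
  assert(node_total_mem(&ksp) + node_total_mem(&est) == 1110); /* pc counted once */

  node_destroy(&est);                   /* est goes away, pc.parent is reset   */
  assert(pc.parent == NULL);
  return node_total_mem(&ksp);          /* 100: ksp no longer accounts for pc  */
}
```

The sketch avoids both the stale parent pointer and double counting, but only by picking one parent for a shared child. Once `pc` is shared, the total reported for `ksp` no longer includes it, which is one concrete form of the trade-off being debated here.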
> 
> What is the great value that you will gain by holding this relationship?
> It makes the model lots more complicated, and debugging that counting,
> accidental cycles, and the like will be much worse than debugging the
> existing reference counting (which can be hairy, but at least debuggers
> do a decent job).