[petsc-dev] [petsc-users] SNESSetFunctionDomainError

Barry Smith bsmith at mcs.anl.gov
Thu Feb 19 13:41:41 CST 2015


All the Krylov methods are supposed to use the routines KSP_MatMult(), KSP_PCApply() etc and not the MatMult(), PCApply() etc directly (I'm fixing some that don't as we speak).

I propose that these KSP_xxx()  routines have code like

     ....
     MatGetFailure(mat,&failure);
     if (failure) {
        if (ksp->errorifnotconverged) SETERRQ()
        else   ksp->convergedreason = KSP_DIVERGED_MATMULT_FAILURE
    }
    and similar for PC failure.

    Then all the Krylov solvers have simple checks:  if (ksp->converged reason) PetscFunctionReturn(0); after eachl KSP_xxx() call.

   Meanwhile 

   PetscErrorCode MatSetFailure(Mat mat)
      mat->failureset = PETSC_TRUE;

   and 

    PetscErrorCode MatGetFailure(Mat mat,PetscBool *failure)
      *failure = mat->failureset;
       mat->failureset = PETSC_FALSE;

    Similar for PC.

    Making these changes in the code  is straightforward. Does it handle everything we need? 

   Barry




> On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov> wrote:
> 
> I wanted to revive this thread and move it to petsc-dev. This problem seems to be harder than I realized.
> 
> Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a valid output vector. 
> For example, it's a MatMFFD and as part of its function evaluation it  has to evaluate an implicitly-defined 
> constitutive model (e.g., solve an equation of state) and this inner solve diverges 
> (e.g., the time step is too big).  I want to be able to abort the linear 
> solve and the nonlinear solve, return a suitable "converged" reason and let the user retry, maybe with a 
> different timestep size.  This is for a hand-rolled time stepper, but TS would face similar issues. 
> 
> Based on the previous thread here http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> I tried marking the result of MatMult as "invalid" and let it propagate up to KSPSolve() where it can be handled.
> This quickly gets out of control, since the invalid Vec isn't returned to the KSP immediately.  It could be a work
> vector, which is fed into PCApply() along various code paths, depending on the side of the preconditioner, whether it's a
> transpose solve, etc.  Each of these transformations (e.g., PCApply()) would then have to check the validity of
> the input argument, clear its error condition and set it on the output argument, etc.  Very error-prone and fragile.
> Not to mention the large amount of code to sift through.
> 
> This is a general problem of exception handling -- we want to "unwind" the stack to the point where the problem should 
> be handled, but there doesn't seem to a good way to do it.  We also want to be able to clear all of the error conditions
> on the way up (e.g., mark vectors as valid again, but not too early), otherwise we leave the solver in an invalid state.
> 
> 
> Instead of passing an exception condition up the stack I could try storing that condition in one of the more globally-visible 
> objects (e.g., the Mat), but if the error occurs inside the evaluation of the residual that's being differenced, it doesn't really 
> have access to the Mat.  This probably raises various thread safety issues as well.
> 
> Using SNESSetFunctionDomainError() doesn't seem to be a solution: a MatMFFD created with MatCreateSNESMF()
> has a pointer to SNES, but the function evaluation code actually has no clue about that. More generally, I don't
> know whether we want to wait for the linear solve to complete before handling this exception: it is unnecessary, 
> it might be an expensive linear solve and the result of such a KSPSolve() is probably undefined and might blow up in 
> unexpected ways.  I suppose if there is a way to get a hold of SNES, each subsequent MatMult_MFFD has to check
> whether the domain error is set and return early in that case?  We would still have to wait for the linear solve to grind
> through the rest of its iterations.    I don't know, however, if there is a good way to guarantee that linear solver will get 
> through this quickly and without unintended consequences. Should MatMFFD also get a hold of the KSP and set a flag 
> there to abort?  I still don't know what the intervening code (e.g., the various PCApply()) will do before the KSP has a
> chance to deal with this.
> 
> I'm now thinking that setting some vector entries to NaN might be a good solution: I hope this NaN will propagate all the 
> way up through the subsequent arithmetic operations (does the IEEE floating-point arithmetic guarantees?), this "error
> condition" gets automatically cleared the next time the vector is recomputed, since its values are reset.  Finally, I want
> this exception to be detected globally but without incurring an extra reduction every time the residual is evaluated,
> and NaN will be show up in the norm that (most) KSPs would compute anyway.  That way KSP could diverge with a
> KSP_DIVERGED_NAN or a similar reason and the user would have an option to retry.  The problem with this approach
> is that VecValidEntries() in MatMult() and PCApply() will throw an error before this can work, so I'm trying to think about
> good ways of turning it off.  Any ideas about how to do this?
> 
> Incidentally, I realized that I don't understand how SNESFunctionDomainError can be handled gracefully in the current 
> set up: it's not set or checked collectively, so there isn't a good way to abort and retry across the whole comm, is there?
> 
> Dmitry.
> 
> 
> 
> 
> 
> 
> 
> 
> On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <jed at jedbrown.org> wrote:
> Dmitry Karpeyev <karpeev at mcs.anl.gov> writes:
> 
> > Handling this at the KSP level (I actually think the Mat level is more
> > appropriate, since the operator, not the solver, knows its domain),
> 
> We are dynamically discovering the domain, but I don't think it's
> appropriate for Mat to refuse to evaluate any more matrix actions until
> some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> invalid is fine, but some action needs to be taken and if Derek wants
> the SNES to skip further evaluations, we need to propagate the
> information up the stack somehow.




More information about the petsc-dev mailing list