[petsc-dev] [petsc-users] SNESSetFunctionDomainError

Dmitry Karpeyev karpeev at mcs.anl.gov
Thu Feb 19 13:56:06 CST 2015


On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> All the Krylov methods are supposed to use the routines KSP_MatMult(),
> KSP_PCApply() etc and not the MatMult(), PCApply() etc directly (I'm fixing
> some that don't as we speak).
>
> I propose that these KSP_xxx()  routines have code like
>
>      ....
>      MatGetFailure(mat,&failure);
>      if (failure) {
>         if (ksp->errorifnotconverged) SETERRQ();
>         else ksp->convergedreason = KSP_DIVERGED_MATMULT_FAILURE;
>      }
>     and similar for PC failure.
>
>     Then all the Krylov solvers have simple checks:
>     if (ksp->convergedreason) PetscFunctionReturn(0); after each KSP_xxx() call.
>
>    Meanwhile
>
>    PetscErrorCode MatSetFailure(Mat mat) {
>       mat->failureset = PETSC_TRUE;
>       return 0;
>    }
>
>    and
>
>     PetscErrorCode MatGetFailure(Mat mat,PetscBool *failure) {
>       *failure = mat->failureset;       /* report the flag ... */
>       mat->failureset = PETSC_FALSE;    /* ... then clear it */
>       return 0;
>     }
>
>     Similar for PC.
>
>     Making these changes in the code  is straightforward. Does it handle
> everything we need?
>
Yeah, that sounds like a good fix, except for this: we have to make sure
all ranks diverge with this failure so that the user can retry the solve,
if necessary.  That would require an extra reduction every KSP iteration.
With Inf or NaN we could piggyback on the norm computation.

Dmitry.

>
>    Barry
>
> > On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
> wrote:
> >
> > I wanted to revive this thread and move it to petsc-dev. This problem
> seems to be harder than I realized.
> >
> > Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a
> valid output vector.
> > For example, it's a MatMFFD and as part of its function evaluation it
> has to evaluate an implicitly-defined
> > constitutive model (e.g., solve an equation of state) and this inner
> solve diverges
> > (e.g., the time step is too big).  I want to be able to abort the linear
> > solve and the nonlinear solve, return a suitable "converged" reason and
> let the user retry, maybe with a
> > different timestep size.  This is for a hand-rolled time stepper, but TS
> would face similar issues.
> >
> > Based on the previous thread here
> http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> > I tried marking the result of MatMult as "invalid" and letting it propagate
> up to KSPSolve() where it can be handled.
> > This quickly gets out of control, since the invalid Vec isn't returned
> to the KSP immediately.  It could be a work
> > vector, which is fed into PCApply() along various code paths, depending
> on the side of the preconditioner, whether it's a
> > transpose solve, etc.  Each of these transformations (e.g., PCApply())
> would then have to check the validity of
> > the input argument, clear its error condition and set it on the output
> argument, etc.  Very error-prone and fragile.
> > Not to mention the large amount of code to sift through.
> >
> > This is a general problem of exception handling -- we want to "unwind"
> the stack to the point where the problem should
> > be handled, but there doesn't seem to be a good way to do it.  We also want
> to be able to clear all of the error conditions
> > on the way up (e.g., mark vectors as valid again, but not too early),
> otherwise we leave the solver in an invalid state.
> >
> >
> > Instead of passing an exception condition up the stack I could try
> storing that condition in one of the more globally-visible
> > objects (e.g., the Mat), but if the error occurs inside the evaluation
> of the residual that's being differenced, it doesn't really
> > have access to the Mat.  This probably raises various thread safety
> issues as well.
> >
> > Using SNESSetFunctionDomainError() doesn't seem to be a solution: a
> MatMFFD created with MatCreateSNESMF()
> > has a pointer to SNES, but the function evaluation code actually has no
> clue about that. More generally, I don't
> > know whether we want to wait for the linear solve to complete before
> handling this exception: it is unnecessary,
> > it might be an expensive linear solve and the result of such a
> KSPSolve() is probably undefined and might blow up in
> > unexpected ways.  I suppose if there is a way to get a hold of SNES,
> each subsequent MatMult_MFFD has to check
> > whether the domain error is set and return early in that case?  We would
> still have to wait for the linear solve to grind
> > through the rest of its iterations.    I don't know, however, if there
> is a good way to guarantee that the linear solver will get
> > through this quickly and without unintended consequences. Should MatMFFD
> also get a hold of the KSP and set a flag
> > there to abort?  I still don't know what the intervening code (e.g., the
> various PCApply()) will do before the KSP has a
> > chance to deal with this.
> >
> > I'm now thinking that setting some vector entries to NaN might be a good
> solution: I hope this NaN will propagate all the
> > way up through the subsequent arithmetic operations (does IEEE
> floating-point arithmetic guarantee this?), and this "error
> > condition" gets automatically cleared the next time the vector is
> recomputed, since its values are reset.  Finally, I want
> > this exception to be detected globally but without incurring an extra
> reduction every time the residual is evaluated,
> > and NaN will show up in the norm that (most) KSPs would compute
> anyway.  That way KSP could diverge with a
> > KSP_DIVERGED_NAN or a similar reason and the user would have an option
> to retry.  The problem with this approach
> > is that VecValidEntries() in MatMult() and PCApply() will throw an error
> before this can work, so I'm trying to think about
> > good ways of turning it off.  Any ideas about how to do this?
> >
> > Incidentally, I realized that I don't understand how
> SNESFunctionDomainError can be handled gracefully in the current
> > set up: it's not set or checked collectively, so there isn't a good way
> to abort and retry across the whole comm, is there?
> >
> > Dmitry.
> >
> > On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <jed at jedbrown.org> wrote:
> > Dmitry Karpeyev <karpeev at mcs.anl.gov> writes:
> >
> > > Handling this at the KSP level (I actually think the Mat level is more
> > > appropriate, since the operator, not the solver, knows its domain),
> >
> > We are dynamically discovering the domain, but I don't think it's
> > appropriate for Mat to refuse to evaluate any more matrix actions until
> > some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> > invalid is fine, but some action needs to be taken and if Derek wants
> > the SNES to skip further evaluations, we need to propagate the
> > information up the stack somehow.
>
>