[petsc-dev] [petsc-users] SNESSetFunctionDomainError

Dmitry Karpeyev karpeev at mcs.anl.gov
Thu Feb 19 14:06:26 CST 2015


On Thu Feb 19 2015 at 12:59:12 PM Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Feb 19, 2015, at 1:56 PM, Dmitry Karpeyev <karpeev at mcs.anl.gov> wrote:
> >
> >
> >
> > On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > Yeah, that sounds like a good fix, except for this: we have to make sure
> > all ranks diverge with this failure so that the user can retry the solve,
> > if necessary.
>
>    This is the business of the person calling MatSetFailure(). We can
> require that this routine be a collective.
>
Yes, but this way any "resilient" code is required to carry out an extra
reduction every KSP iteration, and we are not giving them an opportunity to
coalesce it with anything else.
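
To illustrate what coalescing could look like, here is a minimal sketch in
plain Python with no PETSc or MPI calls. The names allreduce_sum() and
norm_with_failure_flag() are hypothetical, and the "ranks" are simulated by a
list; the only point is that a failure flag can ride along with the partial
sums of a norm in one reduction instead of needing a second one.

```python
import math


def allreduce_sum(contributions):
    """Hypothetical stand-in for a sum all-reduce over all ranks.

    Each inner list is one rank's contribution; the result is the
    elementwise sum that every rank would receive.
    """
    return [sum(column) for column in zip(*contributions)]


def norm_with_failure_flag(local_vectors, local_failed):
    """Global 2-norm plus a global failure flag from a single reduction."""
    # Each simulated rank packs two numbers into one message:
    # its partial sum of squares and 1.0 if its operator evaluation failed.
    contributions = [
        [sum(v * v for v in x), 1.0 if failed else 0.0]
        for x, failed in zip(local_vectors, local_failed)
    ]
    sum_sq, n_failed = allreduce_sum(contributions)
    return math.sqrt(sum_sq), n_failed > 0


# Three simulated ranks; the second one reports a failed evaluation.
vectors = [[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
norm, failed = norm_with_failure_flag(vectors, [False, True, False])
print(norm, failed)
```

Every rank sees the same flag after the reduction, so all ranks can diverge
together. Whether a given KSP implementation exposes a place to attach such a
flag to its norm computation is an assumption of this sketch, not something
established in this thread.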

In any event, I can put this MatSetFailure()/MatSetFailureCollective() in.
Dmitry.

>
>  Barry
>
> > That would require an extra reduction every KSP iteration.
> > With Inf or NaN we could piggyback on the norm computation.
> >
> > Dmitry.
> >
> >    Barry
> >
> > > On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov> wrote:
> > >
> > > I wanted to revive this thread and move it to petsc-dev. This problem
> > > seems to be harder than I realized.
> > >
> > > Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a
> > > valid output vector. For example, it's a MatMFFD and as part of its
> > > function evaluation it has to evaluate an implicitly-defined
> > > constitutive model (e.g., solve an equation of state) and this inner
> > > solve diverges (e.g., the time step is too big).  I want to be able to
> > > abort the linear solve and the nonlinear solve, return a suitable
> > > "converged" reason and let the user retry, maybe with a different
> > > timestep size.  This is for a hand-rolled time stepper, but TS would
> > > face similar issues.
> > >
> > > Based on the previous thread here
> > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> > > I tried marking the result of MatMult as "invalid" and letting it
> > > propagate up to KSPSolve() where it can be handled. This quickly gets
> > > out of control, since the invalid Vec isn't returned to the KSP
> > > immediately.  It could be a work vector, which is fed into PCApply()
> > > along various code paths, depending on the side of the preconditioner,
> > > whether it's a transpose solve, etc.  Each of these transformations
> > > (e.g., PCApply()) would then have to check the validity of the input
> > > argument, clear its error condition and set it on the output argument,
> > > etc.  Very error-prone and fragile. Not to mention the large amount of
> > > code to sift through.
> > >
> > > This is a general problem of exception handling -- we want to "unwind"
> > > the stack to the point where the problem should be handled, but there
> > > doesn't seem to be a good way to do it.  We also want to be able to
> > > clear all of the error conditions on the way up (e.g., mark vectors as
> > > valid again, but not too early), otherwise we leave the solver in an
> > > invalid state.
> > >
> > >
> > > Instead of passing an exception condition up the stack I could try
> > > storing that condition in one of the more globally-visible objects
> > > (e.g., the Mat), but if the error occurs inside the evaluation of the
> > > residual that's being differenced, that evaluation doesn't really have
> > > access to the Mat.  This probably raises various thread safety issues
> > > as well.
> > >
> > > Using SNESSetFunctionDomainError() doesn't seem to be a solution: a
> > > MatMFFD created with MatCreateSNESMF() has a pointer to SNES, but the
> > > function evaluation code actually has no clue about that. More
> > > generally, I don't know whether we want to wait for the linear solve to
> > > complete before handling this exception: it is unnecessary, it might be
> > > an expensive linear solve and the result of such a KSPSolve() is
> > > probably undefined and might blow up in unexpected ways.  I suppose if
> > > there is a way to get a hold of SNES, each subsequent MatMult_MFFD has
> > > to check whether the domain error is set and return early in that case?
> > > We would still have to wait for the linear solve to grind through the
> > > rest of its iterations.  I don't know, however, if there is a good way
> > > to guarantee that the linear solver will get through this quickly and
> > > without unintended consequences. Should MatMFFD also get a hold of the
> > > KSP and set a flag there to abort?  I still don't know what the
> > > intervening code (e.g., the various PCApply()) will do before the KSP
> > > has a chance to deal with this.
> > >
> > > I'm now thinking that setting some vector entries to NaN might be a
> > > good solution: I hope this NaN will propagate all the way up through
> > > the subsequent arithmetic operations (does IEEE floating-point
> > > arithmetic guarantee this?), and this "error condition" gets
> > > automatically cleared the next time the vector is recomputed, since its
> > > values are reset.  Finally, I want this exception to be detected
> > > globally but without incurring an extra reduction every time the
> > > residual is evaluated, and NaN will show up in the norm that (most)
> > > KSPs would compute anyway.  That way KSP could diverge with a
> > > KSP_DIVERGED_NAN or a similar reason and the user would have an option
> > > to retry.  The problem with this approach is that VecValidEntries() in
> > > MatMult() and PCApply() will throw an error before this can work, so
> > > I'm trying to think about good ways of turning it off.  Any ideas about
> > > how to do this?
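
The NaN idea in the paragraph above can be sketched in a few lines of plain
Python (no PETSc involved). The helpers failed_matmult(), axpy() and norm2()
are hypothetical stand-ins for an operator application that failed, a vector
update and the norm a Krylov method would compute anyway. The sketch assumes
ordinary IEEE 754 double arithmetic, where addition, multiplication and square
root of a NaN operand yield NaN.

```python
import math


def failed_matmult(n):
    """Hypothetical operator application that could not be evaluated.

    It signals the failure by poisoning a single entry with NaN.
    """
    y = [1.0] * n
    y[0] = math.nan
    return y


def axpy(alpha, x, y):
    """Return alpha*x + y, entry by entry."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def norm2(x):
    """Euclidean norm; a NaN entry makes the whole sum, and the norm, NaN."""
    return math.sqrt(sum(xi * xi for xi in x))


w = failed_matmult(4)                 # one poisoned entry
z = axpy(2.0, w, [1.0] * 4)           # the NaN survives the vector update
residual_norm = norm2(z)              # ... and shows up in the norm
print(math.isnan(residual_norm))

recomputed = [1.0] * 4                # overwriting the vector clears the NaN
print(math.isnan(norm2(recomputed)))
```

One caveat: only arithmetic propagates NaN this way. Comparison-based
operations may silently drop it (in Python, max(1.0, math.nan) returns 1.0),
so any code path that takes a max or a branch on the poisoned values before
the norm is computed would need separate attention.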
> > >
> > > Incidentally, I realized that I don't understand how
> > > SNESFunctionDomainError can be handled gracefully in the current
> > > setup: it's not set or checked collectively, so there isn't a good way
> > > to abort and retry across the whole comm, is there?
> > >
> > > Dmitry.
> > >
> > > On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <jed at jedbrown.org> wrote:
> > > Dmitry Karpeyev <karpeev at mcs.anl.gov> writes:
> > >
> > > > Handling this at the KSP level (I actually think the Mat level is
> > > > more appropriate, since the operator, not the solver, knows its
> > > > domain),
> > >
> > > We are dynamically discovering the domain, but I don't think it's
> > > appropriate for Mat to refuse to evaluate any more matrix actions until
> > > some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> > > invalid is fine, but some action needs to be taken and if Derek wants
> > > the SNES to skip further evaluations, we need to propagate the
> > > information up the stack somehow.
> >
>
>