[petsc-dev] [petsc-users] SNESSetFunctionDomainError

Derek Gaston friedmud at gmail.com
Thu Feb 19 19:24:23 CST 2015


No problem on our end with MatSetFailure() being collective... we can
guarantee that easily.

Thanks for working on this! This has been a high priority feature request
for a while.

Derek
On Thu, Feb 19, 2015 at 3:41 PM Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> > On Feb 19, 2015, at 2:06 PM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
> wrote:
> >
> >
> >
> > On Thu Feb 19 2015 at 12:59:12 PM Barry Smith <bsmith at mcs.anl.gov>
> wrote:
> >
> > > On Feb 19, 2015, at 1:56 PM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
> wrote:
> > >
> > >
> > >
> > > On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <bsmith at mcs.anl.gov>
> wrote:
> > >
> > > Yeah, that sounds like a good fix, except for this: we have to make
> sure all ranks diverge with this failure so that the user can retry the
> solve, if necessary.
> >
> >    This is the business of the person calling MatSetFailure(). We can
> require that this routine be a collective.
> > Yes, but this way any "resilient" code is required to carry out an extra
> reduction every KSP iteration, and we are not giving them an opportunity to
> coalesce it with anything else.
>
>    For each Krylov method the "point" in the code where the coalescence
> can take place is different and with methods like pipelined GMRES the point
> where the coalescence can be done is AFTER the point where the results of
> the (failed) multiply are used so coding can be tricky.  For now I won't
> worry about coalescence and would just require MatSetFailure() to be
> collective. We need more experience before we try to be fancy.
>
>    If you do any coding, use my branch
> barry/use-ksp_matmult_pcapply-in-ksp-methods
>
> Barry
>
> >
> > In any event, I can put this MatSetFailure()/MatSetFailureCollective()
> in.
> > Dmitry.
> >
> >  Barry
> >
> > > That would require an extra reduction every KSP iteration.
> > > With Inf or NaN we could piggyback on the norm computation.
> > >
> > > Dmitry.
> > >
> > >    Barry
> > >
> > >
> > >
> > >
> > > > On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <karpeev at mcs.anl.gov>
> wrote:
> > > >
> > > > I wanted to revive this thread and move it to petsc-dev. This
> problem seems to be harder than I realized.
> > > >
> > > > Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute
> a valid output vector.
> > > > For example, it's a MatMFFD and as part of its function evaluation
> it has to evaluate an implicitly-defined
> > > > constitutive model (e.g., solve an equation of state) and this inner
> solve diverges
> > > > (e.g., the time step is too big).  I want to be able to abort the
> linear
> > > > solve and the nonlinear solve, return a suitable "converged" reason
> and let the user retry, maybe with a
> > > > different timestep size.  This is for a hand-rolled time stepper,
> but TS would face similar issues.
> > > >
> > > > Based on the previous thread here
> http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> > > > I tried marking the result of MatMult as "invalid" and let it
> propagate up to KSPSolve() where it can be handled.
> > > > This quickly gets out of control, since the invalid Vec isn't
> returned to the KSP immediately.  It could be a work
> > > > vector, which is fed into PCApply() along various code paths,
> depending on the side of the preconditioner, whether it's a
> > > > transpose solve, etc.  Each of these transformations (e.g.,
> PCApply()) would then have to check the validity of
> > > > the input argument, clear its error condition and set it on the
> output argument, etc.  Very error-prone and fragile.
> > > > Not to mention the large amount of code to sift through.
> > > >
> > > > This is a general problem of exception handling -- we want to
> "unwind" the stack to the point where the problem should
> > > > be handled, but there doesn't seem to be a good way to do it.  We also
> want to be able to clear all of the error conditions
> > > > on the way up (e.g., mark vectors as valid again, but not too
> early), otherwise we leave the solver in an invalid state.
> > > >
> > > >
> > > > Instead of passing an exception condition up the stack I could try
> storing that condition in one of the more globally-visible
> > > > objects (e.g., the Mat), but if the error occurs inside the
> evaluation of the residual that's being differenced, it doesn't really
> > > > have access to the Mat.  This probably raises various thread safety
> issues as well.
> > > >
> > > > Using SNESSetFunctionDomainError() doesn't seem to be a solution: a
> MatMFFD created with MatCreateSNESMF()
> > > > has a pointer to SNES, but the function evaluation code actually has
> no clue about that. More generally, I don't
> > > > know whether we want to wait for the linear solve to complete before
> handling this exception: it is unnecessary,
> > > > it might be an expensive linear solve and the result of such a
> KSPSolve() is probably undefined and might blow up in
> > > > unexpected ways.  I suppose if there is a way to get a hold of SNES,
> each subsequent MatMult_MFFD has to check
> > > > whether the domain error is set and return early in that case?  We
> would still have to wait for the linear solve to grind
> through the rest of its iterations.  I don't know, however, if
> there is a good way to guarantee that the linear solver will get
> > > > through this quickly and without unintended consequences. Should
> MatMFFD also get a hold of the KSP and set a flag
> > > > there to abort?  I still don't know what the intervening code (e.g.,
> the various PCApply()) will do before the KSP has a
> > > > chance to deal with this.
> > > >
> > > > I'm now thinking that setting some vector entries to NaN might be a
> good solution: I hope this NaN will propagate all the
> way up through the subsequent arithmetic operations (does IEEE
> floating-point arithmetic guarantee this?), this "error
> > > > condition" gets automatically cleared the next time the vector is
> recomputed, since its values are reset.  Finally, I want
> > > > this exception to be detected globally but without incurring an
> extra reduction every time the residual is evaluated,
> > > > and NaN will show up in the norm that (most) KSPs would compute
> anyway.  That way KSP could diverge with a
> > > > KSP_DIVERGED_NAN or a similar reason and the user would have an
> option to retry.  The problem with this approach
> > > > is that VecValidEntries() in MatMult() and PCApply() will throw an
> error before this can work, so I'm trying to think about
> > > > good ways of turning it off.  Any ideas about how to do this?
> > > >
> > > > Incidentally, I realized that I don't understand how
> SNESFunctionDomainError can be handled gracefully in the current
> > > > setup: it's not set or checked collectively, so there isn't a good
> way to abort and retry across the whole comm, is there?
> > > >
> > > > Dmitry.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <jed at jedbrown.org>
> wrote:
> > > > Dmitry Karpeyev <karpeev at mcs.anl.gov> writes:
> > > >
> > > > > Handling this at the KSP level (I actually think the Mat level is
> more
> > > > > appropriate, since the operator, not the solver, knows its domain),
> > > >
> > > > We are dynamically discovering the domain, but I don't think it's
> > > > appropriate for Mat to refuse to evaluate any more matrix actions
> until
> > > > some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> > > > invalid is fine, but some action needs to be taken and if Derek wants
> > > > the SNES to skip further evaluations, we need to propagate the
> > > > information up the stack somehow.
> > >
> >
>
>