[petsc-dev] [petsc-users] SNESSetFunctionDomainError

Dmitry Karpeyev karpeev at mcs.anl.gov
Thu Feb 19 09:33:46 CST 2015

I wanted to revive this thread and move it to petsc-dev. This problem seems
to be harder than I realized.

Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a valid
output vector.
For example, it's a MatMFFD and as part of its function evaluation it has
to evaluate an implicitly-defined
constitutive model (e.g., solve an equation of state) and this inner solve
fails (e.g., the time step is too big).  I want to be able to abort the linear
solve and the nonlinear solve, return a suitable "converged" reason and let
the user retry, maybe with a
different timestep size.  This is for a hand-rolled time stepper, but TS
would face similar issues.

Based on the previous thread here
I tried marking the result of MatMult as "invalid" and let it propagate up
to KSPSolve() where it can be handled.
This quickly gets out of control, since the invalid Vec isn't returned to
the KSP immediately.  It could be a work
vector, which is fed into PCApply() along various code paths, depending on
the side of the preconditioner, whether it's a
transpose solve, etc.  Each of these transformations (e.g., PCApply())
would then have to check the validity of
the input argument, clear its error condition and set it on the output
argument, etc.  Very error-prone and fragile.
Not to mention the large amount of code to sift through.

This is a general problem of exception handling -- we want to "unwind" the
stack to the point where the problem should
be handled, but there doesn't seem to be a good way to do it.  We also want to
be able to clear all of the error conditions
on the way up (e.g., mark vectors as valid again, but not too early),
otherwise we leave the solver in an invalid state.

Instead of passing an exception condition up the stack I could try storing
that condition in one of the more globally-visible
objects (e.g., the Mat), but if the error occurs inside the evaluation of
the residual that's being differenced, it doesn't really
have access to the Mat.  This probably raises various thread safety issues
as well.

Using SNESSetFunctionDomainError() doesn't seem to be a solution: a MatMFFD
created with MatCreateSNESMF()
has a pointer to SNES, but the function evaluation code actually has no
clue about that. More generally, I don't
know whether we want to wait for the linear solve to complete before
handling this exception: it is unnecessary,
it might be an expensive linear solve and the result of such a KSPSolve()
is probably undefined and might blow up in
unexpected ways.  I suppose if there is a way to get a hold of SNES, each
subsequent MatMult_MFFD has to check
whether the domain error is set and return early in that case?  We would
still have to wait for the linear solve to grind
through the rest of its iterations.  I don't know, however, if there is a
good way to guarantee that the linear solver will get
through this quickly and without unintended consequences. Should MatMFFD
also get a hold of the KSP and set a flag
there to abort?  I still don't know what the intervening code (e.g., the
various PCApply()) will do before the KSP has a
chance to deal with this.
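
To make the "set a flag and return early" idea concrete, here is a minimal serial sketch in plain C. It does not use PETSc: SolveCtx, op_mult() and solve_loop() are hypothetical stand-ins for the SNES/KSP, MatMult_MFFD and the Krylov loop, and the "domain" (nonnegative entries) is invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared context; in PETSc this role might be played by the
   SNES or the KSP. */
typedef struct {
  int domain_error; /* set by the operator when it cannot produce valid output */
  int mult_calls;   /* number of times real work was attempted */
} SolveCtx;

/* Stand-in for a matrix-free operator. It "fails" on a negative entry,
   an invented domain restriction used only for this sketch. */
static void op_mult(SolveCtx *ctx, const double *x, double *y, size_t n)
{
  assert(ctx != NULL);
  if (ctx->domain_error) return; /* already failed: skip the expensive evaluation */
  ctx->mult_calls++;
  for (size_t i = 0; i < n; i++) {
    if (x[i] < 0.0) { ctx->domain_error = 1; return; }
    y[i] = 2.0 * x[i];
  }
}

/* Stand-in for a linear solver loop that polls the flag after every
   operator application. Returns the number of completed iterations. */
static int solve_loop(SolveCtx *ctx, double *x, double *y, size_t n, int max_it)
{
  int it;
  for (it = 0; it < max_it; it++) {
    op_mult(ctx, x, y, n);
    if (ctx->domain_error) break; /* abort instead of grinding through max_it */
    for (size_t i = 0; i < n; i++) x[i] = y[i] - 3.0; /* arbitrary update */
  }
  return it;
}
```

The sketch only works because the loop itself checks the flag right after the operator returns; it says nothing about what intervening code such as PCApply() would do with a half-written output vector, and in parallel the flag would still have to be agreed on across the communicator.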

I'm now thinking that setting some vector entries to NaN might be a good
solution: I hope this NaN will propagate all the
way up through the subsequent arithmetic operations (does IEEE
floating-point arithmetic guarantee this?), and this "error
condition" gets automatically cleared the next time the vector is
recomputed, since its values are reset.  Finally, I want
this exception to be detected globally but without incurring an extra
reduction every time the residual is evaluated,
and NaN will show up in the norm that (most) KSPs would compute anyway.
That way KSP could diverge with a
KSP_DIVERGED_NAN or a similar reason and the user would have an option to
retry.  The problem with this approach
is that VecValidEntries() in MatMult() and PCApply() will throw an error
before this can work, so I'm trying to think about
good ways of turning it off.  Any ideas about how to do this?

Incidentally, I realized that I don't understand how
SNESFunctionDomainError can be handled gracefully in the current
set up: it's not set or checked collectively, so there isn't a good way to
abort and retry across the whole comm, is there?


On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <jed at jedbrown.org> wrote:

> Dmitry Karpeyev <karpeev at mcs.anl.gov> writes:
> > Handling this at the KSP level (I actually think the Mat level is more
> > appropriate, since the operator, not the solver, knows its domain),
> We are dynamically discovering the domain, but I don't think it's
> appropriate for Mat to refuse to evaluate any more matrix actions until
> some action is taken at the MatMFFD/SNESMF level.  Marking the Vec
> invalid is fine, but some action needs to be taken and if Derek wants
> the SNES to skip further evaluations, we need to propagate the
> information up the stack somehow.