<div dir="ltr"><br><br><div class="gmail_quote">On Thu Feb 19 2015 at 12:41:59 PM Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

All the Krylov methods are supposed to use the routines KSP_MatMult(), KSP_PCApply() etc and not the MatMult(), PCApply() etc directly (I'm fixing some that don't as we speak).<br>

<br>

I propose that these KSP_xxx()  routines have code like<br>

<br>

     ....<br>

     MatGetFailure(mat,&failure);<br>

     if (failure) {<br>

        if (ksp->errorifnotconverged) SETERRQ()<br>

        else   ksp->convergedreason = KSP_DIVERGED_MATMULT_FAILURE;<br>

    }<br>

    and similar for PC failure.<br>

<br>

    Then all the Krylov solvers have simple checks:  if (ksp->convergedreason) PetscFunctionReturn(0); after each KSP_xxx() call.<br>

<br>

   Meanwhile<br>

<br>

   PetscErrorCode MatSetFailure(Mat mat)<br>

      mat->failureset = PETSC_TRUE;<br>

<br>

   and<br>

<br>

    PetscErrorCode MatGetFailure(Mat mat,PetscBool *failure)<br>

      *failure = mat->failureset;<br>

       mat->failureset = PETSC_FALSE;<br>

<br>

    Similar for PC.<br>
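<br>
    The read-and-clear behaviour of the flag, and the check that a KSP_xxx() wrapper would run after the multiply, can be sketched in plain C as below. This is only an illustration under stated assumptions: MockMat, MockKSP, MockReason and the Mock* functions are hypothetical stand-ins and are not PETSc types or PETSc API, and the nonzero return simply plays the role of SETERRQ().<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PETSc objects; NOT the real Mat/KSP structs. */
typedef struct { bool failureset; } MockMat;
typedef enum { MOCK_ITERATING = 0, MOCK_DIVERGED_MATMULT_FAILURE = -1 } MockReason;
typedef struct { bool errorifnotconverged; MockReason reason; } MockKSP;

/* Called by the operator when it cannot produce a valid result. */
int MockMatSetFailure(MockMat *mat)
{
  assert(mat != NULL);
  mat->failureset = true;
  return 0;
}

/* Read-and-clear: report the flag, then reset it so that a retry starts clean. */
int MockMatGetFailure(MockMat *mat, bool *failure)
{
  assert(mat != NULL && failure != NULL);
  *failure = mat->failureset;
  mat->failureset = false;
  return 0;
}

/* Check run by a KSP_MatMult()-style wrapper after the multiply.
   Returns nonzero (standing in for SETERRQ()) only when errorifnotconverged
   is set; otherwise it records a diverged reason and returns 0. */
int MockKSPCheckMatFailure(MockKSP *ksp, MockMat *mat)
{
  bool failure = false;
  int  err;

  assert(ksp != NULL);
  err = MockMatGetFailure(mat, &failure);
  if (err) return err;
  if (failure) {
    if (ksp->errorifnotconverged) return 1;
    ksp->reason = MOCK_DIVERGED_MATMULT_FAILURE;
  }
  return 0;
}
```

    A solver loop would call MockKSPCheckMatFailure() after each multiply and return as soon as ksp->reason is nonzero. Note that this sketch is purely local to one process; it does not make the ranks agree on the failure.<br>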

<br>

    Making these changes in the code  is straightforward. Does it handle everything we need?<br></blockquote><div>Yeah, that sounds like a good fix, except for this: we have to make sure all ranks diverge with this failure so that the user can retry the solve, if necessary.  That would require an extra reduction every KSP iteration.</div><div>With Inf or NaN we could piggyback on the norm computation.</div><div><br></div><div>Dmitry.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

   Barry<br>

<br>

<br>

<br>

<br>

> On Feb 19, 2015, at 9:33 AM, Dmitry Karpeyev <<a href="mailto:karpeev@mcs.anl.gov" target="_blank">karpeev@mcs.anl.gov</a>> wrote:<br>

><br>

> I wanted to revive this thread and move it to petsc-dev. This problem seems to be harder than I realized.<br>

><br>

> Suppose MatMult inside KSPSolve() inside SNESSolve() cannot compute a valid output vector.<br>

> For example, it's a MatMFFD and as part of its function evaluation it  has to evaluate an implicitly-defined<br>

> constitutive model (e.g., solve an equation of state) and this inner solve diverges<br>

> (e.g., the time step is too big).  I want to be able to abort the linear<br>

> solve and the nonlinear solve, return a suitable "converged" reason and let the user retry, maybe with a<br>

> different timestep size.  This is for a hand-rolled time stepper, but TS would face similar issues.<br>

><br>

> Based on the previous thread here <a href="http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html" target="_blank">http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html</a><br>

> I tried marking the result of MatMult as "invalid" and let it propagate up to KSPSolve() where it can be handled.<br>

> This quickly gets out of control, since the invalid Vec isn't returned to the KSP immediately.  It could be a work<br>

> vector, which is fed into PCApply() along various code paths, depending on the side of the preconditioner, whether it's a<br>

> transpose solve, etc.  Each of these transformations (e.g., PCApply()) would then have to check the validity of<br>

> the input argument, clear its error condition and set it on the output argument, etc.  Very error-prone and fragile.<br>

> Not to mention the large amount of code to sift through.<br>

><br>

> This is a general problem of exception handling -- we want to "unwind" the stack to the point where the problem should<br>

> be handled, but there doesn't seem to be a good way to do it.  We also want to be able to clear all of the error conditions<br>

> on the way up (e.g., mark vectors as valid again, but not too early), otherwise we leave the solver in an invalid state.<br>

><br>

><br>

> Instead of passing an exception condition up the stack I could try storing that condition in one of the more globally-visible<br>

> objects (e.g., the Mat), but if the error occurs inside the evaluation of the residual that's being differenced, it doesn't really<br>

> have access to the Mat.  This probably raises various thread safety issues as well.<br>

><br>

> Using SNESSetFunctionDomainError() doesn't seem to be a solution: a MatMFFD created with MatCreateSNESMF()<br>

> has a pointer to SNES, but the function evaluation code actually has no clue about that. More generally, I don't<br>

> know whether we want to wait for the linear solve to complete before handling this exception: it is unnecessary,<br>

> it might be an expensive linear solve and the result of such a KSPSolve() is probably undefined and might blow up in<br>

> unexpected ways.  I suppose if there is a way to get a hold of SNES, each subsequent MatMult_MFFD has to check<br>

> whether the domain error is set and return early in that case?  We would still have to wait for the linear solve to grind<br>

> through the rest of its iterations.    I don't know, however, if there is a good way to guarantee that the linear solver will get<br>

> through this quickly and without unintended consequences. Should MatMFFD also get a hold of the KSP and set a flag<br>

> there to abort?  I still don't know what the intervening code (e.g., the various PCApply()) will do before the KSP has a<br>

> chance to deal with this.<br>

><br>

> I'm now thinking that setting some vector entries to NaN might be a good solution: I hope this NaN will propagate all the<br>

> way up through the subsequent arithmetic operations (does IEEE floating-point arithmetic guarantee this?), and this "error<br>

> condition" gets automatically cleared the next time the vector is recomputed, since its values are reset.  Finally, I want<br>

> this exception to be detected globally but without incurring an extra reduction every time the residual is evaluated,<br>

> and NaN will show up in the norm that (most) KSPs would compute anyway.  That way KSP could diverge with a<br>

> KSP_DIVERGED_NAN or a similar reason and the user would have an option to retry.  The problem with this approach<br>

> is that VecValidEntries() in MatMult() and PCApply() will throw an error before this can work, so I'm trying to think about<br>

> good ways of turning it off.  Any ideas about how to do this?<br>
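<br>
The NaN idea above can be illustrated with a small standalone C sketch. Under IEEE 754 arithmetic, addition and multiplication with a quiet NaN operand yield NaN, so a single poisoned entry reaches the sum of squares that a 2-norm is built from. The function names here are hypothetical, the loop over contributions merely stands in for a parallel sum reduction, and the sketch assumes the compiler is not run with value-unsafe options such as -ffast-math. Operations that compare rather than add (for example a max norm) may not propagate NaN in the same way.<br>

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Local sum of squares: the value each rank would contribute to the
   reduction that forms the squared global 2-norm. */
double local_sum_of_squares(const double *x, size_t n)
{
  double sum = 0.0;
  size_t i;

  assert(x != NULL || n == 0);
  for (i = 0; i < n; i++) sum += x[i] * x[i];
  return sum;
}

/* Combine per-rank contributions (a stand-in for a sum reduction).
   Stores the squared norm and returns 1 if it is NaN, meaning that at
   least one rank contributed a poisoned value; returns 0 otherwise. */
int squared_norm_is_poisoned(const double *contrib, size_t nranks, double *sqnorm)
{
  double total = 0.0;
  size_t r;

  assert(contrib != NULL && sqnorm != NULL);
  for (r = 0; r < nranks; r++) total += contrib[r];
  *sqnorm = total;
  return isnan(total) ? 1 : 0;
}
```

Because every rank receives the same reduced value, every rank would see the NaN and could take the same decision without a separate reduction.<br>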

><br>

> Incidentally, I realized that I don't understand how SNESFunctionDomainError can be handled gracefully in the current<br>

> setup: it's not set or checked collectively, so there isn't a good way to abort and retry across the whole comm, is there?<br>

><br>

> Dmitry.<br>

><br>


> On Sun Aug 31 2014 at 10:12:53 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br>

> Dmitry Karpeyev <<a href="mailto:karpeev@mcs.anl.gov" target="_blank">karpeev@mcs.anl.gov</a>> writes:<br>

><br>

> > Handling this at the KSP level (I actually think the Mat level is more<br>

> > appropriate, since the operator, not the solver, knows its domain),<br>

><br>

> We are dynamically discovering the domain, but I don't think it's<br>

> appropriate for Mat to refuse to evaluate any more matrix actions until<br>

> some action is taken at the MatMFFD/SNESMF level.  Marking the Vec<br>

> invalid is fine, but some action needs to be taken and if Derek wants<br>

> the SNES to skip further evaluations, we need to propagate the<br>

> information up the stack somehow.<br>

<br>

</blockquote></div></div>