[petsc-dev] handling user domain errors

Barry Smith bsmith at mcs.anl.gov
Wed Apr 29 22:11:54 CDT 2015


  Indeed, you proposed exactly this. I would be happy if you made a branch off master that uses this approach.

  Barry

> On Apr 29, 2015, at 9:28 PM, Dmitry Karpeyev <dkarpeev at gmail.com> wrote:
> 
> Barry,
> Sorry, I must have missed this -- I really ought to make a better filter for catching email like this.
> I think using NaNs is an excellent solution, in fact, I was proposing it a few months ago here :-)
> http://lists.mcs.anl.gov/pipermail/petsc-dev/2015-February/016958.html
> It ensures that the error is collective (the norm reduction will ensure every rank gets a NaN), 
> and the "error condition" is cleared automatically on the next MatMult, etc.
> I'm all for it.
> Should I put it in?
> 
> Dmitry.
> 
> On Wed, Apr 29, 2015 at 8:26 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>   Dmitry,
> 
>     I haven't heard back from you on this. Any thoughts?
> 
>   Barry
> 
> > On Apr 20, 2015, at 6:23 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >
> >  Dmitry,
> >
> >   Rather than introducing a whole new layer of flags for indicating domain errors in user functions, just do the following.
> >
> >   1) just stick a NaN into the function's result
> >   2) remove the VecValidValues() at the END of routines like MatMult()
> >   3) when NaN or Inf pops up in Krylov methods (which will happen within VecNorm() or VecDot(), and thus we get free collective knowledge of the problem even if it happened on only one node), generate the appropriate KSP_DIVERGED_NANORINF. This is already handled sometimes (most of the time?); for example, KSPSolve_CG contains the code
> >    ierr = VecXDot(Z,R,&beta);CHKERRQ(ierr);         /*  beta <- z'*r       */
> >    if (PetscIsInfOrNanScalar(beta)) {
> >      if (ksp->errorifnotconverged) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"KSPSolve has not converged due to Nan or Inf inner product");
> >      else {
> >        ksp->reason = KSP_DIVERGED_NANORINF;
> >        PetscFunctionReturn(0);
> >      }
> >    }
> >
> >   4) SNES already handles a KSP that failed to converge, and
> >   5) TS already handles a SNES that failed to converge, for example by cutting the timestep.
> >
> >  Barry
> >
> >
> 
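
The scheme discussed in the thread above can be sketched in plain C, without PETSc or MPI. This is only an illustration under stated assumptions: every name below (SketchReason, user_function, reduce_sum, evaluate_on_two_ranks) is hypothetical and not part of PETSc, the two "ranks" are simulated by two arrays, and the sum reduction is a plain loop standing in for the reduction inside VecNorm() or VecDot(). It uses a squared norm so that nothing beyond the standard headers is needed.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define SKETCH_MAX_LOCAL 16

/* Hypothetical stand-in for a convergence reason; NOT the PETSc enum. */
typedef enum { SKETCH_CONVERGED = 0, SKETCH_DIVERGED_NANORINF = -1 } SketchReason;

/* Hypothetical user function f(x) = 1/x, undefined at x == 0.
   On a domain error it just writes NaN into the result (step 1). */
void user_function(const double *x, double *f, size_t n)
{
  for (size_t i = 0; i < n; i++) f[i] = (x[i] == 0.0) ? NAN : 1.0 / x[i];
}

/* Local contribution to a squared 2-norm, as one rank would compute it. */
double local_sum_squares(const double *v, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += v[i] * v[i];
  return s;
}

/* Stand-in for a collective sum reduction: every rank would receive this
   same total, and a NaN in any one partial sum makes the total NaN. */
double reduce_sum(const double *partial, size_t nranks)
{
  double total = 0.0;
  assert(nranks > 0);
  for (size_t r = 0; r < nranks; r++) total += partial[r];
  return total;
}

/* The check a Krylov method would make on the reduced value (step 3). */
SketchReason check_reduced_value(double value)
{
  return (isnan(value) || isinf(value)) ? SKETCH_DIVERGED_NANORINF : SKETCH_CONVERGED;
}

/* Evaluate the user function on two simulated ranks and check the
   reduced squared norm. No error flag is stored anywhere. */
SketchReason evaluate_on_two_ranks(const double *x0, size_t n0,
                                   const double *x1, size_t n1)
{
  double f0[SKETCH_MAX_LOCAL], f1[SKETCH_MAX_LOCAL], partial[2];
  assert(n0 <= SKETCH_MAX_LOCAL && n1 <= SKETCH_MAX_LOCAL);
  user_function(x0, f0, n0);
  user_function(x1, f1, n1);
  partial[0] = local_sum_squares(f0, n0);
  partial[1] = local_sum_squares(f1, n1);
  return check_reduced_value(reduce_sum(partial, 2));
}
```

With inputs {1, 2} on one simulated rank and {0} on the other, the check reports the NaN/Inf reason even though only one rank hit the domain error. Calling it again with valid inputs reports success, since the "error condition" lives only in the data and nothing has to be reset.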