[petsc-dev] SNESMonitorVI fix: maint or master?

Barry Smith bsmith at mcs.anl.gov
Wed Oct 7 18:23:33 CDT 2015


> On Oct 7, 2015, at 6:01 PM, Dmitry Karpeyev <karpeev at mcs.anl.gov> wrote:
> 
> Well, there is a RELAP7 (MOOSE-based code) that encounters that problem:
> the equation of state returns a NaN that gets handled correctly (retried with a 
> smaller timestep), but once we turn on -snes_vi_monitor, "Cannot get here" 
> is thrown.  

   Yeah that is bad.

> Apparently, the monitor is called before divergence is declared.
> 
> I don't think it's meaningless crap: it can tell the user how many NaNs there are,
> which can give them an idea of how many mesh points stray into the nonphysical
> regime.  If not there, it should be counted somewhere.  In any event, the code
> shouldn't die with a PLIB error in this case.  How should we handle it?

  I understand your point about providing useful information, but I am afraid you are opening up a can of worms by wanting to call the monitor function in this failed case. In all the other SNESSolve implementations, SNESCheckFunctionNorm(snes,fnorm); is called BEFORE anything else is done (monitor, convergence test, etc.), so the monitor is not called on the "bad last iteration". It is just bad luck that in updating the code we forgot to put SNESCheckFunctionNorm() into the VI solvers; it is missing in both SNESSolve_VINEWTONSSLS and SNESSolve_VINEWTONRSLS.

   The correct fix is to add the SNESCheckFunctionNorm() call; this will prevent the current crash and should go into maint. It should go immediately after the call to    ierr  = SNESLineSearchGetNorms(snes->linesearch, &xnorm, &gnorm, &ynorm);CHKERRQ(ierr); in both routines.
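
   To make the ordering issue concrete, here is a minimal stand-alone sketch in plain C. It is not PETSc source: classify_entry(), solver_step() and the STEP_* codes are hypothetical names, and the branch logic is only a simplified model of the kind of active-constraint test discussed in this thread. It shows that branches which are exhaustive for finite values can all fail once the residual contains a NaN, and that checking the residual for non-finite values before running the monitor avoids reaching them.

```c
#include <assert.h>
#include <math.h>

/* Simplified model of an active-constraint test (hypothetical, not PETSc source).
   Returns 0 for an inactive entry, 1 for active at the lower bound,
   2 for active at the upper bound, and -1 if no branch matches. */
static int classify_entry(double x, double lo, double hi, double f)
{
  if ((x > lo || f < 0.0) && (x < hi || f > 0.0)) return 0;
  else if (x <= lo && f >= 0.0) return 1;
  else if (x >= hi && f <= 0.0) return 2;
  return -1; /* only reachable when a comparison involves NaN */
}

enum { STEP_OK = 0, STEP_DIVERGED = 1, STEP_MONITOR_ERROR = 2 };

/* One model iteration: optionally check the residual first, then "monitor". */
static int solver_step(const double *x, const double *lo, const double *hi,
                       const double *f, int n, int check_norm_first)
{
  double sumsq = 0.0; /* squared 2-norm; enough to detect NaN or Inf */
  int i;

  assert(n >= 0);
  for (i = 0; i < n; i++) sumsq += f[i] * f[i];

  /* Proposed ordering: declare divergence before anything else runs. */
  if (check_norm_first && !isfinite(sumsq)) return STEP_DIVERGED;

  /* Monitor stand-in: classify every entry against its bounds. */
  for (i = 0; i < n; i++) {
    if (classify_entry(x[i], lo[i], hi[i], f[i]) < 0) return STEP_MONITOR_ERROR;
  }
  return STEP_OK;
}
```

   With a NaN in the residual of an entry sitting on its lower bound, solver_step(..., 1) reports divergence and the monitor never runs, while solver_step(..., 0) falls through every branch and ends in the error case.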

  The "counting" of nonphysical points etc. should not be handled by the VI monitor. It could perhaps be handled by SNESComputeFunction().
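
  As a rough illustration of what such counting could look like if done at function-evaluation time, here is a small plain-C sketch. The helper count_nan_entries() is hypothetical and operates on a raw array; where and how a count like this would be hooked into the solver is left open here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count the NaN entries in a residual array,
   e.g. to report how many points produced a nonphysical state. */
static size_t count_nan_entries(const double *f, size_t n)
{
  size_t i, count = 0;

  assert(f != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (isnan(f[i])) count++;
  }
  return count;
}
```

  A caller could log this count right after the residual is computed, independently of any monitor.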


  Barry

> 
> On Wed, Oct 7, 2015 at 4:58 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> > On Oct 7, 2015, at 5:48 PM, Dmitry Karpeyev <karpeev at mcs.anl.gov> wrote:
> >
> > Now that we allow NaNs to bubble up through the solver,
> > this can trip mysterious-looking errors in SNESVI:
> > PETSC_ERR_PLIB, "Can never get here"
> > is thrown from SNESMonitorVI(), because a NaN in
> > the residual can defeat all of the seemingly-exhaustive
> > if-then-else branches counting the number of active constraints.
> 
>   Shouldn't the SNES solver have already returned as "failed" before it ever gets to SNESMonitorVI in this case? Since the vector was marked with some NaNs, it means that something has already gone wrong and any data in there is meaningless crap. Why count meaningless crap?
> >
> > I think the fix should be to count the number of NaNs separately
> > and report them alongside the legitimate active bounds to give
> > the user as much useful information as possible.  Since this entails
> > a substantial difference to the output format of -snes_vi_monitor,
> > should the fix go to maint or master?
> 
>   Not maint.
> 
> >
> > Dmitry.
> 



