[petsc-dev] Recover from TS failure

Mark Adams mfadams at lbl.gov
Wed Nov 1 16:54:22 CDT 2017


I am trying to recover from a failed time step with this code. But PETSc
does a hard abort with this code:

 } else {
     /* do not print error messages since process 0 will print them, sleep
before aborting so will not accidently kill process 0*/
     PetscSleep(10.0);
     abort();
   }

Can I get TS to not abort and return?

  do {
    TSConvergedReason reason;
    ierr = TSGetTime(ts, &old_time);CHKERRQ(ierr);
    ierr = TSSolve(ts, u);
    if (!ierr) break;
    ierr = TSGetConvergedReason(ts,&reason);CHKERRQ(ierr);
    if (reason==TS_DIVERGED_NONLINEAR_SOLVE) { /* recover */
      PetscReal dt;
      ierr = TSGetTime(ts, &t);CHKERRQ(ierr);
      if (t == old_time && failed) {
PetscPrintf(PETSC_COMM_WORLD,"Failed to recover\n");
break; /* failed */
      } else {
old_time = t;
failed = PETSC_TRUE;
      }
      ierr = TSGetTimeStep(ts, &dt);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,"recover from %s, reason=%d
time=%g, dt=%g --> ",TSConvergedReasons[reason],reason,t,dt);CHKERRQ(ierr);
      dt = dt/ctx.dt_reduction_factor;
      ierr = PetscPrintf(PETSC_COMM_WORLD,"%g\n",dt);CHKERRQ(ierr);
      ierr = TSSetTimeStep(ts, dt);CHKERRQ(ierr);
      ierr = VecCopy(ctx.u0,u);CHKERRQ(ierr); /* recover state */
    } else {
      SETERRQ1(PETSC_COMM_WORLD,PETSC_ERR_ARG_WRONG,"Unhandled error
%s",TSConvergedReasons[reason]);
    }

On Wed, Nov 1, 2017 at 2:45 PM, Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Wed, Nov 1, 2017 at 2:34 PM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Mark Adams <mfadams at lbl.gov> writes:
>>
>> > Oh, Maybe the Jacobian has Nans or Infs even though the last time step
>> > survived. Maybe it was going crazy. I'll check
>>
>> If that is the case you would use TSSetFunctionDomainError().
>>
>
> Ah, very cool. Good to know. That does not seem to be the problem. Not
> sure what would cause this linear solver error. I'll try different solvers
> and see if I get any hints,
> Thanks again,
>
>
>>
>> > On Wed, Nov 1, 2017 at 2:13 PM, Mark Adams <mfadams at lbl.gov> wrote:
>> >
>> >> Yea, I don't understand the linear solve error:
>> >>
>> >> -ts_monitor -ts_type beuler -pc_type lu -pc_factor_mat_solver_package
>> >> mumps -ksp_type preonly -snes_monitor -snes_rtol 1.e-10 -snes_stol
>> 1.e-10
>> >> -snes_converged_reason -snes_atol 1.e-18 -snes_converged_reason
>> >>
>> >> Maybe mumps?
>> >>
>> >> On Wed, Nov 1, 2017 at 2:02 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >>
>> >>> Mark Adams <mfadams at lbl.gov> writes:
>> >>>
>> >>> > On Wed, Nov 1, 2017 at 1:46 PM, Jed Brown <jed at jedbrown.org> wrote:
>> >>> >
>> >>> >> Mark Adams <mfadams at lbl.gov> writes:
>> >>> >>
>> >>> >> > I have added some code in a TS post step method to look at the
>> >>> number of
>> >>> >> > nonlinear iterations and cut the time step if it took too many
>> SNES
>> >>> >> > iterations.  That helped but now I want to go one step further
>> and
>> >>> >> recover
>> >>> >> > from a failed time step (see appended error message).
>> >>> >> >
>> >>> >> > How can/should I recover from a SNES failure inside of TS? I
>> could
>> >>> just
>> >>> >> put
>> >>> >> > TSSolve in a loop and if TS failed because of the solver, then
>> >>> reduce the
>> >>> >> > time step and try to recover the state, and try again.
>> >>> >> >
>> >>> >> > Thoughts?
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Mark
>> >>> >> >
>> >>> >> > Note, this is a direct solver and the linear solver is diverging?
>> >>> I've
>> >>> >> > seen the same thing with the maximum its in SNES and failed
>> >>> linesearch.
>> >>> >> >
>> >>> >> > 23 TS dt 0.0025 time 0.1025
>> >>> >> >     0 SNES Function norm 9.997473571601e+04
>> >>> >> >     1 SNES Function norm 2.295851692533e-01
>> >>> >> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> >> iterations 1
>> >>> >> > [0]PETSC ERROR: --------------------- Error Message
>> >>> >> > --------------------------------------------------------------
>> >>> >> > [0]PETSC ERROR:
>> >>> >> > [0]PETSC ERROR: TSStep has failed due to
>> DIVERGED_NONLINEAR_SOLVE,
>> >>> >> increase
>> >>> >> > -ts_max_snes_failures or make negative to attempt recovery
>> >>> >>
>> >>> >> Did you increase -ts_max_snes_failures (TSSetMaxSNESFailures)?
>> >>> >>
>> >>> >
>> >>> > Ah, read the error message :) with a negative value I get this.
>> >>> >
>> >>> > TS can not, or is not going to try, to reconstruct my state and
>> reduce
>> >>> the
>> >>> > time step. So I think I need to do this myself.
>> >>>
>> >>> I don't understand.  Why is the linear solve diverging
>> >>> (-ksp_converged_reason)?  What TS are you using?
>> >>>
>> >>> > I guess just making loop and testing in the TSConvergedReason, is
>> the
>> >>> way
>> >>> > to go. We already do something like this for mesh adpativity.
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > 23 TS dt 0.0025 time 0.1025
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >     1 SNES Function norm 2.295851692533e-01
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 1
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> >     0 SNES Function norm 9.997473571601e+04
>> >>> >   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE
>> >>> iterations 0
>> >>> > [0]PETSC ERROR: --------------------- Error Message
>> >>> > --------------------------------------------------------------
>> >>> > [0]PETSC ERROR:
>> >>> > [0]PETSC ERROR: TSStep has failed due to DIVERGED_STEP_REJECTED
>> >>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d
>> ocumentation/faq.html
>> >>> for
>> >>> > trouble shooting.
>> >>> > [0]PETSC ERROR: Petsc Development GIT revision:
>> v3.8-53-ga2d93b2d43  GIT
>> >>> > Date: 2017-10-09 12:56:49 -0400
>> >>> > [0]PETSC ERROR: ./ex48 on a arch-macosx-gnu-O named
>> MarksMac-5.local by
>> >>> > markadams Wed Nov  1 13:51:24 2017
>> >>> > [0]PETSC ERROR: Configure options --with-cc=clang
>> --with-cc++=clang++
>> >>> > COPTFLAGS="-O2 -g -mavx2" CXXOPTFLAGS="-O2 -g -mavx2"
>> FOPTFLAGS="-O2 -g
>> >>> > -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1
>> >>> > --download-hypre=1 --download-ml=1 --download-triangle=1
>> >>> > --download-ctetgen=1 --download-p4est=1 --download-fftw
>> >>> --download-mumps=1
>> >>> > --download-scalapack=1 --with-x=0 --download-superlu_dist
>> >>> > --download-superlu --download-ctetgen --with-debugging=0
>> >>> --download-hdf5=1
>> >>> > PETSC_ARCH=arch-macosx-gnu-O --download-chaco
>> --with-viewfromoptions=1
>> >>> > [0]PETSC ERROR: #1 TSStep() line 4130 in
>> >>> > /Users/markadams/Codes/petsc/src/ts/interface/ts.c
>> >>> > [0]PETSC ERROR: #2 TSSolve() line 4374 in
>> >>> > /Users/markadams/Codes/petsc/src/ts/interface/ts.c
>> >>> > *********** DIVERGED_STEP_REJECTED ierr=91 reason=-2
>> >>> > L_2 Diff: 1.32241e+07 at time 0.1025
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >>
>> >>> >> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d
>> >>> ocumentation/faq.html
>> >>> >> for
>> >>> >> > trouble shooting.
>> >>> >> > [0]PETSC ERROR: Petsc Development GIT revision:
>> v3.8-53-ga2d93b2d43
>> >>> GIT
>> >>> >> > Date: 2017-10-09 12:56:49 -0400
>> >>> >> > [0]PETSC ERROR: ./ex48 on a arch-macosx-gnu-O named
>> MarksMac-5.local
>> >>> by
>> >>> >> > markadams Wed Nov  1 13:17:15 2017
>> >>> >> > [0]PETSC ERROR: Configure options --with-cc=clang
>> --with-cc++=clang++
>> >>> >> > COPTFLAGS="-O2 -g -mavx2" CXXOPTFLAGS="-O2 -g -mavx2"
>> FOPTFLAGS="-O2
>> >>> -g
>> >>> >> > -mavx2" --download-mpich=1 --download-parmetis=1
>> --download-metis=1
>> >>> >> > --download-hypre=1 --download-ml=1 --download-triangle=1
>> >>> >> > --download-ctetgen=1 --download-p4est=1 --download-fftw
>> >>> >> --download-mumps=1
>> >>> >> > --download-scalapack=1 --with-x=0 --download-superlu_dist
>> >>> >> > --download-superlu --download-ctetgen --with-debugging=0
>> >>> >> --download-hdf5=1
>> >>> >> > PETSC_ARCH=arch-macosx-gnu-O --download-chaco
>> >>> --with-viewfromoptions=1
>> >>> >> > [0]PETSC ERROR: #1 TSStep() line 4129 in
>> >>> >> > /Users/markadams/Codes/petsc/src/ts/interface/ts.c
>> >>> >> > [0]PETSC ERROR: #2 TSSolve() line 4374 in
>> >>> >> > /Users/markadams/Codes/petsc/src/ts/interface/ts.c
>> >>> >> > [0]PETSC ERROR: #3 main() line 1161 in
>> /Users/markadams/Codes/mhd/
>> >>> >> src/ex48.c
>> >>> >>
>> >>>
>> >>
>> >>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20171101/adeaa3fb/attachment-0001.html>


More information about the petsc-dev mailing list