[petsc-users] Problem with SNES convergence

Ling Zou Ling.Zou at inl.gov
Tue Sep 4 09:55:19 CDT 2018


If it is indeed 'zero pivot' issue, you could add an extra option to your simulation '-pc_factor_shift_type NONZERO -pc_factor_shift_amount 1.0e-10', and see if it helps.


http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot

PETSc: Documentation: FAQ - mcs.anl.gov<http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot>
www.mcs.anl.gov
Execution PETSc executables are SO big and take SO long to link. We find this annoying as well. On most machines PETSc can use shared libraries, so executables should be much smaller, run ./configure with the additional option --with-shared-libraries (this is the default).




________________________________
From: Rahul Samala <srahul_05 at yahoo.co.in>
Sent: Thursday, August 30, 2018 11:26:37 PM
To: Fande Kong; Smith, Barry F.; Ling Zou
Cc: PETSc users list
Subject: Re: [petsc-users] Problem with SNES convergence

Thank you Barry for the suggestion, I will definitely look in to TS implementation.
I have implemented my own version of reduced time step size using the SNESGetConvergedReason, but encountered the same problem after another 30 time-steps. So, I am experimenting with different set of initial conditions.
Ling, thank you for the code, I will see if my understanding is different from yours.

--
Rahul.

On Thursday, August 30, 2018, 11:25:35 PM GMT+5:30, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:




> On Aug 30, 2018, at 12:49 PM, Fande Kong <fdkong.jd at gmail.com<mailto:fdkong.jd at gmail.com>> wrote:
>
> Hi Barry,
>
> I haven't had time to look into TS so far. But it is definitely interesting. One simple question would like this: If I have a  simple loop for time steppers, and each step SNES is called. How hard to convert my code to use TS?

    You would need to look at the examples in ts/examples/tutorials but essentially you need to provide a slightly different version of FormFunction() and FormJacobian() and call TSSolve() etc instead of having your own loop.


    Barry

>
> Any suggestion? Where should I start from?
>
>
> Fande,
>
> On Thu, Aug 30, 2018 at 11:29 AM Smith, Barry F. <bsmith at mcs.anl.gov<mailto:bsmith at mcs.anl.gov>> wrote:
>
>    Note also that the PETSc TS component has a large variety of timesteppers with automatic adaptivity which adjust the time-step for accuracy and convergence. Depending on the exact needs of your time-stepper it might be easier in the long run to use PETSc's time-steppers rather than write your own.
>
>    Barry
>
>
> > On Aug 30, 2018, at 10:46 AM, Ling Zou <Ling.Zou at inl.gov<mailto:Ling.Zou at inl.gov>> wrote:
> >
> > Rahul, please see the logic I used to reduce time step size.
> > Hope this helps you.
> >
> > Ling
> >
> > for (timestep = 1; timestep <= N; timestep++) // loop over time steps
> > {
> >  // before trying to solve a time step, 1) it is still not successful; 2) we are not giving up; 3) haven't failed yet
> >  bool give_up = false; bool success = false; bool experienced_fail_this_time_step = false;
> >  // save solutions and old solutions into u_copy and u_old_copy, in case a time step fails and we restart from this saved solutions
> >  VecCopy(u, u_copy); VecCopy(u_old, u_old_copy);
> >
> >  while ((!give_up) && (!success)) // as long as not successful and not giving up, we solve again with smaller time step
> >  {
> >    if (time_step_size < dt_min) { give_up = true; break; } // ok, bad luck, give up due to time step size smaller than a preset value
> >    if (experienced_fail_this_time_step) { // get the vectors from backups if this is a re-try, i.e., already failed with a larger time step
> >      VecCopy(u_old_copy, u); VecCopy(u_old_copy, u_old);
> >    }
> >
> >    try {
> >      SNESSolve(snes, NULL, u);
> >      SNESGetConvergedReason(snes, &snes_converged_reason);
> >
> >      if (snes_converged_reason > 0)  success = true; // yes, snes converged
> >      else { // no, snes did not converge
> >        cutTimeStepSize(); // e.g., dt / 2
> >        experienced_fail_this_time_step = true;
> >      }
> >    }
> >    catch (int err) { // in case your own pieces of code throws an exception
> >      std::cout << "An exception occurred." << std::endl;
> >      success = false;
> >      cutTimeStepSize(); // e.g., dt / 2
> >      experienced_fail_this_time_step = true;
> >    }
> >  }
> >
> >  if (success) {
> >    // output, print, whatever
> >    // duplicate current solution to old solution in preparing next time step
> >    VecCopy(u, u_old);
> >    // you can increase time step size here, e.g. * 2
> >    increaseTimeStepSize();
> >  }
> >
> >  if (give_up) {
> >    simulationFailed = true;
> >    std::cerr << "Simulation failed.\n";
> >    //exit(1);// dont exit(1) now, just break the for-loop, let PETSc clean its workspace.
> >    break;
> >  }
> > }
> >
> > From: Rahul Samala <srahul_05 at yahoo.co.in<mailto:srahul_05 at yahoo.co.in>>
> > Sent: Wednesday, August 29, 2018 10:37:30 PM
> > To: Ling Zou; Smith, Barry F.
> > Cc: PETSc Users List
> > Subject: Re: [petsc-users] Problem with SNES convergence
> >
> > Thank you Ling, I would definitely like to look at your code for reducing timestep size.
> > Thank you Barry for your inputs.
> >
> > --
> > Rahul.
> >
> > On Wednesday, August 29, 2018, 9:02:00 PM GMT+5:30, Smith, Barry F. <bsmith at mcs.anl.gov<mailto:bsmith at mcs.anl.gov>> wrote:
> >
> >
> >
> > Current time (before start of timestep) 52.5048,    iter=5380    Timestep=864.000000
> >  0 SNES Function norm 1.650467412595e+05
> >    0 KSP preconditioned resid norm 3.979123221160e+03 true resid norm 1.650467412595e+05 ||r(i)||/||b|| 1.000000000000e+00
> >    1 KSP preconditioned resid norm 9.178246525982e-11 true resid norm 7.006473307032e-09 ||r(i)||/||b|| 4.245144892632e-14
> >  Linear solve converged due to CONVERGED_RTOL iterations 1
> >  1 SNES Function norm 6.722712947273e+02
> >  Linear solve did not converge due to DIVERGED_NANORINF iterations 0
> > Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 1
> >
> >    This usually is an indicator that the LU (ILU) factorization has hit a zero pivot (hence the linear solve has a divide by zero so gives the DIVERGED_NANORINF flag).
> >
> >    You can/should call SNESGetConvergedReason() immediately after each SNESSolve(), if the result is negative that means something has failed in the nonlinear solve and you can try cutting the time-step and trying again.
> >
> >    Good luck,
> >
> >    Barry
> >
> >
> > > On Aug 29, 2018, at 10:11 AM, Ling Zou <Ling.Zou at inl.gov<mailto:Ling.Zou at inl.gov>> wrote:
> > >
> > > 1) My experience is that this kind of bug or sudden death (everything is fine till suddenly something is broken) is very difficult to debug/fix. I looked at your txt files and could not give any quick comments. Maybe PETSc developers have better idea on this.
> > > 2) I do have successful experience on reducing time step size when PETSc fails solving or my own piece of code throws an exception. If you are interested, I can share them.
> > >
> > > -Ling
> > > From: petsc-users <petsc-users-bounces at mcs.anl.gov<mailto:petsc-users-bounces at mcs.anl.gov>> on behalf of Rahul Samala <srahul_05 at yahoo.co.in<mailto:srahul_05 at yahoo.co.in>>
> > > Sent: Wednesday, August 29, 2018 8:36:58 AM
> > > To: PETSc Users List
> > > Subject: [petsc-users] Problem with SNES convergence
> > >
> > > Hello PetSc users,
> > >
> > > 1)  I have problem with SNES convergence. I call SNESSolve  in a time loop and use the inbuilt Jacobian feature. The code works fine for about 5380 time steps after which it breaks down. The solution till that point looks fine. I have used newtonls of type l2. (newtontr and others aren't working). Since I have used inbuilt Jacobian feature and the code worked for about 5000 time steps I don't understand the reason for failure, is it an incorrect function evaluation?  Attached are the outputs with -pc_type lu and ilu along with -snes_linesearch_type l2 -snes_converged_reason -snes_monitor -snes_view -ksp_converged_reason -ksp_monitor_true_residual
> > >
> > > 2)  How to get hold of failure signal, like Nonlinear solve DIVERGED_MAX_IT or DIVERGED_LINEAR_SOLVE so that whenever it occurs I can use a reduced time step and see if the code converges.
> > >
> > > Thank you,
> > > Rahul.
> > >
> > > output_ilu.txt
> > >
> > >
> > > output_ilu.txt
> > >
> > >
> > >
> > > output_lu.txt
> > >
> > >
> > > output_lu.txt
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20180904/8dc06069/attachment-0001.html>


More information about the petsc-users mailing list