[petsc-users] Optimizing solver and consistent converging on non-linear solver

Smith, Barry F. bsmith at mcs.anl.gov
Tue Jul 16 18:21:22 CDT 2019


  Ahh, also run with -ksp_monitor -ksp_converged_reason -ksp_monitor_singular_value and send the new output.


  For smallish problems, for debugging purposes, you can run with -pc_type lu to force a direct solve of the linear system and see what happens (that is, whether the linear problems are solvable at all).
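
  To illustrate what a direct solve can tell you, here is a minimal pure-Python sketch. It is not PETSc code: the function name solve_dense and the small bar matrices are made up for illustration. A stiffness matrix with no displacement constrained is singular, so elimination hits a zero pivot, which is roughly the kind of failure a direct solve would expose. Fixing one node makes the same system solvable.

```python
def solve_dense(a, b, tol=1e-12):
    """Gaussian elimination with partial pivoting; raises on a (near) zero pivot."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot_row][col]) < tol:
            raise ZeroDivisionError(f"zero pivot in column {col}: matrix is singular")
        m[col], m[pivot_row] = m[pivot_row], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical 3-node bar, unit stiffness, no node fixed (rigid-body mode).
free_bar = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
# Same bar with node 0 fixed, so only nodes 1 and 2 remain as unknowns.
fixed_bar = [[2.0, -1.0], [-1.0, 1.0]]

try:
    solve_dense(free_bar, [0.0, 0.0, 1.0])
except ZeroDivisionError as err:
    print("unconstrained:", err)
print("constrained:", solve_dense(fixed_bar, [0.0, 1.0]))
```

  If the direct solve also fails, the problem is likely in the system itself (for example missing constraints) and not in the iterative solver settings.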


> On Jul 16, 2019, at 6:11 PM, Sean Hsu <hsu at kairospower.com> wrote:
> 
> Hi Barry,
> 
> I'm using the default time stepper from MOOSE, which is the implicit Euler method. Here is the output from one of the examples where the timestep gets really small. I couldn't find much information regarding DIVERGED_LINEAR_SOLVE. Thanks for your help:
> 
> Time Step 1, time = 0.1, dt = 0.1
> 0 Nonlinear |R| = 4.012105e-03
>  0 SNES Function norm 4.012104835449e-03
>      0 Linear |R| = 4.012105e-03
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.05, dt = 0.05
> 0 Nonlinear |R| = 2.005850e-03
>  0 SNES Function norm 2.005849521314e-03
>      0 Linear |R| = 2.005850e-03
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.025, dt = 0.025
> 0 Nonlinear |R| = 1.002874e-03
>  0 SNES Function norm 1.002873988124e-03
>      0 Linear |R| = 1.002874e-03
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.0125, dt = 0.0125
> 0 Nonlinear |R| = 5.014243e-04
>  0 SNES Function norm 5.014242948706e-04
>      0 Linear |R| = 5.014243e-04
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.00625, dt = 0.00625
> 0 Nonlinear |R| = 2.507090e-04
>  0 SNES Function norm 2.507089718805e-04
>      0 Linear |R| = 2.507090e-04
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.003125, dt = 0.003125
> 0 Nonlinear |R| = 1.253537e-04
>  0 SNES Function norm 1.253536919567e-04
>      0 Linear |R| = 1.253537e-04
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.0015625, dt = 0.0015625
> 0 Nonlinear |R| = 6.267665e-05
>  0 SNES Function norm 6.267664747058e-05
>      0 Linear |R| = 6.267665e-05
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.00078125, dt = 0.00078125
> 0 Nonlinear |R| = 3.133827e-05
>  0 SNES Function norm 3.133827410694e-05
>      0 Linear |R| = 3.133827e-05
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 0.000390625, dt = 0.000390625
> 0 Nonlinear |R| = 1.566912e-05
>  0 SNES Function norm 1.566912464487e-05
>      0 Linear |R| = 1.566912e-05
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 0.000195313, dt = 0.000195313
> 0 Nonlinear |R| = 7.834559e-06
>  0 SNES Function norm 7.834559222250e-06
>      0 Linear |R| = 7.834559e-06
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 9.76563e-05, dt = 9.76563e-05
> 0 Nonlinear |R| = 3.917279e-06
>  0 SNES Function norm 3.917278835540e-06
>      0 Linear |R| = 3.917279e-06
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 4.88281e-05, dt = 4.88281e-05
> 0 Nonlinear |R| = 1.958639e-06
>  0 SNES Function norm 1.958639223338e-06
>      0 Linear |R| = 1.958639e-06
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 2.44141e-05, dt = 2.44141e-05
> 0 Nonlinear |R| = 9.793196e-07
>  0 SNES Function norm 9.793195635958e-07
>      0 Linear |R| = 9.793196e-07
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 1.2207e-05, dt = 1.2207e-05
> 0 Nonlinear |R| = 4.896598e-07
>  0 SNES Function norm 4.896597688439e-07
>      0 Linear |R| = 4.896598e-07
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 6.10352e-06, dt = 6.10352e-06
> 0 Nonlinear |R| = 2.448299e-07
>  0 SNES Function norm 2.448298810498e-07
>      0 Linear |R| = 2.448299e-07
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 3.05176e-06, dt = 3.05176e-06
> 0 Nonlinear |R| = 1.224149e-07
>  0 SNES Function norm 1.224149403502e-07
>      0 Linear |R| = 1.224149e-07
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 1.52588e-06, dt = 1.52588e-06
> 0 Nonlinear |R| = 6.120747e-08
>  0 SNES Function norm 6.120747013139e-08
>      0 Linear |R| = 6.120747e-08
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 7.62939e-07, dt = 7.62939e-07
> 0 Nonlinear |R| = 3.060373e-08
>  0 SNES Function norm 3.060373492112e-08
>      0 Linear |R| = 3.060373e-08
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 3.8147e-07, dt = 3.8147e-07
> 0 Nonlinear |R| = 1.530187e-08
>  0 SNES Function norm 1.530186849366e-08
>      0 Linear |R| = 1.530187e-08
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 1.90735e-07, dt = 1.90735e-07
> 0 Nonlinear |R| = 7.650934e-09
>  0 SNES Function norm 7.650934237796e-09
>      0 Linear |R| = 7.650934e-09
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 9.53674e-08, dt = 9.53674e-08
> 0 Nonlinear |R| = 3.825467e-09
>  0 SNES Function norm 3.825467116639e-09
>      0 Linear |R| = 3.825467e-09
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 4.76837e-08, dt = 4.76837e-08
> 0 Nonlinear |R| = 1.912734e-09
>  0 SNES Function norm 1.912733557755e-09
>      0 Linear |R| = 1.912734e-09
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 2.38419e-08, dt = 2.38419e-08
> 0 Nonlinear |R| = 9.563657e-10
>  0 SNES Function norm 9.563657094854e-10
>      0 Linear |R| = 9.563657e-10
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 1.19209e-08, dt = 1.19209e-08
> 0 Nonlinear |R| = 4.781839e-10
>  0 SNES Function norm 4.781839239582e-10
>      0 Linear |R| = 4.781839e-10
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 5.96046e-09, dt = 5.96046e-09
> 0 Nonlinear |R| = 2.390909e-10
>  0 SNES Function norm 2.390908927195e-10
>      0 Linear |R| = 2.390909e-10
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> Time Step 1, time = 2.98023e-09, dt = 2.98023e-09
> 0 Nonlinear |R| = 1.195465e-10
>  0 SNES Function norm 1.195465156083e-10
>      0 Linear |R| = 1.195465e-10
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Solve Did NOT Converge!
> 
> Time Step 1, time = 1.49012e-09, dt = 1.49012e-09
> 0 Nonlinear |R| = 5.977326e-11
>  0 SNES Function norm 5.977325780361e-11
> Nonlinear solve converged due to CONVERGED_FNORM_ABS iterations 0
> Solve Converged!
> Outlier Variable Residual Norms:
>  disp_z: 5.907986e-11
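
A quick check of the quoted log above, as a sketch (the variable names are made up; the numbers are copied from the log): the time step is halved after every failed solve, and the initial residual shrinks in proportion to dt.

```python
import math

# (dt, initial nonlinear residual) pairs copied from the log above.
samples = [
    (0.1, 4.012105e-03),
    (0.05, 2.005850e-03),
    (0.025, 1.002874e-03),
    (1.49012e-09, 5.977326e-11),
]

# Number of halvings needed to get from the first dt to the last one.
halvings = round(math.log2(samples[0][0] / samples[-1][0]))

# If the residual scales linearly with dt, residual / dt stays constant.
ratios = [residual / dt for dt, residual in samples]

print("halvings:", halvings)
print("residual/dt:", [f"{r:.5f}" for r in ratios])
```

If that reading is right, the final "converged" step may only mean that the initial residual fell below the absolute tolerance (the log reports CONVERGED_FNORM_ABS with 0 iterations), not that the linear solve started working.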
> 
> Best,
> 
> Sean
> 
> 
> 
> 
> <https://kairospower.com/>
> Sean Hsu   Mechanics of Materials Intern
> p 510.808.5265   e hsu at kairospower.com <mailto:user at kairospower.com>
> 707 W Tower Ave, Alameda, CA 94501
> www.kairospower.com <http://www.kairospower.com>    <https://www.linkedin.com/company/kairos-power-llc/>
> 
> 
> 
> On 7/16/19, 3:52 PM, "Smith, Barry F." <bsmith at mcs.anl.gov> wrote:
> 
> 
>      What time stepper are you using, something in PETSc or your own? If PETSc, send the -ts_view output. If PETSc, you can also run with -ts_monitor -ts_adapt_monitor to see why it is making the timestep choices it is making.
> 
>       Unfortunately I don't think MOOSE uses PETSc time-steppers so you will need to consult with the MOOSE team on how they choose time-steps, how to monitor them and determine why it is selecting such small values.
> 
>       You can run a case that starts to have small time steps with -snes_monitor -snes_converged_reason and send the output. At least this will give some information on how the nonlinear solver is doing. And maybe we can make suggestions on how to proceed.
> 
> 
>       Barry
> 
> 
>> On Jul 16, 2019, at 3:02 PM, Sean Hsu <hsu at kairospower.com> wrote:
>> 
>> Hi Barry,
>> 
>> I think one of the causes is that I was not prescribing enough boundary conditions in this case, resulting in NaN values during the calculations. The linear solver issue no longer occurs now that I have added enough boundary conditions; the linear solver has no trouble solving the system. My issue now is that dt drops too low during the solve, and the solver fails once the timestep drops below dtmin (2e-14). Do you have any insight into this? Again, thanks a lot for your help!
>> 
>> Best,
>> 
>> Sean
>> 
>> 
>> 
>> 
>> <https://kairospower.com/>
>> Sean Hsu   Mechanics of Materials Intern
>> p 510.808.5265   e hsu at kairospower.com <mailto:user at kairospower.com>
>> 707 W Tower Ave, Alameda, CA 94501
>> www.kairospower.com <http://www.kairospower.com>    <https://www.linkedin.com/company/kairos-power-llc/>
>> 
>> 
>> 
>> On 7/15/19, 5:26 PM, "Smith, Barry F." <bsmith at mcs.anl.gov> wrote:
>> 
>> 
>>     What is causing the Inf or Nan in the function?
>> 
>>     1) determine if it is inf or Nan because that can mean different things
>> 
>>     2) determine what the physical meaning of the inf or Nan is, that is physically what is going wrong to cause it.
>> 
>>     I would back off from worrying about convergence rates a bit and instead worry about resolving this issue first because it shows something is really funky with the modeling aspect. That changing the solution by the 14th digit can cause it to go from 1.e-2 to infinite or Nan is problematic.
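
On point 1, a small Python sketch of how the two differ (the helper name classify is made up here). Inf typically comes from overflow, while NaN comes from an undefined operation such as Inf - Inf or 0 * Inf. Which one the residual produces hints at whether something blew up or something was undefined.

```python
import math


def classify(value):
    """Return 'nan', 'inf', or 'finite' for a float."""
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf"
    return "finite"


overflow = 1e308 * 10.0            # exceeds the double-precision range
undefined = overflow - overflow    # Inf - Inf has no defined value
zero_times_inf = 0.0 * overflow    # 0 * Inf is also undefined

for name, value in [("overflow", overflow),
                    ("undefined", undefined),
                    ("zero_times_inf", zero_times_inf),
                    ("residual", 1.0e-2)]:
    print(name, classify(value))
```

A check like this on the computed stresses or residual entries is one way to find where the first non-finite value appears.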
>> 
>> 
>>      Barry
>> 
>> 
>> 
>>> On Jul 15, 2019, at 7:10 PM, Sean Hsu <hsu at kairospower.com> wrote:
>>> 
>>> Hi Barry,
>>> 
>>> Thanks for the quick response. 
>>> 
>>> This behavior shows up occasionally on coarser meshes. I don't believe I have non-physical constraints: the mesh doesn't change during the simulation, and it is simply a simulation of a plate with a fixed boundary on one side and a constant strain on the other side.
>>> 
>>> I also don't have bounds on my solution.
>>> 
>>> When the line search fails, decreasing the time step size usually yields converged results, but if the time step size gets too small the simulation takes a very long time to run. Please let me know if you need any more information regarding this particular issue; I will be more than happy to provide details of my simulation. Thanks!
>>> 
>>> Best,
>>> 
>>> Sean
>>> 
>>> 
>>> 
>>> <https://kairospower.com/>
>>> Sean Hsu   Mechanics of Materials Intern
>>> p 510.808.5265   e hsu at kairospower.com <mailto:user at kairospower.com>
>>> 707 W Tower Ave, Alameda, CA 94501
>>> www.kairospower.com <http://www.kairospower.com>    <https://www.linkedin.com/company/kairos-power-llc/>
>>> 
>>> 
>>> 
>>> On 7/15/19, 5:00 PM, "Smith, Barry F." <bsmith at mcs.anl.gov> wrote:
>>> 
>>> 
>>>    Function norm was  1.078014e-02
>>>    Linear solve is great, 
>>> 
>>>> Line search: objective function at lambdas = 1. is Inf or Nan, cutting lambda .... 
>>>  ...
>>>> Line search: objective function at lambdas = 9.09495e-13 is Inf or Nan, cutting lambda
>>> 
>>>     You take a microscopically small step in space and your objective function is Inf or Nan. 
>>> 
>>>     Do you have constraints on your solution where it becomes non-physical? For example, dividing by an element of the solution (which would require that the solution not be zero)? Is the initial solution on a constraint boundary?
>>> 
>>>     If you use a coarser mesh do you get this same type of behavior? 
>>> 
>>>     If you have bounds on your solution you might need to consider solving it as a differential variational inequality (DVI)
>>> 
>>>     Barry
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Jul 15, 2019, at 6:44 PM, Sean Hsu via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I have been using blackbear (a MOOSE framework app) to run a simple tensile test simulation on a complex material model with a large number of elements. I was able to get consistent and quick results with a small number of elements; however, once the number of elements starts increasing, the solver won’t converge consistently and dt usually drops to a very low value, causing the simulation to run for a very long time. I am seeking recommendations for optimizing the solver so I can get consistent and faster convergence. Here are the PETSc options that I am using (along with the SMP preconditioner):
>>>> 
>>>> l_max_its = 15
>>>> l_tol = 1e-8
>>>> nl_max_its = 50
>>>> nl_rel_tol = 1e-7
>>>> nl_abs_tol = 1e-9
>>>> petsc_options = '-snes_ksp_ew'
>>>> petsc_options_iname = '-pc_type -snes_linesearch_type'
>>>> petsc_options_value = 'lu bt'
>>>> end_time = 50.0
>>>> dt = 0.5
>>>> 
>>>> Here is an example output from the first few timesteps of the simulation:
>>>> 
>>>> Time Step 2, time = 0.75, dt = 0.5
>>>> 0 Nonlinear |R| = 1.078014e-02
>>>> 0 SNES Function norm 1.078014340559e-02
>>>>    0 Linear |R| = 1.078014e-02
>>>>  0 KSP unpreconditioned resid norm 1.078014340559e-02 true resid norm 1.078014340559e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>    1 Linear |R| = 2.319831e-13
>>>>  1 KSP unpreconditioned resid norm 2.319831277078e-13 true resid norm 2.255163534674e-13 ||r(i)||/||b|| 2.091960607412e-11
>>>> Linear solve converged due to CONVERGED_RTOL iterations 1
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 1. is Inf or Nan, cutting lambda
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 0.5 is Inf or Nan, cutting lambda
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 0.25 is Inf or Nan, cutting lambda
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 0.125 is Inf or Nan, cutting lambda
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 0.0625 is Inf or Nan, cutting lambda
>>>> NEML stress update failed!
>>>>    Line search: objective function at lambdas = 0.03125 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.015625 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.0078125 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.00390625 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.00195312 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.000976562 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.000488281 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.000244141 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 0.00012207 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 6.10352e-05 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 3.05176e-05 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.52588e-05 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 7.62939e-06 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 3.8147e-06 is Inf or Nan, cutting lambda
>>>> 
>>>>    Line search: objective function at lambdas = 1.90735e-06 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 9.53674e-07 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 4.76837e-07 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 2.38419e-07 is Inf or Nan, cutting lambda
>>>> 
>>>>    Line search: objective function at lambdas = 1.19209e-07 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 5.96046e-08 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 2.98023e-08 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.49012e-08 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 7.45058e-09 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 3.72529e-09 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.86265e-09 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 9.31323e-10 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 4.65661e-10 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 2.32831e-10 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.16415e-10 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 5.82077e-11 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 2.91038e-11 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.45519e-11 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 7.27596e-12 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 3.63798e-12 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 1.81899e-12 is Inf or Nan, cutting lambda
>>>>    Line search: objective function at lambdas = 9.09495e-13 is Inf or Nan, cutting lambda
>>>> Nonlinear solve did not converge due to DIVERGED_LINE_SEARCH iterations 0
>>>> Solve Did NOT Converge!
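
The lambdas in the quoted log above are successive halvings, from 1 down to 9.09495e-13, which is 2^-40. A simplified sketch of that behavior follows. It is not PETSc's actual bt line search: the function name and the min_lam cutoff are assumptions, chosen only so the loop stops where the log does.

```python
import math


def backtrack_until_finite(objective, lam=1.0, min_lam=5e-13):
    """Halve lam until objective(lam) is finite; return (lam or None, lams tried)."""
    tried = []
    while lam >= min_lam:
        tried.append(lam)
        if math.isfinite(objective(lam)):
            return lam, tried
        lam *= 0.5  # cut the step and try again
    return None, tried


# Hypothetical objective that is never finite, mimicking the log above.
accepted, tried = backtrack_until_finite(lambda lam: float("nan"))
print("accepted:", accepted)
print("trials:", len(tried), "smallest lambda:", tried[-1])
```

When every trial fails no matter how small the step, the step length is probably not the problem; the residual evaluation itself is producing the non-finite value.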
>>>> 
>>>> I would really appreciate any input or insights. Thanks for your time and help.
>>>> 
>>>> Best,
>>>> 
>>>> Sean
>>>> 
>>>> 
>>>> Sean Hsu   Mechanics of Materials Intern
>>>> p 510.808.5265   e hsu at kairospower.com
>>>> 707 W Tower Ave, Alameda, CA 94501
>>>> www.kairospower.com
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 


