[petsc-users] snes failures

Wed May 18 16:17:53 CDT 2016

On Wednesday 18 May 2016 13:48:52 Matthew Knepley wrote:
> On Wed, May 18, 2016 at 1:38 PM, Juha Jaykka <juhaj at iki.fi> wrote:
> > Dear list,
> > 
> > I'm designing a short training course on HPC, and decided to use PETSc as
> > an example of a good way of getting things done quick, easy, and with good
> > performance, and without needing to write one's own code for things like
> > linear or non-linear solvers etc.
> > 
> > However, my SNES example turned out to be problematic: I chose the
> > (static) sine-Gordon equation for my example, mostly because its exact
> > solution is known so it is easy to compare with numerics and also because
> > it is, after all, a dead simple equation. Yet my code refuses to converge
> > most of the time!
> > 
> > Using -snes_type ngs always succeeds, but is also very slow. Any other
> > type will fail once I increase the domain size from ~100 points (the
> > actual number depends on the type). I always keep the lattice spacing at
> > 0.1. The failure is also always the same: DIVERGED_LINE_SEARCH. Some types
> > manage to take one step and get stuck, some types manage to decrease the
> > norm once and then continue forever without decreasing the norm but not
> > complaining about divergence either (unless they hit one of the
> > max_it-type limits), and ncg is the worst of all: it always (with any
> > lattice size!) fails at the very first step.
> > 
> > I've checked the Jacobian, and I suspect it is ok as ngs converges and the
> > other types except ncg also converge nicely unless the domain is too big.
> 
> Nope, ngs does not use the Jacobian, and small problems can converge with
> wrong Jacobians.
> 
> > Any ideas of where this could go wrong?
> 
> 
> 1) Just run with -snes_fd_color -snes_fd_color_use_mat
>    -mat_coloring_type greedy and see if it converges.

It does not. I should have mentioned earlier that I had already tried
-snes_mf, -snes_mf_operator, -snes_fd and -snes_fd_color, and none of those
converges. Your suggested options result in

  0 SNES Function norm 1.002496882788e+00 
      Line search: lambdas = [1., 0.], ftys = [1.01105, 1.005]
      Line search terminated: lambda = 168.018, fnorms = 1.58978
  1 SNES Function norm 1.589779063742e+00 
      Line search: lambdas = [1., 0.], ftys = [5.57144, 4.11598]
      Line search terminated: lambda = 4.82796, fnorms = 8.93164
  2 SNES Function norm 8.931639387159e+00 
      Line search: lambdas = [1., 0.], ftys = [2504.72, 385.612]
      Line search terminated: lambda = 2.18197, fnorms = 157.043
  3 SNES Function norm 1.570434892800e+02 
      Line search: lambdas = [1., 0.], ftys = [1.89092e+08, 1.48956e+06]
      Line search terminated: lambda = 2.00794, fnorms = 40941.5
  4 SNES Function norm 4.094149042511e+04 
      Line search: lambdas = [1., 0.], ftys = [8.60081e+17, 2.56063e+13]
      Line search terminated: lambda = 2.00003, fnorms = 2.75067e+09
  5 SNES Function norm 2.750671622274e+09 
      Line search: lambdas = [1., 0.], ftys = [1.75232e+37, 7.76449e+27]
      Line search terminated: lambda = 2., fnorms = 1.24157e+19
  6 SNES Function norm 1.241565256983e+19 
      Line search: lambdas = [1., 0.], ftys = [7.27339e+75, 7.14012e+56]
      Line search terminated: lambda = 2., fnorms = 2.52948e+38
  7 SNES Function norm 2.529479470902e+38 
      Line search: lambdas = [1., 0.], ftys = [1.25309e+153, 6.03796e+114]
      Line search terminated: lambda = 2., fnorms = 1.04992e+77
  8 SNES Function norm 1.049915566775e+77 
      Line search: lambdas = [1., 0.], ftys = [3.71943e+307, 4.31777e+230]
      Line search terminated: lambda = 2., fnorms = inf.
  9 SNES Function norm            inf 

This is very similar (perhaps even identical) to what ncg does with cp
linesearch even without your suggestions. I also forgot to say that all
the results I referred to were with -snes_linesearch_type bt.

While testing a bit more, though, I noticed that when using -snes_type ngs the 
norm first goes UP before starting to decrease:

  0 SNES Function norm 1.002496882788e+00 
  1 SNES Function norm 1.264791228033e+00 
  2 SNES Function norm 1.296062264876e+00 
  3 SNES Function norm 1.290207363235e+00 
  4 SNES Function norm 1.289395207346e+00 
etc until
1952 SNES Function norm 9.975720236748e-09 

> http://scicomp.stackexchange.com/questions/30/why-is-newtons-method-not-converging

None of this flags up any problems and -snes_check_jacobian consistently gives 
something like

9.55762e-09 = ||J - Jfd||/||J|| 3.97595e-06  = ||J - Jfd||

and looking at the values themselves with -snes_check_jacobian_view does not 
flag any odd points which might be wrong but not show up in the above norm.
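
For reference, the kind of comparison behind a number like the one above can
be sketched in a few lines of plain Python. This is only an illustration under
stated assumptions, not the code discussed in this thread: it assumes the
static equation in the form u'' = sin(u), a second-order finite-difference
residual F_i = -(u[i-1] - 2 u[i] + u[i+1])/h^2 + sin(u[i]) with Dirichlet
ends taken from the kink u(x) = 4 atan(exp(x)), and a Frobenius norm. All
function names are hypothetical.

```python
import math

def residual(u, h, left, right):
    """Assumed residual: F_i = -(u[i-1] - 2u[i] + u[i+1])/h^2 + sin(u[i])."""
    padded = [left] + list(u) + [right]          # Dirichlet boundary values
    return [-(padded[i] - 2.0 * padded[i + 1] + padded[i + 2]) / h**2
            + math.sin(padded[i + 1]) for i in range(len(u))]

def jacobian(u, h):
    """Analytic Jacobian of the residual above, stored densely for clarity."""
    n = len(u)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        J[i][i] = 2.0 / h**2 + math.cos(u[i])
        if i > 0:
            J[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            J[i][i + 1] = -1.0 / h**2
    return J

def fd_jacobian(u, h, left, right, eps=1e-6):
    """Forward-difference Jacobian, one perturbed column at a time."""
    n = len(u)
    f0 = residual(u, h, left, right)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        perturbed = list(u)
        perturbed[j] += eps
        fj = residual(perturbed, h, left, right)
        for i in range(n):
            J[i][j] = (fj[i] - f0[i]) / eps
    return J

def frobenius(A):
    return math.sqrt(sum(a * a for row in A for a in row))

def relative_difference(u, h, left, right):
    """||J - Jfd|| / ||J|| in the Frobenius norm."""
    J = jacobian(u, h)
    Jfd = fd_jacobian(u, h, left, right)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(J, Jfd)]
    return frobenius(diff) / frobenius(J)

def kink(x):
    return 4.0 * math.atan(math.exp(x))

h = 0.1                                          # lattice spacing from the post
xs = [-5.0 + h * i for i in range(1, 100)]       # interior points of [-5, 5]
u = [kink(x) for x in xs]
left, right = kink(-5.0), kink(5.0)

rel = relative_difference(u, h, left, right)
max_res = max(abs(f) for f in residual(u, h, left, right))
print("||J - Jfd||/||J|| =", rel)
print("max |F| at the exact kink =", max_res)
```

The second printed number is the discretization error of the exact kink on
the lattice, which is a cheap sanity check on the residual itself, separate
from the Jacobian comparison.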

There is just one thing I found in all this testing. A normal run with
-mat_mffd_type ds added fails with

  Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 2
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0

instead of failing the line search. Where did the indefinite PC suddenly come 
from?
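
One thing that can be checked independently of the solver is whether the
linearized operator itself is indefinite at a given iterate. The sketch below
is an assumption-laden illustration, not a diagnosis: it assumes the residual
is discretized as F_i = -(u[i-1] - 2 u[i] + u[i+1])/h^2 + sin(u[i]), so that
the Jacobian is the symmetric tridiagonal matrix -D2 + diag(cos(u)), whose
diagonal shift is negative wherever cos(u[i]) < 0. It counts negative
eigenvalues through the signs of the LDL^T pivots (Sylvester's law of
inertia). The function names are hypothetical.

```python
import math

def jacobian_bands(u, h):
    """Diagonal and off-diagonal of J = -D2 + diag(cos u) (assumed form)."""
    diag = [2.0 / h**2 + math.cos(ui) for ui in u]
    off = [-1.0 / h**2] * (len(u) - 1)
    return diag, off

def count_negative_eigenvalues(diag, off):
    """Count negative pivots of the LDL^T factorization of a symmetric
    tridiagonal matrix; this equals its number of negative eigenvalues."""
    count = 0
    pivot = diag[0]
    for i in range(len(diag)):
        if i > 0:
            pivot = diag[i] - off[i - 1] ** 2 / pivot
        if pivot == 0.0:
            pivot = 1e-300      # nudge an exact zero pivot to avoid dividing by 0
        if pivot < 0.0:
            count += 1
    return count

n, h = 200, 0.1                 # 200 points, lattice spacing 0.1 as in the post

# u = 0 everywhere: cos(u) = +1, so J = -D2 + I is positive definite.
neg_at_zero = count_negative_eigenvalues(*jacobian_bands([0.0] * n, h))

# u = pi everywhere: cos(u) = -1, so J = -D2 - I picks up negative eigenvalues
# once the domain is long enough for the lowest modes of -D2 to drop below 1.
neg_at_pi = count_negative_eigenvalues(*jacobian_bands([math.pi] * n, h))

print("negative eigenvalues at u = 0: ", neg_at_zero)
print("negative eigenvalues at u = pi:", neg_at_pi)
```

Feeding the actual failing iterate into count_negative_eigenvalues (instead
of the two constant test vectors) would show whether the operator is
indefinite there under the assumed discretization.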

Another point perhaps worth noting is that at a given grid size, all the
failing solves produce the same result with the same function norm
(4.6458600451067145e-01 at 200 points), so at least they are failing
consistently. The exception is the mffd run above, of course. The resulting
iterate in the failing cases is oscillatory, and the number of oscillations
grows with the domain: if my domain is smaller than about -6 to +6, all the
methods converge. If the domain is about -13 to +13, the "solution" starts to
pick up another oscillation, and so on.

Could there be something hairy in the sin() term of the sine-Gordon, somehow? 
An oscillatory solution seems to point the finger towards an oscillatory term 
in the equation, but I cannot see how or why it should cause oscillations.

This is also irrespective of whether my Jacobian gets called, so I think I can 
be pretty confident the problem is not in the Jacobian, but someplace else 
instead. (That said, the Jacobian may still of course have some other 
problem.)

Cheers,
Juha