[petsc-users] Debugging SNES when "everything" looks okay
Matthew Knepley
knepley at gmail.com
Wed Aug 22 13:50:12 CDT 2018
On Wed, Aug 22, 2018 at 1:33 PM Ellen Price <ellen.price at cfa.harvard.edu>
wrote:
> For Barry:
>
> Okay, I'm stumped. Some of my Jacobian terms match up with the finite
> difference one. Others are off by orders of magnitude. This led me to think
> that maybe there was something wrong with my analytic Jacobian, so I went
> back and started trying to track down which term causes the problem. The
> weird thing is that Mathematica gives the same numerical result as my code
> (within a small tolerance), whether i substitute numbers into its analytic
> Jacobian or numerically differentiate the function I'm trying to solve
> using Mathematica. I can't think of anything else to try for debugging the
> Jacobian. Even if the function has a bug in it, Mathematica seems pretty
> sure that I'm using the right Jacobian for what I put in... At the
> suggestion of the FAQ, I also tried "-mat_fd_type ds" to see if that made
> any difference, but it doesn't. Like I said, I'm stumped, but I'll keep
> trying.
>
> For Matt:
>
> Using options "-pc_type lu -snes_view -snes_monitor -snes_converged_reason
> -ksp_monitor_true_residual -ksp_converged_reason -snes_type newtontr", the
> output is below. I thought that the step length convergence message had
> something to do with using NEWTONTR, but I could be wrong about that. It
> doesn't give that message when I switch to NEWTONLS. It also only gives
> that message after the first iteration -- the first one is always
> CONVERGED_RTOL.
>
> 7.533921e-03, 2.054699e+02
> 0 SNES Function norm 6.252612941119e+04
> 0 KSP preconditioned resid norm 1.027204021595e-01 true resid norm
> 6.252612941119e+04 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 2.641617517320e-17 true resid norm
> 3.941610607093e-11 ||r(i)||/||b|| 6.303941478885e-16
> Linear solve converged due to CONVERGED_RTOL iterations 1
>
Shoot, I forgot to have you add -snes_linesearch_monitor, but what that
would show is that you cut down the step
scaling lambda until its 1e-10 or something, so nothing changes in your
residual norm. This means your Jacobian
does not match your function.
> 1 SNES Function norm 6.252085930204e+04
> 0 KSP preconditioned resid norm 3.857372064539e-01 true resid norm
> 6.252085930204e+04 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 3.664921095158e-16 true resid norm
> 2.434875289829e-10 ||r(i)||/||b|| 3.894500678672e-15
> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>
This is another indication that there is mismatch. Now the solution to your
linear equation is almost 0, so Newton thinks you
should not move at all.
Matt
> 2 SNES Function norm 6.251886926050e+04
> 0 KSP preconditioned resid norm 4.085000006938e-01 true resid norm
> 6.251886926050e+04 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 5.318519617927e-16 true resid norm
> 8.189931340015e-10 ||r(i)||/||b|| 1.309993516660e-14
> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
> .....
> 48 SNES Function norm 6.229932776333e+04
> 0 KSP preconditioned resid norm 4.872526135345e+00 true resid norm
> 6.229932776333e+04 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 7.281571852581e-15 true resid norm
> 8.562155918387e-09 ||r(i)||/||b|| 1.374357673796e-13
> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
> 49 SNES Function norm 6.229663411026e+04
> 0 KSP preconditioned resid norm 4.953303401996e+00 true resid norm
> 6.229663411026e+04 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 1.537319689814e-15 true resid norm
> 4.783442569088e-09 ||r(i)||/||b|| 7.678492806886e-14
> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
> 50 SNES Function norm 6.229442686875e+04
> Nonlinear solve did not converge due to DIVERGED_MAX_IT iterations 50
> SNES Object: 2 MPI processes
> type: newtontr
> Trust region tolerance (-snes_trtol)
> mu=0.25, eta=0.75, sigma=0.0001
> delta0=0.2, delta1=0.3, delta2=0.75, delta3=2.
> maximum iterations=50, maximum function evaluations=10000
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> total number of linear solver iterations=50
> total number of function evaluations=58
> norm schedule ALWAYS
> SNESLineSearch Object: 2 MPI processes
> type: basic
> maxstep=1.000000e+08, minlambda=1.000000e-12
> tolerances: relative=1.000000e-08, absolute=1.000000e-15,
> lambda=1.000000e-08
> maximum iterations=1
> KSP Object: 2 MPI processes
> type: gmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 2 MPI processes
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0., needed 0.
> Factored matrix follows:
> Mat Object: 2 MPI processes
> type: mumps
> rows=20, cols=20
> package used to perform factorization: mumps
> total: nonzeros=112, allocated nonzeros=112
> total number of mallocs used during MatSetValues calls =0
> MUMPS run parameters:
> SYM (matrix type): 0
> PAR (host participation): 1
> ICNTL(1) (output for error): 6
> ICNTL(2) (output of diagnostic msg): 0
> ICNTL(3) (output for global info): 0
> ICNTL(4) (level of printing): 0
> ICNTL(5) (input mat struct): 0
> ICNTL(6) (matrix prescaling): 7
> ICNTL(7) (sequential matrix ordering):7
> ICNTL(8) (scaling strategy): 77
> ICNTL(10) (max num of refinements): 0
> ICNTL(11) (error analysis): 0
> ICNTL(12) (efficiency control): 1
> ICNTL(13) (efficiency control): 0
> ICNTL(14) (percentage of estimated workspace increase): 20
> ICNTL(18) (input mat struct): 3
> ICNTL(19) (Schur complement info): 0
> ICNTL(20) (rhs sparse pattern): 0
> ICNTL(21) (solution struct): 1
> ICNTL(22) (in-core/out-of-core facility): 0
> ICNTL(23) (max size of memory can be allocated locally):0
> ICNTL(24) (detection of null pivot rows): 0
> ICNTL(25) (computation of a null space basis): 0
> ICNTL(26) (Schur options for rhs or solution): 0
> ICNTL(27) (experimental parameter):
> -32
> ICNTL(28) (use parallel or sequential ordering): 1
> ICNTL(29) (parallel ordering): 0
> ICNTL(30) (user-specified set of entries in inv(A)): 0
> ICNTL(31) (factors is discarded in the solve phase): 0
> ICNTL(33) (compute determinant): 0
> ICNTL(35) (activate BLR based factorization): 0
> CNTL(1) (relative pivoting threshold): 0.01
> CNTL(2) (stopping criterion of refinement): 1.49012e-08
> CNTL(3) (absolute pivoting threshold): 0.
> CNTL(4) (value of static pivoting): -1.
> CNTL(5) (fixation for null pivots): 0.
> CNTL(7) (dropping parameter for BLR): 0.
> RINFO(1) (local estimated flops for the elimination after
> analysis):
> [0] 127.
> [1] 155.
> RINFO(2) (local estimated flops for the assembly after
> factorization):
> [0] 16.
> [1] 16.
> RINFO(3) (local estimated flops for the elimination after
> factorization):
> [0] 127.
> [1] 155.
> INFO(15) (estimated size of (in MB) MUMPS internal data
> for running numerical factorization):
> [0] 1
> [1] 1
> INFO(16) (size of (in MB) MUMPS internal data used during
> numerical factorization):
> [0] 1
> [1] 1
> INFO(23) (num of pivots eliminated on this processor after
> factorization):
> [0] 10
> [1] 10
> RINFOG(1) (global estimated flops for the elimination
> after analysis): 282.
> RINFOG(2) (global estimated flops for the assembly after
> factorization): 32.
> RINFOG(3) (global estimated flops for the elimination
> after factorization): 282.
> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
> (0.,0.)*(2^0)
> INFOG(3) (estimated real workspace for factors on all
> processors after analysis): 112
> INFOG(4) (estimated integer workspace for factors on all
> processors after analysis): 225
> INFOG(5) (estimated maximum front size in the complete
> tree): 4
> INFOG(6) (number of nodes in the complete tree): 9
> INFOG(7) (ordering option effectively use after analysis):
> 2
> INFOG(8) (structural symmetry in percent of the permuted
> matrix after analysis): 100
> INFOG(9) (total real/complex workspace to store the matrix
> factors after factorization): 112
> INFOG(10) (total integer space store the matrix factors
> after factorization): 225
> INFOG(11) (order of largest frontal matrix after
> factorization): 4
> INFOG(12) (number of off-diagonal pivots): 0
> INFOG(13) (number of delayed pivots after factorization):
> 0
> INFOG(14) (number of memory compress after factorization):
> 0
> INFOG(15) (number of steps of iterative refinement after
> solution): 0
> INFOG(16) (estimated size (in MB) of all MUMPS internal
> data for factorization after analysis: value on the most memory consuming
> processor): 1
> INFOG(17) (estimated size of all MUMPS internal data for
> factorization after analysis: sum over all processors): 2
> INFOG(18) (size of all MUMPS internal data allocated
> during factorization: value on the most memory consuming processor): 1
> INFOG(19) (size of all MUMPS internal data allocated
> during factorization: sum over all processors): 2
> INFOG(20) (estimated number of entries in the factors):
> 112
> INFOG(21) (size in MB of memory effectively used during
> factorization - value on the most memory consuming processor): 1
> INFOG(22) (size in MB of memory effectively used during
> factorization - sum over all processors): 2
> INFOG(23) (after analysis: value of ICNTL(6) effectively
> used): 0
> INFOG(24) (after analysis: value of ICNTL(12) effectively
> used): 1
> INFOG(25) (after factorization: number of pivots modified
> by static pivoting): 0
> INFOG(28) (after factorization: number of null pivots
> encountered): 0
> INFOG(29) (after factorization: effective number of
> entries in the factors (sum over all processors)): 112
> INFOG(30, 31) (after solution: size in Mbytes of memory
> used during solution phase): 0, 0
> INFOG(32) (after analysis: type of analysis done): 1
> INFOG(33) (value used for ICNTL(8)): 7
> INFOG(34) (exponent of the determinant if determinant is
> requested): 0
> linear system matrix = precond matrix:
> Mat Object: 2 MPI processes
> type: mpiaij
> rows=20, cols=20, bs=2
> total: nonzeros=112, allocated nonzeros=240
> total number of mallocs used during MatSetValues calls =0
>
> Ellen
>
> On Tue, Aug 21, 2018 at 10:16 PM Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Tue, Aug 21, 2018 at 10:00 PM Ellen M. Price <
>> ellen.price at cfa.harvard.edu> wrote:
>>
>>> Hi PETSc users,
>>>
>>> I'm having trouble applying SNES to a new problem I'm working on. I'll
>>> try to be as complete as possible but can't post the full code because
>>> it's ongoing research and is pretty long anyway.
>>>
>>> The nonlinear problem arises from trying to solve a set of two coupled
>>> ODEs using a Galerkin method. I am using Mathematica to generate the
>>> objective function to solve and the Jacobian, so I *think* I can rule
>>> out human error on that front.
>>>
>>> There are four things I can easily change:
>>>
>>> - number of DMDA grid points N (I've tried 100 and 1000)
>>> - preconditioner (I've tried LU and SVD, LU appears to work better, and
>>> SVD is too slow for N = 1000)
>>> - linear solver (haven't played with this much, but sometimes smaller
>>> tolerances work better)
>>> - nonlinear solver (what I'm having trouble with)
>>>
>>> Trying different solvers should, as far as I'm aware, give comparable
>>> answers, but that's not the case here. NEWTONTR works best of the ones
>>> I've tried, but I'm suspicious that the function value barely decreases
>>> before SNES "converges" -- and none of the options I've tried changing
>>> seem to affect this, as it just finds another reason to converge without
>>> making any real progress. For example:
>>>
>>> 0 SNES Function norm 7.197788479418e+00
>>> 0 KSP Residual norm 1.721996766619e+01
>>> 1 KSP Residual norm 5.186021549059e-14
>>> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>>
>>
>> This makes no sense. LU should converge due to atol or rtol. Send the
>> output of
>>
>> -snes_view -snes_monitor -snes_converged_reason
>> -ksp_monitor_true_residual -ksp_converged_reason
>>
>> Thanks,
>>
>> Matt
>>
>>
>>> 1 SNES Function norm 7.197777674987e+00
>>> 0 KSP Residual norm 3.296112185897e+01
>>> 1 KSP Residual norm 2.713415432045e-13
>>> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>> .....
>>> 50 SNES Function norm 7.197376803542e+00
>>> 0 KSP Residual norm 6.222518656302e+02
>>> 1 KSP Residual norm 9.630659996504e-12
>>> Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>> 51 SNES Function norm 7.197376803542e+00
>>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 50
>>>
>>> I've tried everything I can think of and the FAQ suggestions, including
>>> non-dimensionalizing the problem; I observe the same behavior either
>>> way. The particularly strange thing I can't understand is why some of
>>> the SNES methods fail outright, after just one iteration (NCG, NGMRES,
>>> and others) with DIVERGED_DTOL. Unless I've misunderstood, it seems
>>> like, for the most part, I should be able to substitute in one method
>>> for another, possibly adjusting a few parameters along the way.
>>> Furthermore, the default method, NEWTONLS, diverges with
>>> DIVERGED_LINE_SEARCH, which I'm not sure how to debug.
>>>
>>> So the only viable method is NEWTONTR, and that doesn't appear to
>>> "really" converge. Any suggestions for further things to try are
>>> appreciated. My current options are:
>>>
>>> -pc_type lu -snes_monitor -snes_converged_reason -ksp_converged_reason
>>> -snes_max_it 10000 -ksp_monitor -snes_type newtonls
>>>
>>> Thanks in advance,
>>> Ellen Price
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20180822/eeb45cf3/attachment-0001.html>
More information about the petsc-users
mailing list