<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Aug 22, 2018 at 1:33 PM Ellen Price <<a href="mailto:ellen.price@cfa.harvard.edu">ellen.price@cfa.harvard.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>For Barry:</div><div><br></div><div>Okay, I'm stumped. Some of my Jacobian terms match up with the finite difference one. Others are off by orders of magnitude. </div></div></blockquote><div><br></div><div>Then your Jacobian does not match your function. This is easy to track down:</div><div><br></div><div>  1) Reduce your problem size to 10 or so. Lots of people try to avoid doing this, but that is crazy. Always have a problem of size 10.</div><div><br></div><div>  2) PETSc uses the all ones (1, 1, ..., 1) and all minus ones (-1, -1, ..., -1) states to check the Jacobian. Feed those into</div><div>      your PETSc function and your Mathematics function. Do you get the same vector out?</div><div><br></div><div>  3) Feed them into your PETSc Jacobian and Mathematics Jacobian. Do you get the same matrix out?</div><div><br></div><div>There is nothing mystical here. PETSc is saying that the analytic Jacobian must match finite differences in the limit.</div><div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>This led me to think that maybe there was something wrong with my analytic Jacobian, so I went back and started trying to track down which term causes the problem. The weird thing is that Mathematica gives the same numerical result as my code (within a small tolerance), whether i substitute numbers into its analytic Jacobian or numerically differentiate the function I'm trying to solve using Mathematica. I can't think of anything else to try for debugging the Jacobian. Even if the function has a bug in it, Mathematica seems pretty sure that I'm using the right Jacobian for what I put in... At the suggestion of the FAQ, I also tried "-mat_fd_type ds" to see if that made any difference, but it doesn't. Like I said, I'm stumped, but I'll keep trying.</div><div><br></div>For Matt:<div><br></div><div>Using options "-pc_type lu -snes_view -snes_monitor -snes_converged_reason -ksp_monitor_true_residual -ksp_converged_reason -snes_type newtontr", the output is below. I thought that the step length convergence message had something to do with using NEWTONTR, but I could be wrong about that. It doesn't give that message when I switch to NEWTONLS. It also only gives that message after the first iteration -- the first one is always CONVERGED_RTOL.</div><div><br></div><div><div>7.533921e-03, 2.054699e+02</div><div>  0 SNES Function norm 6.252612941119e+04 </div><div>    0 KSP preconditioned resid norm 1.027204021595e-01 true resid norm 6.252612941119e+04 ||r(i)||/||b|| 1.000000000000e+00</div><div>    1 KSP preconditioned resid norm 2.641617517320e-17 true resid norm 3.941610607093e-11 ||r(i)||/||b|| 6.303941478885e-16</div><div>  Linear solve converged due to CONVERGED_RTOL iterations 1</div><div>  1 SNES Function norm 6.252085930204e+04 </div><div>    0 KSP preconditioned resid norm 3.857372064539e-01 true resid norm 6.252085930204e+04 ||r(i)||/||b|| 1.000000000000e+00</div><div>    1 KSP preconditioned resid norm 3.664921095158e-16 true resid norm 2.434875289829e-10 ||r(i)||/||b|| 3.894500678672e-15</div><div>  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1</div><div>  2 SNES Function norm 6.251886926050e+04 </div><div>    0 KSP preconditioned resid norm 4.085000006938e-01 true resid norm 6.251886926050e+04 ||r(i)||/||b|| 1.000000000000e+00</div><div>    1 KSP preconditioned resid norm 5.318519617927e-16 true resid norm 8.189931340015e-10 ||r(i)||/||b|| 1.309993516660e-14</div><div>  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1</div><div>.....</div><div> 48 SNES Function norm 6.229932776333e+04 </div><div>    0 KSP preconditioned resid norm 4.872526135345e+00 true resid norm 6.229932776333e+04 ||r(i)||/||b|| 1.000000000000e+00</div><div>    1 KSP preconditioned resid norm 7.281571852581e-15 true resid norm 8.562155918387e-09 ||r(i)||/||b|| 1.374357673796e-13</div><div>  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1</div><div> 49 SNES Function norm 6.229663411026e+04 </div><div>    0 KSP preconditioned resid norm 4.953303401996e+00 true resid norm 6.229663411026e+04 ||r(i)||/||b|| 1.000000000000e+00</div><div>    1 KSP preconditioned resid norm 1.537319689814e-15 true resid norm 4.783442569088e-09 ||r(i)||/||b|| 7.678492806886e-14</div><div>  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1</div><div> 50 SNES Function norm 6.229442686875e+04 </div><div>Nonlinear solve did not converge due to DIVERGED_MAX_IT iterations 50</div><div>SNES Object: 2 MPI processes</div><div>  type: newtontr</div><div>    Trust region tolerance (-snes_trtol)</div><div>    mu=0.25, eta=0.75, sigma=0.0001</div><div>    delta0=0.2, delta1=0.3, delta2=0.75, delta3=2.</div><div>  maximum iterations=50, maximum function evaluations=10000</div><div>  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08</div><div>  total number of linear solver iterations=50</div><div>  total number of function evaluations=58</div><div>  norm schedule ALWAYS</div><div>  SNESLineSearch Object: 2 MPI processes</div><div>    type: basic</div><div>    maxstep=1.000000e+08, minlambda=1.000000e-12</div><div>    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08</div><div>    maximum iterations=1</div><div>  KSP Object: 2 MPI processes</div><div>    type: gmres</div><div>      restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</div><div>      happy breakdown tolerance 1e-30</div><div>    maximum iterations=10000, initial guess is zero</div><div>    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div>    left preconditioning</div><div>    using PRECONDITIONED norm type for convergence test</div><div>  PC Object: 2 MPI processes</div><div>    type: lu</div><div>      out-of-place factorization</div><div>      tolerance for zero pivot 2.22045e-14</div><div>      matrix ordering: natural</div><div>      factor fill ratio given 0., needed 0.</div><div>        Factored matrix follows:</div><div>          Mat Object: 2 MPI processes</div><div>            type: mumps</div><div>            rows=20, cols=20</div><div>            package used to perform factorization: mumps</div><div>            total: nonzeros=112, allocated nonzeros=112</div><div>            total number of mallocs used during MatSetValues calls =0</div><div>              MUMPS run parameters:</div><div>                SYM (matrix type):                   0 </div><div>                PAR (host participation):            1 </div><div>                ICNTL(1) (output for error):         6 </div><div>                ICNTL(2) (output of diagnostic msg): 0 </div><div>                ICNTL(3) (output for global info):   0 </div><div>                ICNTL(4) (level of printing):        0 </div><div>                ICNTL(5) (input mat struct):         0 </div><div>                ICNTL(6) (matrix prescaling):        7 </div><div>                ICNTL(7) (sequential matrix ordering):7 </div><div>                ICNTL(8) (scaling strategy):        77 </div><div>                ICNTL(10) (max num of refinements):  0 </div><div>                ICNTL(11) (error analysis):          0 </div><div>                ICNTL(12) (efficiency control):                         1 </div><div>                ICNTL(13) (efficiency control):                         0 </div><div>                ICNTL(14) (percentage of estimated workspace increase): 20 </div><div>                ICNTL(18) (input mat struct):                           3 </div><div>                ICNTL(19) (Schur complement info):                       0 </div><div>                ICNTL(20) (rhs sparse pattern):                         0 </div><div>                ICNTL(21) (solution struct):                            1 </div><div>                ICNTL(22) (in-core/out-of-core facility):               0 </div><div>                ICNTL(23) (max size of memory can be allocated locally):0 </div><div>                ICNTL(24) (detection of null pivot rows):               0 </div><div>                ICNTL(25) (computation of a null space basis):          0 </div><div>                ICNTL(26) (Schur options for rhs or solution):          0 </div><div>                ICNTL(27) (experimental parameter):                     -32 </div><div>                ICNTL(28) (use parallel or sequential ordering):        1 </div><div>                ICNTL(29) (parallel ordering):                          0 </div><div>                ICNTL(30) (user-specified set of entries in inv(A)):    0 </div><div>                ICNTL(31) (factors is discarded in the solve phase):    0 </div><div>                ICNTL(33) (compute determinant):                        0 </div><div>                ICNTL(35) (activate BLR based factorization):           0 </div><div>                CNTL(1) (relative pivoting threshold):      0.01 </div><div>                CNTL(2) (stopping criterion of refinement): 1.49012e-08 </div><div>                CNTL(3) (absolute pivoting threshold):      0. </div><div>                CNTL(4) (value of static pivoting):         -1. </div><div>                CNTL(5) (fixation for null pivots):         0. </div><div>                CNTL(7) (dropping parameter for BLR):       0. </div><div>                RINFO(1) (local estimated flops for the elimination after analysis): </div><div>                  [0] 127. </div><div>                  [1] 155. </div><div>                RINFO(2) (local estimated flops for the assembly after factorization): </div><div>                  [0]  16. </div><div>                  [1]  16. </div><div>                RINFO(3) (local estimated flops for the elimination after factorization): </div><div>                  [0]  127. </div><div>                  [1]  155. </div><div>                INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization): </div><div>                [0] 1 </div><div>                [1] 1 </div><div>                INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization): </div><div>                  [0] 1 </div><div>                  [1] 1 </div><div>                INFO(23) (num of pivots eliminated on this processor after factorization): </div><div>                  [0] 10 </div><div>                  [1] 10 </div><div>                RINFOG(1) (global estimated flops for the elimination after analysis): 282. </div><div>                RINFOG(2) (global estimated flops for the assembly after factorization): 32. </div><div>                RINFOG(3) (global estimated flops for the elimination after factorization): 282. </div><div>                (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)</div><div>                INFOG(3) (estimated real workspace for factors on all processors after analysis): 112 </div><div>                INFOG(4) (estimated integer workspace for factors on all processors after analysis): 225 </div><div>                INFOG(5) (estimated maximum front size in the complete tree): 4 </div><div>                INFOG(6) (number of nodes in the complete tree): 9 </div><div>                INFOG(7) (ordering option effectively use after analysis): 2 </div><div>                INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100 </div><div>                INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 112 </div><div>                INFOG(10) (total integer space store the matrix factors after factorization): 225 </div><div>                INFOG(11) (order of largest frontal matrix after factorization): 4 </div><div>                INFOG(12) (number of off-diagonal pivots): 0 </div><div>                INFOG(13) (number of delayed pivots after factorization): 0 </div><div>                INFOG(14) (number of memory compress after factorization): 0 </div><div>                INFOG(15) (number of steps of iterative refinement after solution): 0 </div><div>                INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1 </div><div>                INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2 </div><div>                INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1 </div><div>                INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2 </div><div>                INFOG(20) (estimated number of entries in the factors): 112 </div><div>                INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1 </div><div>                INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2 </div><div>                INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0 </div><div>                INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1 </div><div>                INFOG(25) (after factorization: number of pivots modified by static pivoting): 0 </div><div>                INFOG(28) (after factorization: number of null pivots encountered): 0</div><div>                INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 112</div><div>                INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0</div><div>                INFOG(32) (after analysis: type of analysis done): 1</div><div>                INFOG(33) (value used for ICNTL(8)): 7</div><div>                INFOG(34) (exponent of the determinant if determinant is requested): 0</div><div>    linear system matrix = precond matrix:</div><div>    Mat Object: 2 MPI processes</div><div>      type: mpiaij</div><div>      rows=20, cols=20, bs=2</div><div>      total: nonzeros=112, allocated nonzeros=240</div><div>      total number of mallocs used during MatSetValues calls =0</div></div><div><br></div><div>Ellen</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Aug 21, 2018 at 10:16 PM Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Aug 21, 2018 at 10:00 PM Ellen M. Price <<a href="mailto:ellen.price@cfa.harvard.edu" target="_blank">ellen.price@cfa.harvard.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi PETSc users,<br>

<br>

I'm having trouble applying SNES to a new problem I'm working on. I'll<br>

try to be as complete as possible but can't post the full code because<br>

it's ongoing research and is pretty long anyway.<br>

<br>

The nonlinear problem arises from trying to solve a set of two coupled<br>

ODEs using a Galerkin method. I am using Mathematica to generate the<br>

objective function to solve and the Jacobian, so I *think* I can rule<br>

out human error on that front.<br>

<br>

There are four things I can easily change:<br>

<br>

- number of DMDA grid points N (I've tried 100 and 1000)<br>

- preconditioner (I've tried LU and SVD, LU appears to work better, and<br>

SVD is too slow for N = 1000)<br>

- linear solver (haven't played with this much, but sometimes smaller<br>

tolerances work better)<br>

- nonlinear solver (what I'm having trouble with)<br>

<br>

Trying different solvers should, as far as I'm aware, give comparable<br>

answers, but that's not the case here. NEWTONTR works best of the ones<br>

I've tried, but I'm suspicious that the function value barely decreases<br>

before SNES "converges" -- and none of the options I've tried changing<br>

seem to affect this, as it just finds another reason to converge without<br>

making any real progress. For example:<br>

<br>

  0 SNES Function norm 7.197788479418e+00<br>

    0 KSP Residual norm 1.721996766619e+01<br>

    1 KSP Residual norm 5.186021549059e-14<br>

  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1<br></blockquote><div><br></div><div>This makes no sense. LU should converge due to atol or rtol. Send the output of</div><div><br></div><div>  -snes_view -snes_monitor -snes_converged_reason -ksp_monitor_true_residual -ksp_converged_reason</div><div><br></div><div>  Thanks,</div><div><br></div><div>    Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  1 SNES Function norm 7.197777674987e+00<br>

    0 KSP Residual norm 3.296112185897e+01<br>

    1 KSP Residual norm 2.713415432045e-13<br>

  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1<br>

.....<br>

 50 SNES Function norm 7.197376803542e+00<br>

    0 KSP Residual norm 6.222518656302e+02<br>

    1 KSP Residual norm 9.630659996504e-12<br>

  Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1<br>

 51 SNES Function norm 7.197376803542e+00<br>

Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 50<br>

<br>

I've tried everything I can think of and the FAQ suggestions, including<br>

non-dimensionalizing the problem; I observe the same behavior either<br>

way. The particularly strange thing I can't understand is why some of<br>

the SNES methods fail outright, after just one iteration (NCG, NGMRES,<br>

and others) with DIVERGED_DTOL. Unless I've misunderstood, it seems<br>

like, for the most part, I should be able to substitute in one method<br>

for another, possibly adjusting a few parameters along the way.<br>

Furthermore, the default method, NEWTONLS, diverges with<br>

DIVERGED_LINE_SEARCH, which I'm not sure how to debug.<br>

<br>

So the only viable method is NEWTONTR, and that doesn't appear to<br>

"really" converge. Any suggestions for further things to try are<br>

appreciated. My current options are:<br>

<br>

-pc_type lu -snes_monitor -snes_converged_reason -ksp_converged_reason<br>

-snes_max_it 10000 -ksp_monitor -snes_type newtonls<br>

<br>

Thanks in advance,<br>

Ellen Price<br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="m_-3874671242052232164m_4401524286654090173gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.caam.rice.edu/~mk51/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div>

</blockquote></div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.caam.rice.edu/~mk51/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div>