[petsc-users] Newton LS - better results on single processor
Smith, Barry F.
bsmith at mcs.anl.gov
Sat Nov 11 14:12:14 CST 2017
> On Nov 10, 2017, at 4:18 PM, zakaryah . <zakaryah at gmail.com> wrote:
>
> Thanks for the advice.
>
> I put a VecView() on the solution vector, both before and after all the solves and in the SNES function and Jacobian routines, as well as a MatView() in the Jacobian routine and a VecView() on the residual after the solves. Then I ran a tiny problem with -pc_type redundant -redundant_pc_type lu, as Stefano suggested, and compared the vectors and matrices from -n 1 with those from -n 2. Although the monitor shows somewhat different residuals for the KSP and for the SNES, the differences are very small. For example, after the first SNESSolve(), the SNES residual is 6.5e-18 with -n 1 and 6.0e-18 with -n 2, and of course I don't care about that tiny difference unless it indicates some mistake in my code. The VecView() and MatView() output shows that the state vector, the function vector, and the function Jacobian are identical (up to the default output precision of the view routines). The residuals are of course slightly different.
>
> For this problem it took 8 Riks iterations for the loading coefficient to reach 1 (i.e. to finish the iterations). For the last solve, the residuals and their differences were larger: 8.4e-15 with -n 1 and 8.7e-15 with -n 2. I think this supports my hypothesis that the iterations which feed one SNESSolve() solution into the initial guess for the next solve can amplify small differences.
>
> To check a bit deeper, I removed all of the view calls except to the SNES residual, and ran on a more realistic problem size, with the SNES defaults (and KSP defaults, PC defaults, etc).
>
> I will call the residual after the first SNESSolve Ri, and the residual after the last SNESSolve Rf. With -n 1, Ri and Rf are both spatially smooth (as I expect). I think the standard deviation (over space) of the residual is a useful way to quantify its amplitude; for -n 1, sd(Ri) = 5.1e-13 and sd(Rf) = 6.8e-12. With -n 2, both Ri and Rf have a discontinuity in x, at x=0, x=20, x=21, and x=41 (i.e. at the global boundary and at the boundary between the two processes). For -n 2, sd(Ri) = 1.2e-12 and sd(Rf) = 5.7e-12, with all of the additional fluctuations coming from those boundary coordinates. In other words, the fluctuations of the residual are an order of magnitude larger at the boundary between processes.
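>
> (A rough sketch of how that standard deviation can be computed directly from a residual Vec r -- untested, and the variable names are just placeholders:)
>
>     PetscInt    n;
>     PetscScalar s;
>     PetscReal   mean, meansq, sd;
>     VecGetSize(r, &n);
>     VecSum(r, &s);                 /* sum of entries */
>     mean = PetscRealPart(s) / n;
>     VecDot(r, r, &s);              /* sum of squares */
>     meansq = PetscRealPart(s) / n;
>     sd = PetscSqrtReal(meansq - mean * mean);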
So the residual is order 5e-11 at the boundaries and order 5e-12 elsewhere?
This is nothing. Residuals that small are close to meaningless anyway.
If the residuals are 5e-6 at the boundaries and 5e-12 elsewhere then I would be worried.
>
> If I consider the residuals as a function of the other dimensions (y or z instead of x), the entire range of which is owned by each processor, I don't see any discontinuity.
>
> I suppose that all of this has something to do with the spectra of the matrices involved in the solve, but I don't know enough to improve the results I'm obtaining.
>
> On Thu, Nov 9, 2017 at 11:09 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>
> > On Nov 9, 2017, at 3:33 PM, zakaryah . <zakaryah at gmail.com> wrote:
> >
> > Hi Stefano - when I referred to the iterations, I was trying to point out that my method solves a series of nonlinear systems, with the solution to the first problem being used to initialize the state vector for the second problem, etc. The reason I mentioned that is that I would expect the residuals from a single-process solve to differ from those of a multi-process solve by a very small amount, say machine precision or the tolerance of the KSP/SNES, and normally that would be fine. But if those differences are somehow amplified by each of the iterations (solution -> initial state), that could explain what I see.
> >
> > I agree that it is more likely that I have a bug in my code but I'm having trouble finding it.
>
> Run a tiny problem on one and two processes with an LU linear solver and the same mesh, so that in the first case all values live on the first process and in the second case the first half of the values lives on one process and the second half on the other.
>
> Now track the values in the actual vectors and matrices. For example, you can put VecView() and MatView() calls on all the objects you pass into the solver, and also inside the SNES function/Jacobian routines: print both the vectors input to those routines and the vectors/matrices created in them. The output differences between the two runs should be small; determine where they vary significantly. This will tell you the likely location of the bug in your source code (for example, if certain values of the Jacobian differ).
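>
> A minimal sketch of what that instrumentation could look like (the routine names and signatures here are generic placeholders, not taken from your code):
>
>     PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
>     {
>       VecView(X, PETSC_VIEWER_STDOUT_WORLD);   /* state passed in        */
>       /* ... compute F from X ... */
>       VecView(F, PETSC_VIEWER_STDOUT_WORLD);   /* residual computed here */
>       return 0;
>     }
>
>     PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat P, void *ctx)
>     {
>       /* ... fill J (and P, if different) from X ... */
>       MatView(J, PETSC_VIEWER_STDOUT_WORLD);   /* Jacobian assembled here */
>       return 0;
>     }
>
> Redirecting the output of the -n 1 and -n 2 runs to files and diffing them then shows where the values first diverge.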
>
> Good luck. I've done this plenty of times, and if it is a "parallelization" bug this will help you find it much faster than guessing where the problem is and trying to find it by code inspection.
>
> Barry
>
> >
> > I ran a small problem with -pc_type redundant -redundant_pc_type lu, as you suggested. What I think is the relevant portion of the output is here (i.e. there are small differences in the KSP residuals and SNES residuals):
> >
> > -n 1, first "iteration" as described above:
> >
> > 0 SNES Function norm 6.053565720454e-02
> > 0 KSP Residual norm 4.883115701982e-05
> >
> > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 8.173640409069e-20
> >
> > 1 KSP preconditioned resid norm 8.173640409069e-20 true resid norm 1.742143029296e-16 ||r(i)||/||b|| 2.877879104227e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570862e-07
> >
> > 1 SNES Function norm 2.735518570862e-07
> >
> > 0 KSP Residual norm 1.298536630766e-10
> >
> > 0 KSP preconditioned resid norm 1.298536630766e-10 true resid norm 2.735518570862e-07 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 2.152782096751e-25
> >
> > 1 KSP preconditioned resid norm 2.152782096751e-25 true resid norm 4.755555202641e-22 ||r(i)||/||b|| 1.738447420279e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 2.735518570862e-07 gnorm 1.917989238989e-17
> >
> > 2 SNES Function norm 1.917989238989e-17
> >
> > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2
> >
> >
> >
> > -n 2, first "iteration" as described above:
> >
> > 0 SNES Function norm 6.053565720454e-02
> >
> > 0 KSP Residual norm 4.883115701982e-05
> >
> > 0 KSP preconditioned resid norm 4.883115701982e-05 true resid norm 6.053565720454e-02 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.007084240718e-19
> >
> > 1 KSP preconditioned resid norm 1.007084240718e-19 true resid norm 1.868472589717e-16 ||r(i)||/||b|| 3.086565300520e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 6.053565720454e-02 gnorm 2.735518570379e-07
> >
> > 1 SNES Function norm 2.735518570379e-07
> >
> > 0 KSP Residual norm 1.298536630342e-10
> >
> > 0 KSP preconditioned resid norm 1.298536630342e-10 true resid norm 2.735518570379e-07 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.885083482938e-25
> >
> > 1 KSP preconditioned resid norm 1.885083482938e-25 true resid norm 4.735707460766e-22 ||r(i)||/||b|| 1.731191852267e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 2.735518570379e-07 gnorm 1.851472273258e-17
> >
> >
> > 2 SNES Function norm 1.851472273258e-17
> >
> >
> > -n 1, final "iteration":
> > 0 SNES Function norm 9.695669610792e+01
> >
> > 0 KSP Residual norm 7.898912593878e-03
> >
> > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.720960785852e-17
> >
> > 1 KSP preconditioned resid norm 1.720960785852e-17 true resid norm 1.237111121391e-13 ||r(i)||/||b|| 1.275941911237e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731653e-01
> >
> > 1 SNES Function norm 1.026572731653e-01
> >
> > 0 KSP Residual norm 1.382450412926e-04
> >
> > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731653e-01 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 5.018078565710e-20
> >
> > 1 KSP preconditioned resid norm 5.018078565710e-20 true resid norm 9.031463071676e-17 ||r(i)||/||b|| 8.797684560673e-16
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 1.026572731653e-01 gnorm 7.982937980399e-06
> >
> > 2 SNES Function norm 7.982937980399e-06
> >
> > 0 KSP Residual norm 4.223898196692e-08
> >
> > 0 KSP preconditioned resid norm 4.223898196692e-08 true resid norm 7.982937980399e-06 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.038123933240e-22
> >
> > 1 KSP preconditioned resid norm 1.038123933240e-22 true resid norm 3.213931469966e-20 ||r(i)||/||b|| 4.026000800530e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 7.982937980399e-06 gnorm 9.776066323463e-13
> >
> > 3 SNES Function norm 9.776066323463e-13
> >
> > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3
> >
> > -n 2, final "iteration":
> >
> > 0 SNES Function norm 9.695669610792e+01
> >
> > 0 KSP Residual norm 7.898912593878e-03
> >
> > 0 KSP preconditioned resid norm 7.898912593878e-03 true resid norm 9.695669610792e+01 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.752819851736e-17
> >
> > 1 KSP preconditioned resid norm 1.752819851736e-17 true resid norm 1.017605437996e-13 ||r(i)||/||b|| 1.049546322064e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 9.695669610792e+01 gnorm 1.026572731655e-01
> >
> > 1 SNES Function norm 1.026572731655e-01
> >
> > 0 KSP Residual norm 1.382450412926e-04
> >
> > 0 KSP preconditioned resid norm 1.382450412926e-04 true resid norm 1.026572731655e-01 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.701690118486e-19
> >
> > 1 KSP preconditioned resid norm 1.701690118486e-19 true resid norm 9.077679331860e-17 ||r(i)||/||b|| 8.842704517606e-16
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 1.026572731655e-01 gnorm 7.982937883350e-06
> >
> > 2 SNES Function norm 7.982937883350e-06
> >
> > 0 KSP Residual norm 4.223898196594e-08
> >
> > 0 KSP preconditioned resid norm 4.223898196594e-08 true resid norm 7.982937883350e-06 ||r(i)||/||b|| 1.000000000000e+00
> >
> > 1 KSP Residual norm 1.471638984554e-23
> >
> > 1 KSP preconditioned resid norm 1.471638984554e-23 true resid norm 2.483672977401e-20 ||r(i)||/||b|| 3.111226735938e-15
> >
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >
> > Line search: Using full step: fnorm 7.982937883350e-06 gnorm 1.019121417798e-12
> >
> > 3 SNES Function norm 1.019121417798e-12
> >
> >
> > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 3
> >
> >
> >
> > Of course these differences are still very small, but that is only true for such a small problem size. For a regular-sized problem, the differences at the final iteration can exceed 1, and even 100, at a particular grid point (i.e. in a sense that doesn't scale with the problem size).
> >
> > I also compared -n 1 and -n 2 with the -snes_monitor_solution -ksp_view_rhs -ksp_view_mat -ksp_view_solution options on a tiny problem (5x5x5), and I was not able to find any differences in the Jacobian or the vectors, but I'm suspicious that this could be due to the output format, because even for the tiny problem there are non-trivial differences in the residuals of both the SNES and the KSP.
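> >
> > (One way to rule out the output precision, as an untested sketch: push a higher-precision ASCII format around the view calls, with F and J standing for whatever vector/matrix is of interest,
> >
> >     PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_MATLAB);
> >     VecView(F, PETSC_VIEWER_STDOUT_WORLD);
> >     MatView(J, PETSC_VIEWER_STDOUT_WORLD);
> >     PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD);
> >
> > which should print close to full double precision, or write the objects to a binary viewer and compare those files.)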
> >
> > In all cases, the differences in the residuals are localized to the boundary between parts of the displacement vector owned by the two processes. The SNES residual with -n 2 typically looks discontinuous across that boundary.
> >
> >
> > On Thu, Nov 9, 2017 at 11:16 AM, zakaryah . <zakaryah at gmail.com> wrote:
> > Thanks Stefano, I will try what you suggest.
> >
> > Matt - my DM is a composite between the redundant field (the loading coefficient, which is included in the Newton solve in Riks' method) and the displacements, which are represented by a 3D DA with 3 dof. I am using finite differences.
> >
> > Probably my problem comes from confusion over how the composite DM is organized. I am using FormFunction(), and within that I call DMCompositeGetLocalVectors(), DMCompositeScatter(), and DMDAVecGetArray(); for the Jacobian, I call DMCompositeGetLocalISs() and MatGetLocalSubmatrix() to split J into Jbb, Jbh, Jhb, and Jhh (where b is the loading coefficient and h is the displacements). The values of each submatrix are set using MatSetValuesLocal().
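> >
> > (Roughly, the call sequence in my FormFunction looks like the following sketch -- placeholder names, not the actual code:)
> >
> >     Vec          b_local, h_local;
> >     PetscScalar  *b;
> >     Field        ***h;    /* hypothetical struct holding the 3 dof per point */
> >     DMCompositeGetLocalVectors(packer, &b_local, &h_local);
> >     DMCompositeScatter(packer, X, b_local, h_local);   /* fills ghost values */
> >     VecGetArray(b_local, &b);
> >     DMDAVecGetArray(da, h_local, &h);
> >     /* ... evaluate the residual from b[0] and h[iz][iy][ix] ... */
> >     DMDAVecRestoreArray(da, h_local, &h);
> >     VecRestoreArray(b_local, &b);
> >     DMCompositeRestoreLocalVectors(packer, &b_local, &h_local);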
> >
> > I'm most suspicious of the part of the Jacobian routine where I calculate the rows of Jhb, the columns of Jbh, and the corresponding values. I take the DA coordinates ix, iy, iz and calculate the row of Jhb as ((((iz-info->gzs)*info->gym + (iy-info->gys))*info->gxm + (ix-info->gxs))*info->dof+c), where info is the DA local info and c is the degree of freedom. The same calculation is performed for the column of Jbh. I suspect that the indexing of the DA vector is not so simple, but I don't know for a fact that I'm doing this incorrectly, nor how to do it properly.
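> >
> > (Concretely, the index computation looks like this sketch, which just mirrors the formula above -- the helper code is hypothetical:)
> >
> >     /* Map global grid indices (ix,iy,iz) and dof c to a ghosted local index.
> >        gxs/gys/gzs are the ghosted corner and gxm/gym the ghosted widths
> >        from the DMDALocalInfo. */
> >     PetscInt row = ((((iz - info->gzs)*info->gym + (iy - info->gys))*info->gxm
> >                       + (ix - info->gxs))*info->dof + c);
> >     /* 'row' is then passed to MatSetValuesLocal() as a row of Jhb (and as a
> >        column of Jbh); whether this matches the local-to-global mapping of the
> >        submatrix obtained from MatGetLocalSubmatrix() is exactly what I'm
> >        unsure about. */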
> >
> > Thanks for all the help!
> >
> >
> > On Nov 9, 2017 8:44 AM, "Matthew Knepley" <knepley at gmail.com> wrote:
> > On Thu, Nov 9, 2017 at 12:14 AM, zakaryah . <zakaryah at gmail.com> wrote:
> > Well the saga of my problem continues. As I described previously in an epic thread, I'm using the SNES to solve problems involving an elastic material on a rectangular grid, subjected to external forces. In any case, I'm occasionally getting poor convergence using Newton's method with line search. In troubleshooting by visualizing the residual, I saw that in data sets which had good convergence, the residual was nevertheless significantly larger along the boundary between different processors. Likewise, in data sets with poor convergence, the residual became very large on the boundary between different processors. The residual is not significantly larger on the physical boundary, i.e. the global boundary. When I run on a single process, convergence seems to be good on all data sets.
> >
> > Any clues to fix this?
> >
> > It sounds like something is wrong with communication across domains:
> >
> > - If this is FEM, it sounds like you are not adding contributions from the other domain to shared vertices/edges/faces
> >
> > - If this is FDM/FVM, maybe the ghosts are not updated
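> >
> > (For the FDM/FVM case, the usual pattern for refreshing ghost values before reading neighbor entries looks roughly like this -- a generic sketch with placeholder names:)
> >
> >     Vec Xlocal;
> >     DMGetLocalVector(da, &Xlocal);
> >     DMGlobalToLocalBegin(da, X, INSERT_VALUES, Xlocal);
> >     DMGlobalToLocalEnd(da, X, INSERT_VALUES, Xlocal);
> >     /* ... DMDAVecGetArray(da, Xlocal, &x) and use the ghosted entries ... */
> >     DMRestoreLocalVector(da, &Xlocal);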
> >
> > What DM are you using? Are you using the Local assembly functions (FormFunctionLocal), or just FormFunction()?
> >
> > Thanks,
> >
> > Matt
> >
> > --
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
> >
> >
>
>