[petsc-users] Convergence issues for SNES NASM
Jed Brown
jed at jedbrown.org
Wed May 11 22:43:56 CDT 2022
Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason and send the output?
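
For example, appended to the run in your message:

  mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view \
    -da_overlap 3 -snes_type nasm -snes_nasm_type restrict \
    -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason
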
"Takahashi, Tadanaga" <tt73 at njit.edu> writes:
> Hello,
>
> We are working on a finite difference solver for a 2D nonlinear PDE with
> Dirichlet Boundary conditions on a rectangular domain. Our goal is to solve
> the problem with parallel nonlinear additive Schwarz (NASM) as the outer
> solver. Our code is similar to SNES example 5
> <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5,
> the parallel NASM can be executed with a command like `mpiexec -n 4 ./ex5
> -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2` which gives
> a convergent result. We assume this is the correct usage. A comment in the
> source code for NASM mentions that NASM should be used as a preconditioner,
> but we could not find documentation for that usage (we sketch our guess
> below). The Brune paper does not cover parallel NASM either. We observed
> that increasing the overlap leads to
> fewer Schwarz iterations. The parallelization works seamlessly for an
> arbitrary number of subdomains. This is the type of behavior we were
> expecting from our code.
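>
> For what it's worth, our guess at the "preconditioner" usage hinted at in
> the source comment (we have not found it documented, so the exact options
> below are an assumption on our part) is to attach NASM as a nonlinear
> preconditioner to an outer SNES, along the lines of
>
>   mpiexec -n 4 ./ex5 -mms 3 -snes_type ngmres -npc_snes_type nasm \
>     -npc_snes_nasm_type restrict -da_overlap 2
>
> or -snes_type aspin, which we understand to be Newton wrapped around NASM.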
>
> Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N by N
> DMDA. The finite difference stencil consists of 4d+1 points spread out in a
> diamond formation. If a stencil point falls out of bounds, it is projected
> onto the boundary curve. Since nodes on the boundary curve would result in
> an irregular mesh, we chose not to treat boundary nodes as unknowns, unlike
> Example 5. We use DMDACreate2d to create the DA for the interior points and
> DMDASNESSetFunctionLocal to associate the residual function with the SNES
> object; a rough sketch of this setup is below.
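>
> In case the setup details matter, here is a rough sketch of how the DMDA and
> SNES are wired together. FormFunctionLocal and AppCtx stand in for our actual
> residual code; error checking and our DMDASNESSetJacobianLocal call are
> omitted:
>
>   #include <petscdmda.h>
>   #include <petscsnes.h>
>
>   typedef struct { PetscReal eps; } AppCtx;   /* placeholder for our parameters */
>
>   /* placeholder: the wide-stencil residual is evaluated here */
>   static PetscErrorCode FormFunctionLocal(DMDALocalInfo *info, PetscScalar **u,
>                                           PetscScalar **f, AppCtx *user)
>   {
>     return 0;
>   }
>
>   int main(int argc, char **argv)
>   {
>     DM       da;
>     SNES     snes;
>     Vec      u;
>     AppCtx   user;
>     PetscInt N = 20, d;
>
>     PetscInitialize(&argc, &argv, NULL, NULL);
>     d = (PetscInt)PetscCeilReal(PetscPowReal((PetscReal)N, 1.0/3.0));
>
>     /* interior unknowns only; box stencil of width d holds the 4d+1-point stencil */
>     DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>                  DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
>                  1, d, NULL, NULL, &da);
>     DMSetFromOptions(da);   /* reads DMDA command-line options, e.g. -da_overlap */
>     DMSetUp(da);
>
>     SNESCreate(PETSC_COMM_WORLD, &snes);
>     SNESSetDM(snes, da);
>     DMDASNESSetFunctionLocal(da, INSERT_VALUES,
>                              (DMDASNESFunction)FormFunctionLocal, &user);
>     SNESSetFromOptions(snes);   /* -snes_type nasm, -snes_nasm_type restrict, ... */
>
>     DMCreateGlobalVector(da, &u);
>     SNESSolve(snes, NULL, u);
>
>     VecDestroy(&u);
>     SNESDestroy(&snes);
>     DMDestroy(&da);
>     return PetscFinalize();
>   }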
>
> Our code works serially. We have also tested it with Newton-Krylov-Schwarz
> (NKS) by running something akin to `mpiexec -n <n> ./solve -snes_type
> newtonls` (an illustrative invocation is sketched below). We have tested NKS
> for several numbers of subdomains and overlap widths, and the code works as
> expected, so we have some confidence in its correctness. We also implemented
> the overlapping NASM in MATLAB, so we know the method converges. However,
> the parallel NASM does not converge with our PETSc code, and we do not
> understand why NKS works while NASM does not. The SNES function norm
> decreases monotonically and then stagnates.
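>
> For concreteness, an NKS invocation of the kind we mean looks roughly like
> the following (the preconditioner options here are illustrative, not a
> record of the exact flags we used):
>
>   mpiexec -n 4 ./solve -snes_type newtonls -ksp_type gmres \
>     -pc_type asm -pc_asm_overlap 2 -sub_pc_type lu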
>
> Here is an example of the output when attempting to run NASM in parallel:
> takahashi at ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
> 0 SNES Function norm 7.244681057908e+02
> 1 SNES Function norm 1.237688062971e+02
> 2 SNES Function norm 1.068926073552e+02
> 3 SNES Function norm 1.027563237834e+02
> 4 SNES Function norm 1.022184806736e+02
> 5 SNES Function norm 1.020818227640e+02
> 6 SNES Function norm 1.020325629121e+02
> 7 SNES Function norm 1.020149036595e+02
> 8 SNES Function norm 1.020088110545e+02
> 9 SNES Function norm 1.020067198030e+02
> 10 SNES Function norm 1.020060034469e+02
> 11 SNES Function norm 1.020057582380e+02
> 12 SNES Function norm 1.020056743241e+02
> 13 SNES Function norm 1.020056456101e+02
> 14 SNES Function norm 1.020056357849e+02
> 15 SNES Function norm 1.020056324231e+02
> 16 SNES Function norm 1.020056312727e+02
> 17 SNES Function norm 1.020056308791e+02
> 18 SNES Function norm 1.020056307444e+02
> 19 SNES Function norm 1.020056306983e+02
> 20 SNES Function norm 1.020056306826e+02
> 21 SNES Function norm 1.020056306772e+02
> 22 SNES Function norm 1.020056306753e+02
> 23 SNES Function norm 1.020056306747e+02
> 24 SNES Function norm 1.020056306745e+02
> 25 SNES Function norm 1.020056306744e+02
> 26 SNES Function norm 1.020056306744e+02
> 27 SNES Function norm 1.020056306744e+02
> 28 SNES Function norm 1.020056306744e+02
> 29 SNES Function norm 1.020056306744e+02
> 30 SNES Function norm 1.020056306744e+02
> 31 SNES Function norm 1.020056306744e+02
> 32 SNES Function norm 1.020056306744e+02
> 33 SNES Function norm 1.020056306744e+02
> 34 SNES Function norm 1.020056306744e+02
> 35 SNES Function norm 1.020056306744e+02
> 36 SNES Function norm 1.020056306744e+02
> 37 SNES Function norm 1.020056306744e+02
> 38 SNES Function norm 1.020056306744e+02
> 39 SNES Function norm 1.020056306744e+02
> 40 SNES Function norm 1.020056306744e+02
> 41 SNES Function norm 1.020056306744e+02
> 42 SNES Function norm 1.020056306744e+02
> 43 SNES Function norm 1.020056306744e+02
> 44 SNES Function norm 1.020056306744e+02
> 45 SNES Function norm 1.020056306744e+02
> 46 SNES Function norm 1.020056306744e+02
> 47 SNES Function norm 1.020056306744e+02
> 48 SNES Function norm 1.020056306744e+02
> 49 SNES Function norm 1.020056306744e+02
> 50 SNES Function norm 1.020056306744e+02
> SNES Object: 4 MPI processes
>   type: nasm
>     total subdomain blocks = 4
>     Local solver information for first block on rank 0:
>     Use -snes_view ::ascii_info_detail to display information for all blocks
>   SNES Object: (sub_) 1 MPI processes
>     type: newtonls
>     maximum iterations=50, maximum function evaluations=10000
>     tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>     total number of linear solver iterations=22
>     total number of function evaluations=40
>     norm schedule ALWAYS
>     Jacobian is built using a DMDA local Jacobian
>     SNESLineSearch Object: (sub_) 1 MPI processes
>       type: bt
>         interpolation: cubic
>         alpha=1.000000e-04
>       maxstep=1.000000e+08, minlambda=1.000000e-12
>       tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>       maximum iterations=40
>     KSP Object: (sub_) 1 MPI processes
>       type: preonly
>       maximum iterations=10000, initial guess is zero
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (sub_) 1 MPI processes
>       type: lu
>         out-of-place factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 2.13732
>           Factored matrix follows:
>             Mat Object: 1 MPI processes
>               type: seqaij
>               rows=169, cols=169
>               package used to perform factorization: petsc
>               total: nonzeros=13339, allocated nonzeros=13339
>                 using I-node routines: found 104 nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI processes
>         type: seqaij
>         rows=169, cols=169
>         total: nonzeros=6241, allocated nonzeros=6241
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   maximum iterations=50, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of function evaluations=51
>   norm schedule ALWAYS
>   Jacobian is built using a DMDA local Jacobian
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
> error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>
> We have been stuck on this for a while now. We do not know how to debug
> this issue. Please let us know if you have any insights.