[petsc-users] Convergence issues for SNES NASM
Jed Brown
jed at jedbrown.org
Wed May 11 22:43:56 CDT 2022
Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason and send the output?
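
For example, appended to the run in your message:

  mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view \
    -da_overlap 3 -snes_type nasm -snes_nasm_type restrict \
    -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason
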
"Takahashi, Tadanaga" <tt73 at njit.edu> writes:
> Hello,
>
> We are working on a finite difference solver for a 2D nonlinear PDE with
> Dirichlet Boundary conditions on a rectangular domain. Our goal is to solve
> the problem with parallel nonlinear additive Schwarz (NASM) as the outer
> solver. Our code is similar to SNES example 5
> <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5,
> the parallel NASM can be executed with a command like `mpiexec -n 4 ./ex5
> -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2` which gives
> a convergent result. We assume this is the correct usage. A comment in the
> source code for NASM mentions that NASM should be used as a preconditioner,
> but we could not find documentation for that usage (we sketch our guess
> below). The Brune paper does not cover parallel NASM either. We observed
> that increasing the overlap leads to
> fewer Schwarz iterations. The parallelization works seamlessly for an
> arbitrary number of subdomains. This is the type of behavior we were
> expecting from our code.
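>
> For what it's worth, our guess at the "preconditioner" usage hinted at in
> the source comment (we have not found it documented, so the exact options
> below are an assumption on our part) is to attach NASM as a nonlinear
> preconditioner to an outer SNES, along the lines of
>
>   mpiexec -n 4 ./ex5 -mms 3 -snes_type ngmres -npc_snes_type nasm \
>     -npc_snes_nasm_type restrict -da_overlap 2
>
> or -snes_type aspin, which we understand to be Newton wrapped around NASM.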
>
> Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N by N
> DMDA. The finite difference stencil consists of 4d+1 points spread out in a
> diamond formation. If a stencil point falls out of bounds, it is projected
> onto the boundary curve. Since nodes on the boundary curve would result in
> an irregular mesh, we chose not to treat boundary nodes as unknowns, unlike
> Example 5. We use DMDACreate2d to create the DA for the interior points and
> DMDASNESSetFunctionLocal to associate the residual function with the SNES
> object; a rough sketch of this setup is below.
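>
> In case the setup details matter, here is a rough sketch of how the DMDA and
> SNES are wired together. FormFunctionLocal and AppCtx stand in for our actual
> residual code; error checking and our DMDASNESSetJacobianLocal call are
> omitted:
>
>   #include <petscdmda.h>
>   #include <petscsnes.h>
>
>   typedef struct { PetscReal eps; } AppCtx;   /* placeholder for our parameters */
>
>   /* placeholder: the wide-stencil residual is evaluated here */
>   static PetscErrorCode FormFunctionLocal(DMDALocalInfo *info, PetscScalar **u,
>                                           PetscScalar **f, AppCtx *user)
>   {
>     return 0;
>   }
>
>   int main(int argc, char **argv)
>   {
>     DM       da;
>     SNES     snes;
>     Vec      u;
>     AppCtx   user;
>     PetscInt N = 20, d;
>
>     PetscInitialize(&argc, &argv, NULL, NULL);
>     d = (PetscInt)PetscCeilReal(PetscPowReal((PetscReal)N, 1.0/3.0));
>
>     /* interior unknowns only; box stencil of width d holds the 4d+1-point stencil */
>     DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>                  DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
>                  1, d, NULL, NULL, &da);
>     DMSetFromOptions(da);   /* reads DMDA command-line options, e.g. -da_overlap */
>     DMSetUp(da);
>
>     SNESCreate(PETSC_COMM_WORLD, &snes);
>     SNESSetDM(snes, da);
>     DMDASNESSetFunctionLocal(da, INSERT_VALUES,
>                              (DMDASNESFunction)FormFunctionLocal, &user);
>     SNESSetFromOptions(snes);   /* -snes_type nasm, -snes_nasm_type restrict, ... */
>
>     DMCreateGlobalVector(da, &u);
>     SNESSolve(snes, NULL, u);
>
>     VecDestroy(&u);
>     SNESDestroy(&snes);
>     DMDestroy(&da);
>     return PetscFinalize();
>   }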
>
> Our code works serially. We have also tested it with Newton-Krylov-Schwarz
> (NKS) by running something akin to `mpiexec -n <n> ./solve -snes_type
> newtonls` (an illustrative invocation is sketched below). We have tested NKS
> for several numbers of subdomains and overlap widths, and the code works as
> expected, so we have some confidence in its correctness. We also implemented
> the overlapping NASM in MATLAB, so we know the method converges. However,
> the parallel NASM does not converge with our PETSc code, and we do not
> understand why NKS works while NASM does not. The SNES function norm
> decreases monotonically and then stagnates.
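>
> For concreteness, an NKS invocation of the kind we mean looks roughly like
> the following (the preconditioner options here are illustrative, not a
> record of the exact flags we used):
>
>   mpiexec -n 4 ./solve -snes_type newtonls -ksp_type gmres \
>     -pc_type asm -pc_asm_overlap 2 -sub_pc_type lu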
>
> Here is an example of the output when attempting to run NASM in parallel:
> takahashi at ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
> 0 SNES Function norm 7.244681057908e+02
> 1 SNES Function norm 1.237688062971e+02
> 2 SNES Function norm 1.068926073552e+02
> 3 SNES Function norm 1.027563237834e+02
> 4 SNES Function norm 1.022184806736e+02
> 5 SNES Function norm 1.020818227640e+02
> 6 SNES Function norm 1.020325629121e+02
> 7 SNES Function norm 1.020149036595e+02
> 8 SNES Function norm 1.020088110545e+02
> 9 SNES Function norm 1.020067198030e+02
> 10 SNES Function norm 1.020060034469e+02
> 11 SNES Function norm 1.020057582380e+02
> 12 SNES Function norm 1.020056743241e+02
> 13 SNES Function norm 1.020056456101e+02
> 14 SNES Function norm 1.020056357849e+02
> 15 SNES Function norm 1.020056324231e+02
> 16 SNES Function norm 1.020056312727e+02
> 17 SNES Function norm 1.020056308791e+02
> 18 SNES Function norm 1.020056307444e+02
> 19 SNES Function norm 1.020056306983e+02
> 20 SNES Function norm 1.020056306826e+02
> 21 SNES Function norm 1.020056306772e+02
> 22 SNES Function norm 1.020056306753e+02
> 23 SNES Function norm 1.020056306747e+02
> 24 SNES Function norm 1.020056306745e+02
> 25 SNES Function norm 1.020056306744e+02
> 26 SNES Function norm 1.020056306744e+02
> 27 SNES Function norm 1.020056306744e+02
> 28 SNES Function norm 1.020056306744e+02
> 29 SNES Function norm 1.020056306744e+02
> 30 SNES Function norm 1.020056306744e+02
> 31 SNES Function norm 1.020056306744e+02
> 32 SNES Function norm 1.020056306744e+02
> 33 SNES Function norm 1.020056306744e+02
> 34 SNES Function norm 1.020056306744e+02
> 35 SNES Function norm 1.020056306744e+02
> 36 SNES Function norm 1.020056306744e+02
> 37 SNES Function norm 1.020056306744e+02
> 38 SNES Function norm 1.020056306744e+02
> 39 SNES Function norm 1.020056306744e+02
> 40 SNES Function norm 1.020056306744e+02
> 41 SNES Function norm 1.020056306744e+02
> 42 SNES Function norm 1.020056306744e+02
> 43 SNES Function norm 1.020056306744e+02
> 44 SNES Function norm 1.020056306744e+02
> 45 SNES Function norm 1.020056306744e+02
> 46 SNES Function norm 1.020056306744e+02
> 47 SNES Function norm 1.020056306744e+02
> 48 SNES Function norm 1.020056306744e+02
> 49 SNES Function norm 1.020056306744e+02
> 50 SNES Function norm 1.020056306744e+02
> SNES Object: 4 MPI processes
>   type: nasm
>     total subdomain blocks = 4
>     Local solver information for first block on rank 0:
>     Use -snes_view ::ascii_info_detail to display information for all blocks
>   SNES Object: (sub_) 1 MPI processes
>     type: newtonls
>     maximum iterations=50, maximum function evaluations=10000
>     tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>     total number of linear solver iterations=22
>     total number of function evaluations=40
>     norm schedule ALWAYS
>     Jacobian is built using a DMDA local Jacobian
>     SNESLineSearch Object: (sub_) 1 MPI processes
>       type: bt
>         interpolation: cubic
>         alpha=1.000000e-04
>       maxstep=1.000000e+08, minlambda=1.000000e-12
>       tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>       maximum iterations=40
>     KSP Object: (sub_) 1 MPI processes
>       type: preonly
>       maximum iterations=10000, initial guess is zero
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (sub_) 1 MPI processes
>       type: lu
>         out-of-place factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 2.13732
>           Factored matrix follows:
>             Mat Object: 1 MPI processes
>               type: seqaij
>               rows=169, cols=169
>               package used to perform factorization: petsc
>               total: nonzeros=13339, allocated nonzeros=13339
>                 using I-node routines: found 104 nodes, limit used is 5
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI processes
>         type: seqaij
>         rows=169, cols=169
>         total: nonzeros=6241, allocated nonzeros=6241
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   maximum iterations=50, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of function evaluations=51
>   norm schedule ALWAYS
>   Jacobian is built using a DMDA local Jacobian
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
> error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>
> We have been stuck on this for a while now. We do not know how to debug
> this issue. Please let us know if you have any insights.