<div dir="ltr">I ran the code with the additional options but the raw output is about 75,000 lines. I cannot paste it directly in the email. The output is in the attached file. </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 11, 2022 at 11:44 PM Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason and send the output??<br>

"Takahashi, Tadanaga" <tt73@njit.edu> writes:

> Hello,
>
> We are working on a finite difference solver for a 2D nonlinear PDE with
> Dirichlet boundary conditions on a rectangular domain. Our goal is to solve
> the problem with parallel nonlinear additive Schwarz (NASM) as the outer
> solver. Our code is similar to SNES example 5
> <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5,
> parallel NASM can be run with a command like `mpiexec -n 4 ./ex5
> -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2`, which gives
> a convergent result, so we assume this is the correct usage. A comment in
> the NASM source code mentions that NASM should be used as a preconditioner,
> but there is no documentation on that usage, and the Brune paper does not
> cover parallel NASM either. We observed that increasing the overlap leads
> to fewer Schwarz iterations, and the parallelization works seamlessly for
> an arbitrary number of subdomains. This is the type of behavior we were
> expecting from our code.
>
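> Our reading of that comment (unverified) is that it refers to using NASM as
> a nonlinear left preconditioner for Newton, i.e. ASPIN; the option sets
> below are our guesses at that usage rather than anything documented:
>
>   mpiexec -n 4 ./ex5 -mms 3 -snes_type aspin -da_overlap 2
>   mpiexec -n 4 ./ex5 -mms 3 -snes_type newtonls -npc_snes_type nasm \
>       -npc_snes_nasm_type restrict -da_overlap 2
>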
> Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N by N
> DMDA. The finite difference stencil consists of 4d+1 points spread out in a
> diamond formation. If a stencil point is out of bounds, it is projected
> onto the boundary curve. Since nodes on the boundary curve would result in
> an irregular mesh, we chose not to treat boundary nodes as unknowns as is
> done in Example 5. We use DMDACreate2d to create the DA for the interior
> points and DMDASNESSetFunctionLocal to associate the residual function with
> the SNES object.
>
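> For reference, a minimal sketch of this setup (assuming a recent PETSc with
> PetscCall; the names, the -t1_N handling, and other details are illustrative
> stand-ins rather than our actual code) looks like:
>
> /* Sketch only: FormFunctionLocal stands in for our residual routine. */
> #include <petscdmda.h>
> #include <petscsnes.h>
>
> extern PetscErrorCode FormFunctionLocal(DMDALocalInfo*,PetscScalar**,PetscScalar**,void*);
>
> int main(int argc, char **argv)
> {
>   DM       da;
>   SNES     snes;
>   Vec      u;
>   PetscInt N = 20, d;
>
>   PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>   PetscCall(PetscOptionsGetInt(NULL, NULL, "-t1_N", &N, NULL));
>   /* stencil width d = ceil(N^(1/3)) */
>   d = (PetscInt)PetscCeilReal(PetscPowReal((PetscReal)N, 1.0/3.0));
>
>   /* DMDA over the interior points only; box stencil of width d so the
>      wide diamond-shaped stencil fits inside the ghost region */
>   PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>                          DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
>                          1, d, NULL, NULL, &da));
>   PetscCall(DMSetFromOptions(da));
>   PetscCall(DMSetUp(da));
>
>   PetscCall(SNESCreate(PETSC_COMM_WORLD, &snes));
>   PetscCall(SNESSetDM(snes, da));
>   /* the global residual is assembled from this local callback; -snes_type
>      nasm and -da_overlap are then picked up from the options database */
>   PetscCall(DMDASNESSetFunctionLocal(da, INSERT_VALUES,
>                                      (DMDASNESFunction)FormFunctionLocal, NULL));
>   PetscCall(SNESSetFromOptions(snes));
>
>   PetscCall(DMCreateGlobalVector(da, &u));
>   PetscCall(SNESSolve(snes, NULL, u));
>
>   PetscCall(VecDestroy(&u));
>   PetscCall(SNESDestroy(&snes));
>   PetscCall(DMDestroy(&da));
>   PetscCall(PetscFinalize());
>   return 0;
> }
>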
> Our code works serially. We have also tested our code with
> Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n <n>
> ./solve -snes_type newtonls`. We have tested NKS for various numbers of
> subdomains and overlap widths, and the code works as expected, so we have
> some confidence in its correctness. The overlapping NASM was also
> implemented in MATLAB, so we know the method converges. However, the
> parallel NASM will not converge with our PETSc code. We don't understand
> why NKS works while NASM does not. The function norm decreases
> monotonically and then stagnates.
>
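> For reference, the NKS runs we mean look roughly like the following; the
> preconditioner options shown here are only an illustration, not our exact
> command:
>
>   mpiexec -n 4 ./solve -snes_type newtonls -ksp_type gmres \
>       -pc_type asm -pc_asm_overlap 2 -sub_pc_type lu
>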
> Here is an example of the output when attempting to run NASM in parallel:
> takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N
> 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm
> -snes_nasm_type restrict
> 0 SNES Function norm 7.244681057908e+02
> 1 SNES Function norm 1.237688062971e+02
> 2 SNES Function norm 1.068926073552e+02
> 3 SNES Function norm 1.027563237834e+02
> 4 SNES Function norm 1.022184806736e+02
> 5 SNES Function norm 1.020818227640e+02
> 6 SNES Function norm 1.020325629121e+02
> 7 SNES Function norm 1.020149036595e+02
> 8 SNES Function norm 1.020088110545e+02
> 9 SNES Function norm 1.020067198030e+02
> 10 SNES Function norm 1.020060034469e+02
> 11 SNES Function norm 1.020057582380e+02
> 12 SNES Function norm 1.020056743241e+02
> 13 SNES Function norm 1.020056456101e+02
> 14 SNES Function norm 1.020056357849e+02
> 15 SNES Function norm 1.020056324231e+02
> 16 SNES Function norm 1.020056312727e+02
> 17 SNES Function norm 1.020056308791e+02
> 18 SNES Function norm 1.020056307444e+02
> 19 SNES Function norm 1.020056306983e+02
> 20 SNES Function norm 1.020056306826e+02
> 21 SNES Function norm 1.020056306772e+02
> 22 SNES Function norm 1.020056306753e+02
> 23 SNES Function norm 1.020056306747e+02
> 24 SNES Function norm 1.020056306745e+02
> 25 SNES Function norm 1.020056306744e+02
> 26 SNES Function norm 1.020056306744e+02
> 27 SNES Function norm 1.020056306744e+02
> 28 SNES Function norm 1.020056306744e+02
> 29 SNES Function norm 1.020056306744e+02
> 30 SNES Function norm 1.020056306744e+02
> 31 SNES Function norm 1.020056306744e+02
> 32 SNES Function norm 1.020056306744e+02
> 33 SNES Function norm 1.020056306744e+02
> 34 SNES Function norm 1.020056306744e+02
> 35 SNES Function norm 1.020056306744e+02
> 36 SNES Function norm 1.020056306744e+02
> 37 SNES Function norm 1.020056306744e+02
> 38 SNES Function norm 1.020056306744e+02
> 39 SNES Function norm 1.020056306744e+02
> 40 SNES Function norm 1.020056306744e+02
> 41 SNES Function norm 1.020056306744e+02
> 42 SNES Function norm 1.020056306744e+02
> 43 SNES Function norm 1.020056306744e+02
> 44 SNES Function norm 1.020056306744e+02
> 45 SNES Function norm 1.020056306744e+02
> 46 SNES Function norm 1.020056306744e+02
> 47 SNES Function norm 1.020056306744e+02
> 48 SNES Function norm 1.020056306744e+02
> 49 SNES Function norm 1.020056306744e+02
> 50 SNES Function norm 1.020056306744e+02
> SNES Object: 4 MPI processes
> type: nasm
> total subdomain blocks = 4
> Local solver information for first block on rank 0:
> Use -snes_view ::ascii_info_detail to display information for all blocks
> SNES Object: (sub_) 1 MPI processes
> type: newtonls
> maximum iterations=50, maximum function evaluations=10000
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> total number of linear solver iterations=22
> total number of function evaluations=40
> norm schedule ALWAYS
> Jacobian is built using a DMDA local Jacobian
> SNESLineSearch Object: (sub_) 1 MPI processes
> type: bt
> interpolation: cubic
> alpha=1.000000e-04
> maxstep=1.000000e+08, minlambda=1.000000e-12
> tolerances: relative=1.000000e-08, absolute=1.000000e-15,
> lambda=1.000000e-08
> maximum iterations=40
> KSP Object: (sub_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (sub_) 1 MPI processes
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> factor fill ratio given 5., needed 2.13732
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=169, cols=169
> package used to perform factorization: petsc
> total: nonzeros=13339, allocated nonzeros=13339
> using I-node routines: found 104 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=169, cols=169
> total: nonzeros=6241, allocated nonzeros=6241
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> maximum iterations=50, maximum function evaluations=10000
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> total number of function evaluations=51
> norm schedule ALWAYS
> Jacobian is built using a DMDA local Jacobian
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
> error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>
> We have been stuck on this for a while now. We do not know how to debug
> this issue. Please let us know if you have any insights.