<div dir="ltr">Thank you for the feedback. We figured out what was causing the issue. We were using <a href="https://petsc.org/main/docs/manualpages/DM/DMGetBoundingBox/">DMGetBoundingBox</a> in order to get the limits of the global domain, but gmin and gmax contained limits for the local subdomains when we ran the code with NASM. Hence, our local coordinates xi and yj were completely wrong. The documentation states that DMGetBoundingBox gets the global limits. I believe this is a mistake. <div><br></div><div>This is our new output: </div><div><font face="monospace">$ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict </font></div><div><font face="monospace"> 0 SNES Function norm 7.244681057908e+02 <br> 1 SNES Function norm 4.394913250889e+01 <br> 2 SNES Function norm 1.823326663029e+01 <br> 3 SNES Function norm 7.033938512358e+00 <br> 4 SNES Function norm 2.797351504285e+00 <br> 5 SNES Function norm 1.130613777736e+00 <br> 6 SNES Function norm 4.605418417192e-01 <br> 7 SNES Function norm 1.882307001920e-01 <br> 8 SNES Function norm 7.704148683921e-02 <br> 9 SNES Function norm 3.155090858782e-02 <br> 10 SNES Function norm 1.292418188473e-02 <br> 11 SNES Function norm 5.294645671797e-03 <br> 12 SNES Function norm 2.169143207557e-03 <br> 13 SNES Function norm 8.886826738192e-04 <br> 14 SNES Function norm 3.640894847145e-04 <br> 15 SNES Function norm 1.491663153414e-04 <br> 16 SNES Function norm 6.111303899450e-05 <br> 17 SNES Function norm 2.503785968501e-05 <br> 18 SNES Function norm 1.025795062417e-05 <br> 19 SNES Function norm 4.202657921479e-06 <br>SNES Object: 4 MPI processes<br> type: nasm<br> total subdomain blocks = 4<br> Local solver information for first block on rank 0:<br> Use -snes_view ::ascii_info_detail to display information for all blocks<br> SNES Object: (sub_) 1 MPI processes<br> type: newtonls<br> maximum iterations=50, maximum function evaluations=10000<br> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08<br> total number of linear solver iterations=2<br> total number of function evaluations=3<br> norm schedule ALWAYS<br> Jacobian is built using a DMDA local Jacobian<br> SNESLineSearch Object: (sub_) 1 MPI processes<br> type: bt<br> interpolation: cubic<br> alpha=1.000000e-04<br> maxstep=1.000000e+08, minlambda=1.000000e-12<br> tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08<br> maximum iterations=40<br> KSP Object: (sub_) 1 MPI processes<br> type: preonly<br> maximum iterations=10000, initial guess is zero<br> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br> left preconditioning<br> using NONE norm type for convergence test<br> PC Object: (sub_) 1 MPI processes<br> type: lu<br> out-of-place factorization<br> tolerance for zero pivot 2.22045e-14<br> matrix ordering: nd<br> factor fill ratio given 5., needed 2.13732<br> Factored matrix follows:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=169, cols=169<br> package used to perform factorization: petsc<br> total: nonzeros=13339, allocated nonzeros=13339<br> using I-node routines: found 104 nodes, limit used is 5<br> linear system matrix = precond matrix:<br> Mat Object: 1 MPI processes<br> type: seqaij<br> rows=169, cols=169<br> total: nonzeros=6241, allocated nonzeros=6241<br> total number of mallocs used during MatSetValues calls=0<br> not using I-node routines<br> maximum iterations=50, maximum function evaluations=10000<br> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08<br> total number of function evaluations=20<br> norm schedule ALWAYS<br> Jacobian is built using a DMDA local Jacobian<br>problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:<br> error |u-uexact|_inf = 2.879e-02, |u-uexact|_h = 1.707e-02</font><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 12, 2022 at 9:37 AM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Your subdomain solves do not appear to be producing descent whatsoever. Possible reasons:<div><br></div><div> 1) Your subdomain Jacobians are wrong (this is usually the problem)</div><div><br></div><div> 2) You have some global coupling field for which local solves give no descent. (For this you want nonlinear elimination I think)</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 12, 2022 at 9:02 AM Takahashi, Tadanaga <<a href="mailto:tt73@njit.edu" target="_blank">tt73@njit.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I ran the code with the additional options but the raw output is about 75,000 lines. I cannot paste it directly in the email. The output is in the attached file. </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, May 11, 2022 at 11:44 PM Jed Brown <<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor -ksp_converged_reason and send the output??<br>
<br>
"Takahashi, Tadanaga" <<a href="mailto:tt73@njit.edu" target="_blank">tt73@njit.edu</a>> writes:<br>
<br>
> Hello,<br>
><br>
> We are working on a finite difference solver for a 2D nonlinear PDE with<br>
> Dirichlet Boundary conditions on a rectangular domain. Our goal is to solve<br>
> the problem with parallel nonlinear additive Schwarz (NASM) as the outer<br>
> solver. Our code is similar to SNES example 5<br>
> <<a href="https://petsc.org/release/src/snes/tutorials/ex5.c.html" rel="noreferrer" target="_blank">https://petsc.org/release/src/snes/tutorials/ex5.c.html</a>>. In example 5,<br>
> the parallel NASM can be executed with a command like `mpiexec -n 4 ./ex5<br>
> -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2` which gives<br>
> a convergent result. We assume this is the correct usage. A comment in the<br>
> source code for NASM mentions that NASM should be a preconditioner but<br>
> there's no documentation on the usage. The Brune paper does not cover<br>
> parallel NASM either. We observed that increasing the overlap leads to<br>
> fewer Schwarz iterations. The parallelization works seamlessly for an<br>
> arbitrary number of subdomains. This is the type of behavior we were<br>
> expecting from our code.<br>
><br>
> Our method uses box-style stencil width d = ceil(N^(1/3)) on a N by N DMDA.<br>
> The finite difference stencil consists of 4d+1 points spread out in a<br>
> diamond formation. If a stencil point is out of bounds, then it is<br>
> projected onto the boundary curve. Since the nodes on the boundary curve<br>
> would result in an irregular mesh, we chose not treat boundary nodes as<br>
> unknowns as in Example 5. We use DMDACreate2d to create the DA for the<br>
> interior points and DMDASNESSetFunctionLocal to associate the residue<br>
> function to the SNES object.<br>
><br>
> Our code works serially. We have also tested our code<br>
> with Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n<br>
> <n> ./solve -snes_type newtonls`. We have tested the NKS for several<br>
> quantities of subdomains and overlap and the code works as expected. We<br>
> have some confidence in the correctness of our code. The overlapping NASM<br>
> was implemented in MATLAB so we know the method converges. However, the<br>
> parallel NASM will not converge with our PETSc code. We don't understand<br>
> why NKS works while NASM does not. The F-norm residue monotonically<br>
> decreases and then stagnates.<br>
><br>
> Here is an example of the output when attempting to run NASM in parallel:<br>
> takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N<br>
> 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm<br>
> -snes_nasm_type restrict<br>
> 0 SNES Function norm 7.244681057908e+02<br>
> 1 SNES Function norm 1.237688062971e+02<br>
> 2 SNES Function norm 1.068926073552e+02<br>
> 3 SNES Function norm 1.027563237834e+02<br>
> 4 SNES Function norm 1.022184806736e+02<br>
> 5 SNES Function norm 1.020818227640e+02<br>
> 6 SNES Function norm 1.020325629121e+02<br>
> 7 SNES Function norm 1.020149036595e+02<br>
> 8 SNES Function norm 1.020088110545e+02<br>
> 9 SNES Function norm 1.020067198030e+02<br>
> 10 SNES Function norm 1.020060034469e+02<br>
> 11 SNES Function norm 1.020057582380e+02<br>
> 12 SNES Function norm 1.020056743241e+02<br>
> 13 SNES Function norm 1.020056456101e+02<br>
> 14 SNES Function norm 1.020056357849e+02<br>
> 15 SNES Function norm 1.020056324231e+02<br>
> 16 SNES Function norm 1.020056312727e+02<br>
> 17 SNES Function norm 1.020056308791e+02<br>
> 18 SNES Function norm 1.020056307444e+02<br>
> 19 SNES Function norm 1.020056306983e+02<br>
> 20 SNES Function norm 1.020056306826e+02<br>
> 21 SNES Function norm 1.020056306772e+02<br>
> 22 SNES Function norm 1.020056306753e+02<br>
> 23 SNES Function norm 1.020056306747e+02<br>
> 24 SNES Function norm 1.020056306745e+02<br>
> 25 SNES Function norm 1.020056306744e+02<br>
> 26 SNES Function norm 1.020056306744e+02<br>
> 27 SNES Function norm 1.020056306744e+02<br>
> 28 SNES Function norm 1.020056306744e+02<br>
> 29 SNES Function norm 1.020056306744e+02<br>
> 30 SNES Function norm 1.020056306744e+02<br>
> 31 SNES Function norm 1.020056306744e+02<br>
> 32 SNES Function norm 1.020056306744e+02<br>
> 33 SNES Function norm 1.020056306744e+02<br>
> 34 SNES Function norm 1.020056306744e+02<br>
> 35 SNES Function norm 1.020056306744e+02<br>
> 36 SNES Function norm 1.020056306744e+02<br>
> 37 SNES Function norm 1.020056306744e+02<br>
> 38 SNES Function norm 1.020056306744e+02<br>
> 39 SNES Function norm 1.020056306744e+02<br>
> 40 SNES Function norm 1.020056306744e+02<br>
> 41 SNES Function norm 1.020056306744e+02<br>
> 42 SNES Function norm 1.020056306744e+02<br>
> 43 SNES Function norm 1.020056306744e+02<br>
> 44 SNES Function norm 1.020056306744e+02<br>
> 45 SNES Function norm 1.020056306744e+02<br>
> 46 SNES Function norm 1.020056306744e+02<br>
> 47 SNES Function norm 1.020056306744e+02<br>
> 48 SNES Function norm 1.020056306744e+02<br>
> 49 SNES Function norm 1.020056306744e+02<br>
> 50 SNES Function norm 1.020056306744e+02<br>
> SNES Object: 4 MPI processes<br>
> type: nasm<br>
> total subdomain blocks = 4<br>
> Local solver information for first block on rank 0:<br>
> Use -snes_view ::ascii_info_detail to display information for all blocks<br>
> SNES Object: (sub_) 1 MPI processes<br>
> type: newtonls<br>
> maximum iterations=50, maximum function evaluations=10000<br>
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08<br>
> total number of linear solver iterations=22<br>
> total number of function evaluations=40<br>
> norm schedule ALWAYS<br>
> Jacobian is built using a DMDA local Jacobian<br>
> SNESLineSearch Object: (sub_) 1 MPI processes<br>
> type: bt<br>
> interpolation: cubic<br>
> alpha=1.000000e-04<br>
> maxstep=1.000000e+08, minlambda=1.000000e-12<br>
> tolerances: relative=1.000000e-08, absolute=1.000000e-15,<br>
> lambda=1.000000e-08<br>
> maximum iterations=40<br>
> KSP Object: (sub_) 1 MPI processes<br>
> type: preonly<br>
> maximum iterations=10000, initial guess is zero<br>
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.<br>
> left preconditioning<br>
> using NONE norm type for convergence test<br>
> PC Object: (sub_) 1 MPI processes<br>
> type: lu<br>
> out-of-place factorization<br>
> tolerance for zero pivot 2.22045e-14<br>
> matrix ordering: nd<br>
> factor fill ratio given 5., needed 2.13732<br>
> Factored matrix follows:<br>
> Mat Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=169, cols=169<br>
> package used to perform factorization: petsc<br>
> total: nonzeros=13339, allocated nonzeros=13339<br>
> using I-node routines: found 104 nodes, limit used is 5<br>
> linear system matrix = precond matrix:<br>
> Mat Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=169, cols=169<br>
> total: nonzeros=6241, allocated nonzeros=6241<br>
> total number of mallocs used during MatSetValues calls=0<br>
> not using I-node routines<br>
> maximum iterations=50, maximum function evaluations=10000<br>
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08<br>
> total number of function evaluations=51<br>
> norm schedule ALWAYS<br>
> Jacobian is built using a DMDA local Jacobian<br>
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:<br>
> error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01<br>
><br>
> We have been stuck on this for a while now. We do not know how to debug<br>
> this issue. Please let us know if you have any insights.<br>
</blockquote></div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div><div><br></div><div><a href="http://www.cse.buffalo.edu/~knepley/" target="_blank">https://www.cse.buffalo.edu/~knepley/</a><br></div></div></div></div></div></div></div>
</blockquote></div>