[petsc-users] Convergence issues for SNES NASM
Matthew Knepley
knepley at gmail.com
Thu May 12 13:02:13 CDT 2022
On Thu, May 12, 2022 at 1:03 PM Takahashi, Tadanaga <tt73 at njit.edu> wrote:
> Thank you for the feedback. We figured out what was causing the issue. We
> were using DMGetBoundingBox
> <https://petsc.org/main/docs/manualpages/DM/DMGetBoundingBox/> in order
> to get the limits of the global domain, but gmin and gmax contained limits
> for the local subdomains when we ran the code with NASM. Hence, our local
> coordinates xi and yj were completely wrong. The documentation states
> that DMGetBoundingBox gets the global limits. I believe this is a mistake.
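>
> For reference, a minimal sketch of that kind of fix, assuming the limits
> are cached from the original (global) DMDA before SNESSolve creates the
> per-subdomain DMs (da, AppCtx, and user are illustrative names; our code
> differs in the details):
>
>     typedef struct {
>       PetscReal xmin, xmax, ymin, ymax; /* limits of the global domain */
>     } AppCtx;
>
>     /* after DMDASetUniformCoordinates(da, ...), before SNESSolve: */
>     PetscReal gmin[3], gmax[3];
>     PetscCall(DMGetBoundingBox(da, gmin, gmax));
>     user.xmin = gmin[0]; user.xmax = gmax[0];
>     user.ymin = gmin[1]; user.ymax = gmax[1];
>
>     /* The local residual then reads user->xmin etc. from its context
>        instead of calling DMGetBoundingBox on the DM it is handed. */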
>
I think I can explain this, and maybe you can tell us how to improve the
documentation.
I believe we make a new DM that comprises only the subdomain, so the
bounding box computed from that DM covers only the subdomain, not the
original domain.
Where should we say this?
Thanks,
Matt
> This is our new output:
> $ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view
> -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
> 0 SNES Function norm 7.244681057908e+02
> 1 SNES Function norm 4.394913250889e+01
> 2 SNES Function norm 1.823326663029e+01
> 3 SNES Function norm 7.033938512358e+00
> 4 SNES Function norm 2.797351504285e+00
> 5 SNES Function norm 1.130613777736e+00
> 6 SNES Function norm 4.605418417192e-01
> 7 SNES Function norm 1.882307001920e-01
> 8 SNES Function norm 7.704148683921e-02
> 9 SNES Function norm 3.155090858782e-02
> 10 SNES Function norm 1.292418188473e-02
> 11 SNES Function norm 5.294645671797e-03
> 12 SNES Function norm 2.169143207557e-03
> 13 SNES Function norm 8.886826738192e-04
> 14 SNES Function norm 3.640894847145e-04
> 15 SNES Function norm 1.491663153414e-04
> 16 SNES Function norm 6.111303899450e-05
> 17 SNES Function norm 2.503785968501e-05
> 18 SNES Function norm 1.025795062417e-05
> 19 SNES Function norm 4.202657921479e-06
> SNES Object: 4 MPI processes
> type: nasm
> total subdomain blocks = 4
> Local solver information for first block on rank 0:
> Use -snes_view ::ascii_info_detail to display information for all blocks
> SNES Object: (sub_) 1 MPI processes
> type: newtonls
> maximum iterations=50, maximum function evaluations=10000
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> total number of linear solver iterations=2
> total number of function evaluations=3
> norm schedule ALWAYS
> Jacobian is built using a DMDA local Jacobian
> SNESLineSearch Object: (sub_) 1 MPI processes
> type: bt
> interpolation: cubic
> alpha=1.000000e-04
> maxstep=1.000000e+08, minlambda=1.000000e-12
> tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> maximum iterations=40
> KSP Object: (sub_) 1 MPI processes
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (sub_) 1 MPI processes
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> factor fill ratio given 5., needed 2.13732
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=169, cols=169
> package used to perform factorization: petsc
> total: nonzeros=13339, allocated nonzeros=13339
> using I-node routines: found 104 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=169, cols=169
> total: nonzeros=6241, allocated nonzeros=6241
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> maximum iterations=50, maximum function evaluations=10000
> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> total number of function evaluations=20
> norm schedule ALWAYS
> Jacobian is built using a DMDA local Jacobian
> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
> error |u-uexact|_inf = 2.879e-02, |u-uexact|_h = 1.707e-02
>
> On Thu, May 12, 2022 at 9:37 AM Matthew Knepley <knepley at gmail.com> wrote:
>
>> Your subdomain solves do not appear to be producing any descent whatsoever.
>> Possible reasons:
>>
>> 1) Your subdomain Jacobians are wrong (this is usually the problem; a
>> quick check is sketched below)
>>
>> 2) You have some global coupling field for which local solves give no
>> descent. (For this you want nonlinear elimination, I think.)
>>
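>> For (1), PETSc has a built-in check that compares the hand-coded Jacobian
>> with a finite-difference approximation at the current iterate. A minimal
>> invocation (here with the sub_ prefix so it reaches the NASM subdomain
>> solvers, assuming the option is propagated there as usual):
>>
>>   -sub_snes_test_jacobian -sub_snes_test_jacobian_view
>>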
>> Thanks,
>>
>> Matt
>>
>> On Thu, May 12, 2022 at 9:02 AM Takahashi, Tadanaga <tt73 at njit.edu>
>> wrote:
>>
>>> I ran the code with the additional options but the raw output is about
>>> 75,000 lines. I cannot paste it directly in the email. The output is in the
>>> attached file.
>>>
>>> On Wed, May 11, 2022 at 11:44 PM Jed Brown <jed at jedbrown.org> wrote:
>>>
>>>> Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor
>>>> -ksp_converged_reason and send the output?
>>>>
>>>> "Takahashi, Tadanaga" <tt73 at njit.edu> writes:
>>>>
>>>> > Hello,
>>>> >
>>>> > We are working on a finite difference solver for a 2D nonlinear PDE with
>>>> > Dirichlet boundary conditions on a rectangular domain. Our goal is to
>>>> > solve the problem with parallel nonlinear additive Schwarz (NASM) as the
>>>> > outer solver. Our code is similar to SNES example 5
>>>> > <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5,
>>>> > the parallel NASM can be executed with a command like `mpiexec -n 4 ./ex5
>>>> > -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2`, which
>>>> > gives a convergent result. We assume this is the correct usage. A comment
>>>> > in the source code for NASM mentions that NASM should be used as a
>>>> > preconditioner, but there is no documentation on that usage. The Brune
>>>> > paper does not cover parallel NASM either. We observed that increasing
>>>> > the overlap leads to fewer Schwarz iterations, and the parallelization
>>>> > works seamlessly for an arbitrary number of subdomains. This is the type
>>>> > of behavior we were expecting from our code.
>>>> >
>>>> > Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N
>>>> > by N DMDA. The finite difference stencil consists of 4d+1 points spread
>>>> > out in a diamond formation. If a stencil point is out of bounds, then it
>>>> > is projected onto the boundary curve. Since the nodes on the boundary
>>>> > curve would result in an irregular mesh, we chose not to treat boundary
>>>> > nodes as unknowns as is done in Example 5. We use DMDACreate2d to create
>>>> > the DA for the interior points and DMDASNESSetFunctionLocal to associate
>>>> > the residual function with the SNES object, as sketched below.
>>>> >
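>>>> > A skeleton of that setup, to be concrete (FormFunctionLocal and user
>>>> > are our names for the residual callback and its context; the calls
>>>> > themselves are the standard DMDA/SNES API):
>>>> >
>>>> >     PetscInt d = (PetscInt)PetscCeilReal(PetscPowReal((PetscReal)N, 1.0/3.0));
>>>> >     DM da;
>>>> >     PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>>>> >                            DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
>>>> >                            1 /* dof */, d /* stencil width */, NULL, NULL, &da));
>>>> >     PetscCall(DMSetFromOptions(da));
>>>> >     PetscCall(DMSetUp(da));
>>>> >     PetscCall(DMDASNESSetFunctionLocal(da, INSERT_VALUES,
>>>> >                            (DMDASNESFunction)FormFunctionLocal, &user));
>>>> >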
>>>> > Our code works serially. We have also tested the code with
>>>> > Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n <n>
>>>> > ./solve -snes_type newtonls`. We have tested NKS for several numbers of
>>>> > subdomains and overlaps, and it works as expected, so we have some
>>>> > confidence in the correctness of our code. The overlapping NASM was
>>>> > implemented in MATLAB, so we know the method converges. However, the
>>>> > parallel NASM will not converge with our PETSc code. We do not understand
>>>> > why NKS works while NASM does not. The SNES function norm monotonically
>>>> > decreases and then stagnates.
>>>> >
>>>> > Here is an example of the output when attempting to run NASM in parallel:
>>>> >
>>>> > takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1
>>>> > -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3
>>>> > -snes_type nasm -snes_nasm_type restrict
>>>> > 0 SNES Function norm 7.244681057908e+02
>>>> > 1 SNES Function norm 1.237688062971e+02
>>>> > 2 SNES Function norm 1.068926073552e+02
>>>> > 3 SNES Function norm 1.027563237834e+02
>>>> > 4 SNES Function norm 1.022184806736e+02
>>>> > 5 SNES Function norm 1.020818227640e+02
>>>> > 6 SNES Function norm 1.020325629121e+02
>>>> > 7 SNES Function norm 1.020149036595e+02
>>>> > 8 SNES Function norm 1.020088110545e+02
>>>> > 9 SNES Function norm 1.020067198030e+02
>>>> > 10 SNES Function norm 1.020060034469e+02
>>>> > 11 SNES Function norm 1.020057582380e+02
>>>> > 12 SNES Function norm 1.020056743241e+02
>>>> > 13 SNES Function norm 1.020056456101e+02
>>>> > 14 SNES Function norm 1.020056357849e+02
>>>> > 15 SNES Function norm 1.020056324231e+02
>>>> > 16 SNES Function norm 1.020056312727e+02
>>>> > 17 SNES Function norm 1.020056308791e+02
>>>> > 18 SNES Function norm 1.020056307444e+02
>>>> > 19 SNES Function norm 1.020056306983e+02
>>>> > 20 SNES Function norm 1.020056306826e+02
>>>> > 21 SNES Function norm 1.020056306772e+02
>>>> > 22 SNES Function norm 1.020056306753e+02
>>>> > 23 SNES Function norm 1.020056306747e+02
>>>> > 24 SNES Function norm 1.020056306745e+02
>>>> > 25 SNES Function norm 1.020056306744e+02
>>>> > 26 SNES Function norm 1.020056306744e+02
>>>> > 27 SNES Function norm 1.020056306744e+02
>>>> > 28 SNES Function norm 1.020056306744e+02
>>>> > 29 SNES Function norm 1.020056306744e+02
>>>> > 30 SNES Function norm 1.020056306744e+02
>>>> > 31 SNES Function norm 1.020056306744e+02
>>>> > 32 SNES Function norm 1.020056306744e+02
>>>> > 33 SNES Function norm 1.020056306744e+02
>>>> > 34 SNES Function norm 1.020056306744e+02
>>>> > 35 SNES Function norm 1.020056306744e+02
>>>> > 36 SNES Function norm 1.020056306744e+02
>>>> > 37 SNES Function norm 1.020056306744e+02
>>>> > 38 SNES Function norm 1.020056306744e+02
>>>> > 39 SNES Function norm 1.020056306744e+02
>>>> > 40 SNES Function norm 1.020056306744e+02
>>>> > 41 SNES Function norm 1.020056306744e+02
>>>> > 42 SNES Function norm 1.020056306744e+02
>>>> > 43 SNES Function norm 1.020056306744e+02
>>>> > 44 SNES Function norm 1.020056306744e+02
>>>> > 45 SNES Function norm 1.020056306744e+02
>>>> > 46 SNES Function norm 1.020056306744e+02
>>>> > 47 SNES Function norm 1.020056306744e+02
>>>> > 48 SNES Function norm 1.020056306744e+02
>>>> > 49 SNES Function norm 1.020056306744e+02
>>>> > 50 SNES Function norm 1.020056306744e+02
>>>> > SNES Object: 4 MPI processes
>>>> > type: nasm
>>>> > total subdomain blocks = 4
>>>> > Local solver information for first block on rank 0:
>>>> > Use -snes_view ::ascii_info_detail to display information for all blocks
>>>> > SNES Object: (sub_) 1 MPI processes
>>>> > type: newtonls
>>>> > maximum iterations=50, maximum function evaluations=10000
>>>> > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>>> > total number of linear solver iterations=22
>>>> > total number of function evaluations=40
>>>> > norm schedule ALWAYS
>>>> > Jacobian is built using a DMDA local Jacobian
>>>> > SNESLineSearch Object: (sub_) 1 MPI processes
>>>> > type: bt
>>>> > interpolation: cubic
>>>> > alpha=1.000000e-04
>>>> > maxstep=1.000000e+08, minlambda=1.000000e-12
>>>> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>>> > maximum iterations=40
>>>> > KSP Object: (sub_) 1 MPI processes
>>>> > type: preonly
>>>> > maximum iterations=10000, initial guess is zero
>>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> > left preconditioning
>>>> > using NONE norm type for convergence test
>>>> > PC Object: (sub_) 1 MPI processes
>>>> > type: lu
>>>> > out-of-place factorization
>>>> > tolerance for zero pivot 2.22045e-14
>>>> > matrix ordering: nd
>>>> > factor fill ratio given 5., needed 2.13732
>>>> > Factored matrix follows:
>>>> > Mat Object: 1 MPI processes
>>>> > type: seqaij
>>>> > rows=169, cols=169
>>>> > package used to perform factorization: petsc
>>>> > total: nonzeros=13339, allocated nonzeros=13339
>>>> > using I-node routines: found 104 nodes, limit used is 5
>>>> > linear system matrix = precond matrix:
>>>> > Mat Object: 1 MPI processes
>>>> > type: seqaij
>>>> > rows=169, cols=169
>>>> > total: nonzeros=6241, allocated nonzeros=6241
>>>> > total number of mallocs used during MatSetValues calls=0
>>>> > not using I-node routines
>>>> > maximum iterations=50, maximum function evaluations=10000
>>>> > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>>> > total number of function evaluations=51
>>>> > norm schedule ALWAYS
>>>> > Jacobian is built using a DMDA local Jacobian
>>>> > problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
>>>> > error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>>>> >
>>>> > We have been stuck on this for a while now. We do not know how to debug
>>>> > this issue. Please let us know if you have any insights.
>>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/