[petsc-users] Convergence issues for SNES NASM

Takahashi, Tadanaga tt73 at njit.edu
Thu May 12 15:03:30 CDT 2022


I'm still relatively new to PETSc. I was using DMDASetUniformCoordinates
and DMGetBoundingBox together, which, in hindsight, was unnecessary. I
think the simplest way to prevent anyone else from making the same
mistake is to add a caveat to the DMGetBoundingBox documentation page.
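
For anyone who hits the same thing, the safe pattern is to query the box
once on the DM created for the full domain, before SNESSolve, and stash
the limits in the application context. A minimal sketch (hypothetical
names, not our exact code; under -snes_type nasm the DM visible inside
local callbacks is a subdomain DM, so calling DMGetBoundingBox there
returns only the subdomain's extent):

    typedef struct { PetscReal xmin, xmax, ymin, ymax; } AppCtx;

    AppCtx    user;
    PetscReal gmin[3], gmax[3];
    DMDASetUniformCoordinates(da, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0);
    DMGetBoundingBox(da, gmin, gmax); /* global here: da is the full-domain DM */
    user.xmin = gmin[0]; user.xmax = gmax[0];
    user.ymin = gmin[1]; user.ymax = gmax[1];
    /* pass &user to DMDASNESSetFunctionLocal and compute the local
       coordinates xi, yj from user.* instead of re-querying the DM
       inside the residual */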

On Thu, May 12, 2022 at 2:02 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Thu, May 12, 2022 at 1:03 PM Takahashi, Tadanaga <tt73 at njit.edu> wrote:
>
>> Thank you for the feedback. We figured out what was causing the issue. We
>> were using DMGetBoundingBox
>> <https://petsc.org/main/docs/manualpages/DM/DMGetBoundingBox/> in order
>> to get the limits of the global domain, but gmin and gmax contained limits
>> for the local subdomains when we ran the code with NASM. Hence, our local
>> coordinates xi and yj were completely wrong. The documentation states
>> that DMGetBoundingBox gets the global limits. I believe this is a mistake.
>>
>
> I think I can explain this, and maybe you can tell us how to improve the
> documentation.
>
> I believe we make a new DM that comprises only the subdomain. Then the
> bounding box for this subdomain will only contain itself, not the original
> domain.
> Where should we say this?
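
Concretely (a hypothetical sketch; snes is the outer NASM solver and da
the original global DMDA), the two boxes can be compared after SNESSetUp:

    SNES      sub;
    DM        subdm;
    PetscReal omin[3], omax[3], smin[3], smax[3];
    SNESNASMGetSNES(snes, 0, &sub);       /* first subsolver on this rank */
    SNESGetDM(sub, &subdm);               /* the DM for just that subdomain */
    DMGetBoundingBox(da, omin, omax);     /* extent of the original domain */
    DMGetBoundingBox(subdm, smin, smax);  /* extent of the subdomain only */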
>
>   Thanks,
>
>      Matt
>
>
>> This is our new output:
>> $ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view
>> -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
>>   0 SNES Function norm 7.244681057908e+02
>>   1 SNES Function norm 4.394913250889e+01
>>   2 SNES Function norm 1.823326663029e+01
>>   3 SNES Function norm 7.033938512358e+00
>>   4 SNES Function norm 2.797351504285e+00
>>   5 SNES Function norm 1.130613777736e+00
>>   6 SNES Function norm 4.605418417192e-01
>>   7 SNES Function norm 1.882307001920e-01
>>   8 SNES Function norm 7.704148683921e-02
>>   9 SNES Function norm 3.155090858782e-02
>>  10 SNES Function norm 1.292418188473e-02
>>  11 SNES Function norm 5.294645671797e-03
>>  12 SNES Function norm 2.169143207557e-03
>>  13 SNES Function norm 8.886826738192e-04
>>  14 SNES Function norm 3.640894847145e-04
>>  15 SNES Function norm 1.491663153414e-04
>>  16 SNES Function norm 6.111303899450e-05
>>  17 SNES Function norm 2.503785968501e-05
>>  18 SNES Function norm 1.025795062417e-05
>>  19 SNES Function norm 4.202657921479e-06
>> SNES Object: 4 MPI processes
>>   type: nasm
>>     total subdomain blocks = 4
>>     Local solver information for first block on rank 0:
>>     Use -snes_view ::ascii_info_detail to display information for all
>> blocks
>>     SNES Object: (sub_) 1 MPI processes
>>       type: newtonls
>>       maximum iterations=50, maximum function evaluations=10000
>>       tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>       total number of linear solver iterations=2
>>       total number of function evaluations=3
>>       norm schedule ALWAYS
>>       Jacobian is built using a DMDA local Jacobian
>>       SNESLineSearch Object: (sub_) 1 MPI processes
>>         type: bt
>>           interpolation: cubic
>>           alpha=1.000000e-04
>>         maxstep=1.000000e+08, minlambda=1.000000e-12
>>         tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>> lambda=1.000000e-08
>>         maximum iterations=40
>>       KSP Object: (sub_) 1 MPI processes
>>         type: preonly
>>         maximum iterations=10000, initial guess is zero
>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>         left preconditioning
>>         using NONE norm type for convergence test
>>       PC Object: (sub_) 1 MPI processes
>>         type: lu
>>           out-of-place factorization
>>           tolerance for zero pivot 2.22045e-14
>>           matrix ordering: nd
>>           factor fill ratio given 5., needed 2.13732
>>             Factored matrix follows:
>>               Mat Object: 1 MPI processes
>>                 type: seqaij
>>                 rows=169, cols=169
>>                 package used to perform factorization: petsc
>>                 total: nonzeros=13339, allocated nonzeros=13339
>>                   using I-node routines: found 104 nodes, limit used is 5
>>         linear system matrix = precond matrix:
>>         Mat Object: 1 MPI processes
>>           type: seqaij
>>           rows=169, cols=169
>>           total: nonzeros=6241, allocated nonzeros=6241
>>           total number of mallocs used during MatSetValues calls=0
>>             not using I-node routines
>>   maximum iterations=50, maximum function evaluations=10000
>>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>   total number of function evaluations=20
>>   norm schedule ALWAYS
>>   Jacobian is built using a DMDA local Jacobian
>> problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
>>   error |u-uexact|_inf = 2.879e-02, |u-uexact|_h = 1.707e-02
>>
>> On Thu, May 12, 2022 at 9:37 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> Your subdomain solves do not appear to be producing any descent
>>> whatsoever. Possible reasons:
>>>
>>>   1) Your subdomain Jacobians are wrong (this is usually the problem;
>>> one way to check is sketched after this list)
>>>
>>>   2) You have some global coupling field for which local solves give no
>>> descent. (For this you want nonlinear elimination I think)
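
For (1), one quick check, assuming the prefixed form of the option reaches
the subdomain solvers (their prefix is sub_ per the -snes_view output), is
to compare the hand-coded Jacobian against a finite-difference one:

    mpiexec -n 4 ./test1 -t1_N 20 -snes_type nasm -snes_nasm_type restrict \
        -da_overlap 3 -sub_snes_test_jacobian -sub_snes_test_jacobian_view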
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>> On Thu, May 12, 2022 at 9:02 AM Takahashi, Tadanaga <tt73 at njit.edu>
>>> wrote:
>>>
>>>> I ran the code with the additional options, but the raw output is
>>>> about 75,000 lines, so I cannot paste it directly into the email. The
>>>> output is in the attached file.
>>>>
>>>> On Wed, May 11, 2022 at 11:44 PM Jed Brown <jed at jedbrown.org> wrote:
>>>>
>>>>> Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor
>>>>> -ksp_converged_reason and send the output?
>>>>>
>>>>> "Takahashi, Tadanaga" <tt73 at njit.edu> writes:
>>>>>
>>>>> > Hello,
>>>>> >
>>>>> > We are working on a finite difference solver for a 2D nonlinear
>>>>> > PDE with Dirichlet boundary conditions on a rectangular domain.
>>>>> > Our goal is to solve the problem with parallel nonlinear additive
>>>>> > Schwarz (NASM) as the outer solver. Our code is similar to SNES
>>>>> > example 5
>>>>> > <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In
>>>>> > example 5, the parallel NASM can be executed with a command like
>>>>> > `mpiexec -n 4 ./ex5 -mms 3 -snes_type nasm -snes_nasm_type restrict
>>>>> > -da_overlap 2`, which gives a convergent result. We assume this is
>>>>> > the correct usage. A comment in the source code for NASM mentions
>>>>> > that NASM should be used as a preconditioner, but there is no
>>>>> > documentation on that usage. The Brune paper does not cover
>>>>> > parallel NASM either. We observed that increasing the overlap
>>>>> > leads to fewer Schwarz iterations, and the parallelization works
>>>>> > seamlessly for an arbitrary number of subdomains. This is the type
>>>>> > of behavior we were expecting from our code.
>>>>> >
>>>>> > Our method uses a box-style stencil of width d = ceil(N^(1/3)) on
>>>>> > an N by N DMDA. The finite difference stencil consists of 4d+1
>>>>> > points spread out in a diamond formation. If a stencil point is
>>>>> > out of bounds, it is projected onto the boundary curve. Since
>>>>> > nodes on the boundary curve would result in an irregular mesh, we
>>>>> > chose not to treat boundary nodes as unknowns as is done in
>>>>> > Example 5. We use DMDACreate2d to create the DA for the interior
>>>>> > points and DMDASNESSetFunctionLocal to associate the residual
>>>>> > function with the SNES object.
>>>>> >
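
A minimal sketch of that setup (hypothetical names, not the code from this
thread; FormFunctionLocal and AppCtx are placeholders, and error checking
is omitted):

    extern PetscErrorCode FormFunctionLocal(DMDALocalInfo *, PetscScalar **,
                                            PetscScalar **, void *);
    DM       da;
    SNES     snes;
    AppCtx   user;
    PetscInt N = 20;
    PetscInt d = (PetscInt)PetscCeilReal(PetscPowReal((PetscReal)N, 1.0/3.0));
    DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                 DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
                 1 /* dof */, d /* stencil width */, NULL, NULL, &da);
    DMSetFromOptions(da);
    DMSetUp(da);
    SNESCreate(PETSC_COMM_WORLD, &snes);
    SNESSetDM(snes, da);
    DMDASNESSetFunctionLocal(da, INSERT_VALUES,
                             (DMDASNESFunction)FormFunctionLocal, &user);
    SNESSetFromOptions(snes); /* picks up -snes_type nasm, -snes_nasm_type, ... */
    SNESSolve(snes, NULL, NULL);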
>>>>> > Our code works serially. We have also tested it with
>>>>> > Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec
>>>>> > -n <n> ./solve -snes_type newtonls`. We have tested NKS for
>>>>> > several numbers of subdomains and overlaps, and the code works as
>>>>> > expected, so we have some confidence in its correctness. The
>>>>> > overlapping NASM was implemented in MATLAB, so we know the method
>>>>> > converges. However, the parallel NASM will not converge with our
>>>>> > PETSc code. We do not understand why NKS works while NASM does
>>>>> > not. The residual F-norm decreases monotonically and then
>>>>> > stagnates.
>>>>> >
>>>>> > Here is an example of the output when attempting to run NASM in
>>>>> > parallel:
>>>>> > takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4
>>>>> > ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view
>>>>> > -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
>>>>> >   0 SNES Function norm 7.244681057908e+02
>>>>> >   1 SNES Function norm 1.237688062971e+02
>>>>> >   2 SNES Function norm 1.068926073552e+02
>>>>> >   3 SNES Function norm 1.027563237834e+02
>>>>> >   4 SNES Function norm 1.022184806736e+02
>>>>> >   5 SNES Function norm 1.020818227640e+02
>>>>> >   6 SNES Function norm 1.020325629121e+02
>>>>> >   7 SNES Function norm 1.020149036595e+02
>>>>> >   8 SNES Function norm 1.020088110545e+02
>>>>> >   9 SNES Function norm 1.020067198030e+02
>>>>> >  10 SNES Function norm 1.020060034469e+02
>>>>> >  11 SNES Function norm 1.020057582380e+02
>>>>> >  12 SNES Function norm 1.020056743241e+02
>>>>> >  13 SNES Function norm 1.020056456101e+02
>>>>> >  14 SNES Function norm 1.020056357849e+02
>>>>> >  15 SNES Function norm 1.020056324231e+02
>>>>> >  16 SNES Function norm 1.020056312727e+02
>>>>> >  17 SNES Function norm 1.020056308791e+02
>>>>> >  18 SNES Function norm 1.020056307444e+02
>>>>> >  19 SNES Function norm 1.020056306983e+02
>>>>> >  20 SNES Function norm 1.020056306826e+02
>>>>> >  21 SNES Function norm 1.020056306772e+02
>>>>> >  22 SNES Function norm 1.020056306753e+02
>>>>> >  23 SNES Function norm 1.020056306747e+02
>>>>> >  24 SNES Function norm 1.020056306745e+02
>>>>> >  25 SNES Function norm 1.020056306744e+02
>>>>> >  26 SNES Function norm 1.020056306744e+02
>>>>> >  27 SNES Function norm 1.020056306744e+02
>>>>> >  28 SNES Function norm 1.020056306744e+02
>>>>> >  29 SNES Function norm 1.020056306744e+02
>>>>> >  30 SNES Function norm 1.020056306744e+02
>>>>> >  31 SNES Function norm 1.020056306744e+02
>>>>> >  32 SNES Function norm 1.020056306744e+02
>>>>> >  33 SNES Function norm 1.020056306744e+02
>>>>> >  34 SNES Function norm 1.020056306744e+02
>>>>> >  35 SNES Function norm 1.020056306744e+02
>>>>> >  36 SNES Function norm 1.020056306744e+02
>>>>> >  37 SNES Function norm 1.020056306744e+02
>>>>> >  38 SNES Function norm 1.020056306744e+02
>>>>> >  39 SNES Function norm 1.020056306744e+02
>>>>> >  40 SNES Function norm 1.020056306744e+02
>>>>> >  41 SNES Function norm 1.020056306744e+02
>>>>> >  42 SNES Function norm 1.020056306744e+02
>>>>> >  43 SNES Function norm 1.020056306744e+02
>>>>> >  44 SNES Function norm 1.020056306744e+02
>>>>> >  45 SNES Function norm 1.020056306744e+02
>>>>> >  46 SNES Function norm 1.020056306744e+02
>>>>> >  47 SNES Function norm 1.020056306744e+02
>>>>> >  48 SNES Function norm 1.020056306744e+02
>>>>> >  49 SNES Function norm 1.020056306744e+02
>>>>> >  50 SNES Function norm 1.020056306744e+02
>>>>> > SNES Object: 4 MPI processes
>>>>> >   type: nasm
>>>>> >     total subdomain blocks = 4
>>>>> >     Local solver information for first block on rank 0:
>>>>> >     Use -snes_view ::ascii_info_detail to display information for
>>>>> all blocks
>>>>> >     SNES Object: (sub_) 1 MPI processes
>>>>> >       type: newtonls
>>>>> >       maximum iterations=50, maximum function evaluations=10000
>>>>> >       tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>>>> >       total number of linear solver iterations=22
>>>>> >       total number of function evaluations=40
>>>>> >       norm schedule ALWAYS
>>>>> >       Jacobian is built using a DMDA local Jacobian
>>>>> >       SNESLineSearch Object: (sub_) 1 MPI processes
>>>>> >         type: bt
>>>>> >           interpolation: cubic
>>>>> >           alpha=1.000000e-04
>>>>> >         maxstep=1.000000e+08, minlambda=1.000000e-12
>>>>> >         tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>>>> > lambda=1.000000e-08
>>>>> >         maximum iterations=40
>>>>> >       KSP Object: (sub_) 1 MPI processes
>>>>> >         type: preonly
>>>>> >         maximum iterations=10000, initial guess is zero
>>>>> >         tolerances:  relative=1e-05, absolute=1e-50,
>>>>> divergence=10000.
>>>>> >         left preconditioning
>>>>> >         using NONE norm type for convergence test
>>>>> >       PC Object: (sub_) 1 MPI processes
>>>>> >         type: lu
>>>>> >           out-of-place factorization
>>>>> >           tolerance for zero pivot 2.22045e-14
>>>>> >           matrix ordering: nd
>>>>> >           factor fill ratio given 5., needed 2.13732
>>>>> >             Factored matrix follows:
>>>>> >               Mat Object: 1 MPI processes
>>>>> >                 type: seqaij
>>>>> >                 rows=169, cols=169
>>>>> >                 package used to perform factorization: petsc
>>>>> >                 total: nonzeros=13339, allocated nonzeros=13339
>>>>> >                   using I-node routines: found 104 nodes, limit used
>>>>> is 5
>>>>> >         linear system matrix = precond matrix:
>>>>> >         Mat Object: 1 MPI processes
>>>>> >           type: seqaij
>>>>> >           rows=169, cols=169
>>>>> >           total: nonzeros=6241, allocated nonzeros=6241
>>>>> >           total number of mallocs used during MatSetValues calls=0
>>>>> >             not using I-node routines
>>>>> >   maximum iterations=50, maximum function evaluations=10000
>>>>> >   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>>>> >   total number of function evaluations=51
>>>>> >   norm schedule ALWAYS
>>>>> >   Jacobian is built using a DMDA local Jacobian
>>>>> > problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
>>>>> >   error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>>>>> >
>>>>> > We have been stuck on this for a while now. We do not know how to
>>>>> > debug this issue. Please let us know if you have any insights.
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>