[petsc-users] Convergence issues for SNES NASM

Matthew Knepley knepley at gmail.com
Thu May 12 08:36:56 CDT 2022


Your subdomain solves do not appear to be producing any descent.
Possible reasons:

  1) Your subdomain Jacobians are wrong (this is usually the problem; a
quick check is sketched below)

  2) You have some globally coupled field for which local solves give no
descent (for this, I think you want nonlinear elimination)
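
  A quick way to check 1) is to compare the hand-coded Jacobian against a
finite difference approximation inside each subdomain solve, for example (a
sketch; I am assuming the test options pick up the sub_ prefix shown in your
-snes_view output):

    mpiexec -n 4 ./test1 -t1_N 20 -snes_type nasm -snes_nasm_type restrict \
      -da_overlap 3 -sub_snes_test_jacobian -sub_snes_test_jacobian_view

  A large reported difference between the hand-coded and finite difference
Jacobians would point to 1).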

  Thanks,

     Matt

On Thu, May 12, 2022 at 9:02 AM Takahashi, Tadanaga <tt73 at njit.edu> wrote:

> I ran the code with the additional options, but the raw output is about
> 75,000 lines, so I cannot paste it directly into the email. The output is
> in the attached file.
>
> On Wed, May 11, 2022 at 11:44 PM Jed Brown <jed at jedbrown.org> wrote:
>
>> Can you add -snes_linesearch_monitor -sub_snes_linesearch_monitor
>> -ksp_converged_reason and send the output?
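>>
>> For example, tacked onto the run from your message below (a sketch):
>>
>>   mpiexec -n 4 ./test1 -t1_N 20 -snes_type nasm -snes_nasm_type restrict \
>>     -da_overlap 3 -snes_monitor -snes_linesearch_monitor \
>>     -sub_snes_linesearch_monitor -ksp_converged_reason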
>>
>> "Takahashi, Tadanaga" <tt73 at njit.edu> writes:
>>
>> > Hello,
>> >
>> > We are working on a finite difference solver for a 2D nonlinear PDE
>> > with Dirichlet boundary conditions on a rectangular domain. Our goal is
>> > to solve the problem with parallel nonlinear additive Schwarz (NASM) as
>> > the outer solver. Our code is similar to SNES example 5
>> > <https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example
>> > 5, the parallel NASM can be executed with a command like `mpiexec -n 4
>> > ./ex5 -mms 3 -snes_type nasm -snes_nasm_type restrict -da_overlap 2`,
>> > which gives a convergent result. We assume this is the correct usage. A
>> > comment in the source code for NASM mentions that NASM should be used
>> > as a preconditioner, but there is no documentation on that usage (see
>> > the note after this paragraph). The Brune paper does not cover parallel
>> > NASM either. We observed that increasing the overlap leads to fewer
>> > Schwarz iterations, and the parallelization works seamlessly for an
>> > arbitrary number of subdomains. This is the type of behavior we were
>> > expecting from our code.
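>> >
>> > (Note: as far as we can tell, the preconditioner composition would go
>> > through the nonlinear preconditioning interface, e.g. something like
>> > `mpiexec -n 4 ./ex5 -snes_type newtonls -npc_snes_type nasm
>> > -npc_snes_nasm_type restrict -da_overlap 2`, or the ASPIN variant
>> > `-snes_type aspin`, but we have not found this documented and cannot
>> > confirm it is the intended usage.)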
>> >
>> > Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N
>> > by N DMDA. The finite difference stencil consists of 4d+1 points spread
>> > out in a diamond formation. If a stencil point is out of bounds, then
>> > it is projected onto the boundary curve. Since the nodes on the
>> > boundary curve would result in an irregular mesh, we chose not to treat
>> > boundary nodes as unknowns (unlike in Example 5). We use DMDACreate2d
>> > to create the DA for the interior points and DMDASNESSetFunctionLocal
>> > to associate the residual function with the SNES object.
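>> >
>> > For concreteness, the setup looks roughly like the following sketch
>> > (FormFunctionLocal is our local residual routine; error checking is
>> > omitted, and the real code passes a user context instead of NULL):
>> >
>> >   DM       da;
>> >   SNES     snes;
>> >   PetscInt N = 20, d = 3;   /* d = ceil(N^(1/3)) */
>> >   /* box stencil of width d so the diamond stencil fits in the halo */
>> >   DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>> >                DMDA_STENCIL_BOX, N, N, PETSC_DECIDE, PETSC_DECIDE,
>> >                1, d, NULL, NULL, &da);
>> >   DMSetFromOptions(da);   /* picks up -da_overlap */
>> >   DMSetUp(da);
>> >   SNESCreate(PETSC_COMM_WORLD, &snes);
>> >   SNESSetDM(snes, da);
>> >   DMDASNESSetFunctionLocal(da, INSERT_VALUES,
>> >                            (DMDASNESFunction)FormFunctionLocal, NULL);
>> >   SNESSetFromOptions(snes);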
>> >
>> > Our code works serially. We have also tested it with
>> > Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n
>> > <n> ./solve -snes_type newtonls`. We have tested NKS for several
>> > numbers of subdomains and overlaps, and the code works as expected, so
>> > we have some confidence in its correctness. The overlapping NASM was
>> > also implemented in MATLAB, so we know the method converges. However,
>> > the parallel NASM does not converge with our PETSc code. We don't
>> > understand why NKS works while NASM does not. The SNES function norm
>> > decreases monotonically and then stagnates.
>> >
>> > Here is an example of the output when attempting to run NASM in
>> > parallel:
>> >
>> > takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 \
>> >     -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 \
>> >     -snes_type nasm -snes_nasm_type restrict
>> >   0 SNES Function norm 7.244681057908e+02
>> >   1 SNES Function norm 1.237688062971e+02
>> >   2 SNES Function norm 1.068926073552e+02
>> >   3 SNES Function norm 1.027563237834e+02
>> >   4 SNES Function norm 1.022184806736e+02
>> >   5 SNES Function norm 1.020818227640e+02
>> >   6 SNES Function norm 1.020325629121e+02
>> >   7 SNES Function norm 1.020149036595e+02
>> >   8 SNES Function norm 1.020088110545e+02
>> >   9 SNES Function norm 1.020067198030e+02
>> >  10 SNES Function norm 1.020060034469e+02
>> >  11 SNES Function norm 1.020057582380e+02
>> >  12 SNES Function norm 1.020056743241e+02
>> >  13 SNES Function norm 1.020056456101e+02
>> >  14 SNES Function norm 1.020056357849e+02
>> >  15 SNES Function norm 1.020056324231e+02
>> >  16 SNES Function norm 1.020056312727e+02
>> >  17 SNES Function norm 1.020056308791e+02
>> >  18 SNES Function norm 1.020056307444e+02
>> >  19 SNES Function norm 1.020056306983e+02
>> >  20 SNES Function norm 1.020056306826e+02
>> >  21 SNES Function norm 1.020056306772e+02
>> >  22 SNES Function norm 1.020056306753e+02
>> >  23 SNES Function norm 1.020056306747e+02
>> >  24 SNES Function norm 1.020056306745e+02
>> >  25 SNES Function norm 1.020056306744e+02
>> >  26 SNES Function norm 1.020056306744e+02
>> >  27 SNES Function norm 1.020056306744e+02
>> >  28 SNES Function norm 1.020056306744e+02
>> >  29 SNES Function norm 1.020056306744e+02
>> >  30 SNES Function norm 1.020056306744e+02
>> >  31 SNES Function norm 1.020056306744e+02
>> >  32 SNES Function norm 1.020056306744e+02
>> >  33 SNES Function norm 1.020056306744e+02
>> >  34 SNES Function norm 1.020056306744e+02
>> >  35 SNES Function norm 1.020056306744e+02
>> >  36 SNES Function norm 1.020056306744e+02
>> >  37 SNES Function norm 1.020056306744e+02
>> >  38 SNES Function norm 1.020056306744e+02
>> >  39 SNES Function norm 1.020056306744e+02
>> >  40 SNES Function norm 1.020056306744e+02
>> >  41 SNES Function norm 1.020056306744e+02
>> >  42 SNES Function norm 1.020056306744e+02
>> >  43 SNES Function norm 1.020056306744e+02
>> >  44 SNES Function norm 1.020056306744e+02
>> >  45 SNES Function norm 1.020056306744e+02
>> >  46 SNES Function norm 1.020056306744e+02
>> >  47 SNES Function norm 1.020056306744e+02
>> >  48 SNES Function norm 1.020056306744e+02
>> >  49 SNES Function norm 1.020056306744e+02
>> >  50 SNES Function norm 1.020056306744e+02
>> > SNES Object: 4 MPI processes
>> >   type: nasm
>> >     total subdomain blocks = 4
>> >     Local solver information for first block on rank 0:
>> >     Use -snes_view ::ascii_info_detail to display information for all blocks
>> >     SNES Object: (sub_) 1 MPI processes
>> >       type: newtonls
>> >       maximum iterations=50, maximum function evaluations=10000
>> >       tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>> >       total number of linear solver iterations=22
>> >       total number of function evaluations=40
>> >       norm schedule ALWAYS
>> >       Jacobian is built using a DMDA local Jacobian
>> >       SNESLineSearch Object: (sub_) 1 MPI processes
>> >         type: bt
>> >           interpolation: cubic
>> >           alpha=1.000000e-04
>> >         maxstep=1.000000e+08, minlambda=1.000000e-12
>> >         tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>> >         maximum iterations=40
>> >       KSP Object: (sub_) 1 MPI processes
>> >         type: preonly
>> >         maximum iterations=10000, initial guess is zero
>> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using NONE norm type for convergence test
>> >       PC Object: (sub_) 1 MPI processes
>> >         type: lu
>> >           out-of-place factorization
>> >           tolerance for zero pivot 2.22045e-14
>> >           matrix ordering: nd
>> >           factor fill ratio given 5., needed 2.13732
>> >             Factored matrix follows:
>> >               Mat Object: 1 MPI processes
>> >                 type: seqaij
>> >                 rows=169, cols=169
>> >                 package used to perform factorization: petsc
>> >                 total: nonzeros=13339, allocated nonzeros=13339
>> >                   using I-node routines: found 104 nodes, limit used is 5
>> >         linear system matrix = precond matrix:
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=169, cols=169
>> >           total: nonzeros=6241, allocated nonzeros=6241
>> >           total number of mallocs used during MatSetValues calls=0
>> >             not using I-node routines
>> >   maximum iterations=50, maximum function evaluations=10000
>> >   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>> >   total number of function evaluations=51
>> >   norm schedule ALWAYS
>> >   Jacobian is built using a DMDA local Jacobian
>> > problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
>> >   error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01
>> >
>> > We have been stuck on this for a while now. We do not know how to debug
>> > this issue. Please let us know if you have any insights.
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/