[petsc-users] Convergence issues for SNES NASM

Takahashi, Tadanaga tt73 at njit.edu
Tue May 10 14:12:38 CDT 2022


Hello,

We are working on a finite difference solver for a 2D nonlinear PDE with
Dirichlet boundary conditions on a rectangular domain. Our goal is to solve
the problem with parallel nonlinear additive Schwarz (NASM) as the outer
solver. Our code is similar to SNES example 5
<https://petsc.org/release/src/snes/tutorials/ex5.c.html>. In example 5,
parallel NASM can be run with a command like `mpiexec -n 4 ./ex5 -mms 3
-snes_type nasm -snes_nasm_type restrict -da_overlap 2`, which gives a
convergent result, so we assume this is the correct usage. A comment in
the NASM source code mentions that NASM should be used as a
preconditioner, but there is no documentation on that usage, and the
Brune paper does not cover parallel NASM either. We observed that
increasing the overlap leads to fewer Schwarz iterations, and the
parallelization works seamlessly for an arbitrary number of subdomains.
This is the behavior we expected from our own code.
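
For reference, the same outer-solver choice can also be made
programmatically instead of via the command line. The following is only
a minimal sketch under our assumptions (PETSc 3.17-era error handling;
a `snes` object created elsewhere and attached to a DMDA), not our
actual code:

#include <petscsnes.h>

/* Sketch: select NASM with restricted subdomain updates in code,
 * mirroring -snes_type nasm -snes_nasm_type restrict. The DMDA overlap
 * is still controlled by -da_overlap (or DMDASetOverlap). */
PetscErrorCode ConfigureNASM(SNES snes)
{
  PetscFunctionBeginUser;
  PetscCall(SNESSetType(snes, SNESNASM));            /* outer nonlinear additive Schwarz */
  PetscCall(SNESNASMSetType(snes, PC_ASM_RESTRICT)); /* restricted variant, as in the run above */
  PetscCall(SNESSetFromOptions(snes));               /* keep command-line overrides working */
  PetscFunctionReturn(0);
}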

Our method uses a box-style stencil of width d = ceil(N^(1/3)) on an N by
N DMDA; for N = 20 this gives d = 3, matching the run below. The finite
difference stencil consists of 4d+1 points spread out in a diamond
formation. If a stencil point falls out of bounds, it is projected onto
the boundary curve. Since nodes on the boundary curve would produce an
irregular mesh, we chose not to treat boundary nodes as unknowns the way
Example 5 does. We use DMDACreate2d to create the DA for the interior
points and DMDASNESSetFunctionLocal to attach the residual function to
the SNES object (sketched below).
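
For concreteness, here is a minimal sketch of that setup. The names
FormFunctionLocal, AppCtx, SetupSolver, and the boundary types are
hypothetical stand-ins and guesses, not our actual code, but the call
sequence is the one we use:

#include <petscsnes.h>
#include <petscdmda.h>

typedef struct { PetscReal eps; } AppCtx;  /* placeholder user context */

/* Local residual evaluation; signature matches what DMDA expects */
extern PetscErrorCode FormFunctionLocal(DMDALocalInfo*, PetscScalar**, PetscScalar**, AppCtx*);

PetscErrorCode SetupSolver(MPI_Comm comm, PetscInt N, PetscInt d, AppCtx *user, SNES *snes)
{
  DM da;

  PetscFunctionBeginUser;
  /* DMDA over the interior points only; boundary nodes are not unknowns.
     The box stencil of width d covers the diamond-shaped 4d+1-point stencil. */
  PetscCall(DMDACreate2d(comm, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX,
                         N, N, PETSC_DECIDE, PETSC_DECIDE, 1, d, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da));  /* picks up -da_overlap for the NASM subdomains */
  PetscCall(DMSetUp(da));

  PetscCall(SNESCreate(comm, snes));
  PetscCall(SNESSetDM(*snes, da));
  /* Attach the local residual evaluation to the SNES through the DMDA */
  PetscCall(DMDASNESSetFunctionLocal(da, INSERT_VALUES,
                                     (DMDASNESFunction)FormFunctionLocal, user));
  PetscCall(SNESSetFromOptions(*snes));
  PetscFunctionReturn(0);
}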

Our code works in serial. We have also tested it with
Newton-Krylov-Schwarz (NKS) by running something akin to `mpiexec -n
<n> ./solve -snes_type newtonls`. We have tested NKS for several
subdomain counts and overlap widths, and the code works as expected, so
we have some confidence in its correctness. We also implemented the
overlapping NASM in MATLAB, so we know the method itself converges.
However, parallel NASM will not converge with our PETSc code, and we do
not understand why NKS works while NASM does not: the function norm
decreases monotonically and then stagnates.

Here is an example of the output when attempting to run NASM in parallel:
takahashi@ubuntu:~/Desktop/MA-DDM/Cpp/Rectangle$ mpiexec -n 4 ./test1 -t1_N 20 -snes_max_it 50 -snes_monitor -snes_view -da_overlap 3 -snes_type nasm -snes_nasm_type restrict
  0 SNES Function norm 7.244681057908e+02
  1 SNES Function norm 1.237688062971e+02
  2 SNES Function norm 1.068926073552e+02
  3 SNES Function norm 1.027563237834e+02
  4 SNES Function norm 1.022184806736e+02
  5 SNES Function norm 1.020818227640e+02
  6 SNES Function norm 1.020325629121e+02
  7 SNES Function norm 1.020149036595e+02
  8 SNES Function norm 1.020088110545e+02
  9 SNES Function norm 1.020067198030e+02
 10 SNES Function norm 1.020060034469e+02
 11 SNES Function norm 1.020057582380e+02
 12 SNES Function norm 1.020056743241e+02
 13 SNES Function norm 1.020056456101e+02
 14 SNES Function norm 1.020056357849e+02
 15 SNES Function norm 1.020056324231e+02
 16 SNES Function norm 1.020056312727e+02
 17 SNES Function norm 1.020056308791e+02
 18 SNES Function norm 1.020056307444e+02
 19 SNES Function norm 1.020056306983e+02
 20 SNES Function norm 1.020056306826e+02
 21 SNES Function norm 1.020056306772e+02
 22 SNES Function norm 1.020056306753e+02
 23 SNES Function norm 1.020056306747e+02
 24 SNES Function norm 1.020056306745e+02
 25 SNES Function norm 1.020056306744e+02
  ... (iterations 26 through 50 all report the same norm, 1.020056306744e+02)
SNES Object: 4 MPI processes
  type: nasm
    total subdomain blocks = 4
    Local solver information for first block on rank 0:
    Use -snes_view ::ascii_info_detail to display information for all blocks
    SNES Object: (sub_) 1 MPI processes
      type: newtonls
      maximum iterations=50, maximum function evaluations=10000
      tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
      total number of linear solver iterations=22
      total number of function evaluations=40
      norm schedule ALWAYS
      Jacobian is built using a DMDA local Jacobian
      SNESLineSearch Object: (sub_) 1 MPI processes
        type: bt
          interpolation: cubic
          alpha=1.000000e-04
        maxstep=1.000000e+08, minlambda=1.000000e-12
        tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
        maximum iterations=40
      KSP Object: (sub_) 1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: nd
          factor fill ratio given 5., needed 2.13732
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=169, cols=169
                package used to perform factorization: petsc
                total: nonzeros=13339, allocated nonzeros=13339
                  using I-node routines: found 104 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          type: seqaij
          rows=169, cols=169
          total: nonzeros=6241, allocated nonzeros=6241
          total number of mallocs used during MatSetValues calls=0
            not using I-node routines
  maximum iterations=50, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of function evaluations=51
  norm schedule ALWAYS
  Jacobian is built using a DMDA local Jacobian
problem ex10 on 20 x 20 point 2D grid with d = 3, and eps = 0.082:
  error |u-uexact|_inf = 3.996e-01, |u-uexact|_h = 2.837e-01

We have been stuck on this for a while now. We do not know how to debug
this issue. Please let us know if you have any insights.