[petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions
Große-Bley, Paul
rz230 at uni-heidelberg.de
Tue Feb 28 13:05:24 CST 2023
Sorry, I should have made myself clearer. I changed the three 7s passed
to DMDACreate3d to 33 to make the example a bit more realistic,
as I also use "U-cycles", i.e. my coarsest level is still big enough to
make use of some GPU parallelism. I should have just put that into the
given command line argument string with -da_grid_x 33 -da_grid_y 33
-da_grid_z 33.
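
Roughly, the change looks like this (a sketch from memory, so the
surrounding ex45.c code may differ slightly; since ex45 calls
DMSetFromOptions on the DMDA, the -da_grid_* options above achieve the
same thing without editing the source):

  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR,
                         33, 33, 33,                               /* was 7, 7, 7 */
                         PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, /* let PETSc split the grid over ranks */
                         1, 1, NULL, NULL, NULL, &da));
  PetscCall(DMSetFromOptions(da)); /* this is what lets -da_grid_x/y/z override the hard-coded sizes */
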
On 2/28/23 18:43, Barry Smith wrote:
>
> I am sorry, I cannot reproduce what you describe. I am using
> src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as
> release for this purpose).
>
> No change to the code I get
>
> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09
> -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
> -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
>
> 0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
> 2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
> 3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
> 4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
> 5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
> 6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
> 7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
>
> Linear solve converged due to CONVERGED_RTOL iterations 7
>
> KSP Object: 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=10000, nonzero initial guess
> tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI process
> type: mg
> type is MULTIPLICATIVE, levels=3 cycles=v
> Cycles per PCApply=1
> Not using Galerkin computed coarse grid matrices
> Coarse grid solver -- level 0 -------------------------------
> KSP Object: (mg_coarse_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (mg_coarse_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=8, cols=8
> total: nonzeros=32, allocated nonzeros=32
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (mg_levels_1_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_levels_1_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=64, cols=64
> total: nonzeros=352, allocated nonzeros=352
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (mg_levels_2_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_levels_2_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=343, cols=343
> total: nonzeros=2107, allocated nonzeros=2107
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=343, cols=343
> total: nonzeros=2107, allocated nonzeros=2107
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Residual norm 1.33388e-08
> ~/Src/petsc/src/ksp/ksp/tutorials*(main=)*arch-main
> $
>
>
>
> Now change code with
>
>   if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>     barray[k][j][i] = 0; // 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
>   } else {
>     barray[k][j][i] = 1; // Hx * Hy * Hz;
>   }
>
> I do not understand where I am supposed to change the dimension to 33,
> so I ignore that statement. The same command line with the change above gives
>
> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09
> -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
> -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
>
> 0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02
> 2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03
> 3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05
> 4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06
> 5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07
> 6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07
> 7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08
> 8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09
>
> Linear solve converged due to CONVERGED_RTOL iterations 8
>
> KSP Object: 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=10000, nonzero initial guess
> tolerances: relative=1e-09, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI process
> type: mg
> type is MULTIPLICATIVE, levels=3 cycles=v
> Cycles per PCApply=1
> Not using Galerkin computed coarse grid matrices
> Coarse grid solver -- level 0 -------------------------------
> KSP Object: (mg_coarse_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (mg_coarse_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=8, cols=8
> total: nonzeros=32, allocated nonzeros=32
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Down solver (pre-smoother) on level 1 -------------------------------
> KSP Object: (mg_levels_1_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_levels_1_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=64, cols=64
> total: nonzeros=352, allocated nonzeros=352
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> Down solver (pre-smoother) on level 2 -------------------------------
> KSP Object: (mg_levels_2_) 1 MPI process
> type: richardson
> damping factor=1.
> maximum iterations=6, nonzero initial guess
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> using NONE norm type for convergence test
> PC Object: (mg_levels_2_) 1 MPI process
> type: jacobi
> type DIAGONAL
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=343, cols=343
> total: nonzeros=2107, allocated nonzeros=2107
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Up solver (post-smoother) same as down solver (pre-smoother)
> linear system matrix = precond matrix:
> Mat Object: 1 MPI process
> type: seqaij
> rows=343, cols=343
> total: nonzeros=2107, allocated nonzeros=2107
> total number of mallocs used during MatSetValues calls=0
> not using I-node routines
> Residual norm 4.06326e-08
> ~/Src/petsc/src/ksp/ksp/tutorials*(main *=)*arch-main
> $
>
>
> In neither case is it taking 25 iterations. What am I doing wrong?
>
>
> Normally one expects only trivial changes in the convergence of
> multigrid methods when one changes values in the right hand side as
> with the run above.
>
>
> Barry
>
>
>
>
>> On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley
>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>
>> The scaling might be the problem, especially since I don't know what
>> you mean by scaling it according to FE.
>>
>> For reproducing the issue with a smaller problem:
>> Change the ComputeRHS function in ex45.c
>>
>>   if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>>     barray[k][j][i] = 0.0;
>>   } else {
>>     barray[k][j][i] = 1.0;
>>   }
>>
>> Change the dimensions to e.g. 33 instead of 7 (I scaled it down so it
>> runs quickly without a GPU) and then run with
>>
>> -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type
>> mg -pc_mg_levels 3 -mg_levels_ksp_type richardson
>> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits
>> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson
>> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits
>> -mg_coarse_pc_type jacobi
>>
>> You will find that it takes 145 iterations instead of 25 for the
>> original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41
>> iterations.
>>
>> To what do I have to change the diagonal entries of the matrix for
>> the boundary according to FE? Right now the diagonal is completely
>> constant.
>>
>> Paul
>>
>> On Tuesday, February 28, 2023 00:23 CET, Barry Smith
>> <bsmith at petsc.dev> wrote:
>>>
>>> I have not seen explicitly including, or excluding, the Dirichlet
>>> boundary values in the system having a significant effect on the
>>> convergence, so long as you SCALE the diagonal rows (of those
>>> Dirichlet points) by a value similar to the other entries along the
>>> diagonal. If they are scaled completely differently, that can screw
>>> up the convergence. For src/ksp/ksp/tutorials/ex45.c I see that the
>>> appropriate scaling is used (note that the scaling should come from a
>>> finite element view of the discretization, even if the discretization
>>> is finite differences, as is done in ex45.c).
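>>>
>>> To make the scaling concrete, the relevant branch of ComputeMatrix in
>>> ex45.c looks roughly like the following (a sketch from memory of the
>>> loop body, assuming MatStencil row, col[7] and PetscScalar v[7] are
>>> declared as in ex45.c, so details may differ slightly):
>>>
>>>   if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>>>     /* Dirichlet row: a single diagonal entry, scaled like the interior diagonal */
>>>     v[0] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
>>>     PetscCall(MatSetValuesStencil(B, 1, &row, 1, &row, v, INSERT_VALUES));
>>>   } else {
>>>     /* interior row: the standard 7-point finite difference stencil */
>>>     v[0] = -HxHydHz;  col[0].i = i;     col[0].j = j;     col[0].k = k - 1;
>>>     v[1] = -HxHzdHy;  col[1].i = i;     col[1].j = j - 1; col[1].k = k;
>>>     v[2] = -HyHzdHx;  col[2].i = i - 1; col[2].j = j;     col[2].k = k;
>>>     v[3] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx); col[3] = row;
>>>     v[4] = -HyHzdHx;  col[4].i = i + 1; col[4].j = j;     col[4].k = k;
>>>     v[5] = -HxHzdHy;  col[5].i = i;     col[5].j = j + 1; col[5].k = k;
>>>     v[6] = -HxHydHz;  col[6].i = i;     col[6].j = j;     col[6].k = k + 1;
>>>     PetscCall(MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES));
>>>   }
>>>
>>> The important point is only that the boundary diagonal, 2.0 * (HxHydHz +
>>> HxHzdHy + HyHzdHx), has the same magnitude as the interior diagonal,
>>> rather than being, say, 1.0.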
>>>
>>> Are you willing to share the two codes so we can take a look with
>>> experienced eyes to try to figure out the difference?
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>> > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley
>>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > the reason I wanted to change to ghost boundaries is that I was
>>> worrying about the effect of PCMG's coarsening on these boundary
>>> values.
>>> >
>>> > As mentioned before, I am trying to reproduce results from the
>>> hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order
>>> instead of 4th etc.).
>>> > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u =
>>> 0 on the boundary with rtol=1e-9. My MG solver implemented in hpgmg
>>> solves this in 40 V-cycles (I weakened it a lot by only doing smooths
>>> at the coarse level instead of CG). When I run the "same" MG solver
>>> built in PETSc on this problem, it starts out reducing the residual
>>> norm as fast or even faster for the first 20-30 iterations, but for
>>> the last order of magnitude in the residual norm it needs more than
>>> 300 V-cycles, i.e. it gets very slow. At this point I am pretty much
>>> out of ideas about the cause, especially since e.g. adding back CG at
>>> the coarsest level doesn't seem to change the number of iterations at
>>> all. Therefore I suspect the discretization is the problem. HPGMG
>>> uses an even number of points per dimension (e.g. 256), while PCMG
>>> wants an odd number (e.g. 257). So I also tried adding another layer
>>> of boundary values to the discretization to effectively use only 254
>>> points per dimension. This made the solver even slightly worse.
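>>> >
>>> > (For context on the odd-number requirement, if I understand DMDA's
>>> coarsening correctly: with vertex-centered refinement a level with M
>>> points per dimension coarsens to roughly (M + 1)/2 points, so 257 ->
>>> 129 -> 65 -> 33 -> ... and 33 -> 17 -> 9 -> 5 -> 3, i.e. grids of
>>> size 2^k + 1 coarsen cleanly, whereas 256 or 254 points per dimension
>>> have no such chain.)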
>>> >
>>> > So can the explicit boundary values screw with the coarsening,
>>> especially when they are not finite? Because with the problem as
>>> stated in ex45 with finite (i.e. non-zero) boundary values, the MG
>>> solver takes only 18 V-cycles.
>>> >
>>> > Best,
>>> > Paul
>>> >
>>> >
>>> >
>>> > On Monday, February 27, 2023 18:17 CET, Barry Smith
>>> <bsmith at petsc.dev> wrote:
>>> >
>>> >>
>>> >> Paul,
>>> >>
>>> >> DM_BOUNDARY_GHOSTED would result in the extra ghost locations in
>>> the local vectors (obtained with DMCreateLocalVector()), but they
>>> will not appear in the global vectors obtained with
>>> DMCreateGlobalVector(); perhaps this is the issue? Since they do not
>>> appear in the global vector, they will not appear in the linear
>>> system, so there are no diagonal entries for you to set, since those
>>> rows/columns do not exist in the linear system. In other words,
>>> using DM_BOUNDARY_GHOSTED is a way to avoid needing to put the
>>> Dirichlet values explicitly into the system being solved;
>>> DM_BOUNDARY_GHOSTED is generally more helpful for nonlinear systems
>>> than linear systems.
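>>> >>
>>> >> A minimal sketch of the distinction (illustrative only; da, lv and a
>>> >> are placeholder names):
>>> >>
>>> >>   Vec            lv;   /* local vector: this one has the ghost slots */
>>> >>   PetscScalar ***a;
>>> >>   PetscInt       gxs, gys, gzs, gxm, gym, gzm;
>>> >>
>>> >>   PetscCall(DMCreateLocalVector(da, &lv));
>>> >>   PetscCall(DMDAVecGetArray(da, lv, &a));
>>> >>   PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm));
>>> >>   /* on one MPI process with DM_BOUNDARY_GHOSTED and stencil width 1,     */
>>> >>   /* gxs = gys = gzs = -1 here (as you observed) and a[-1][j][i] is valid */
>>> >>   PetscCall(DMDAVecRestoreArray(da, lv, &a));
>>> >>   PetscCall(VecDestroy(&lv));
>>> >>
>>> >> A vector from DMCreateGlobalVector() (which is what the KSP hands
>>> >> you) has no ghost slots, so indexing its array at -1 is the likely
>>> >> source of your segmentation fault.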
>>> >>
>>> >> Barry
>>> >>
>>> >> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley
>>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > I would like to modify src/ksp/ksp/tutorials/ex45.c to
>>> implement Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED
>>> instead of using DM_BOUNDARY_NONE and explicitly implementing the
>>> boundary by adding diagonal-only rows.
>>> >> >
>>> >> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors
>>> from that DM have the extra memory for the ghost entries and that I
>>> can basically use DMDAGetGhostCorners instead of DMDAGetCorners to
>>> access the array obtained via DMDAVecGetArray. But when I access
>>> (gxs, gys, gzs) = (-1,-1,-1) I get a segmentation fault. Looking at
>>> the implementation of DMDAVecGetArray, it seemed to me that accessing
>>> (-1, -1, -1) should work, since DMDAVecGetArray passes the ghost
>>> corners to VecGetArray3d, which then adds the right offsets.
>>> >> >
>>> >> > I could not find any example using DM_BOUNDARY_GHOSTED and then
>>> actually accessing the ghost/boundary elements. Can I assume that
>>> they are set to zero for the solution vector, i.e. that u = 0 on
>>> \partial\Omega, and that I do not need to access them at all?
>>> >> >
>>> >> > Best,
>>> >> > Paul Große-Bley
>>> >>
>