[petsc-users] How to use DM_BOUNDARY_GHOSTED for Dirichlet boundary conditions

Große-Bley, Paul rz230 at uni-heidelberg.de
Tue Feb 28 13:05:24 CST 2023


Sorry, I should have made myself clearer. I changed the three 7s passed
to DMDACreate3d to 33 to make the example a bit more realistic, since I
also use "U-cycles", i.e. my coarsest level is still big enough to make
use of some GPU parallelism. I should simply have put that into the
given command line argument string as -da_grid_x 33 -da_grid_y 33
-da_grid_z 33.
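
For reference, the call I changed in ex45.c looks roughly like this (argument
list quoted from memory, so double-check against the source); the -da_grid_x/y/z
options override the three 7s at runtime:

    PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                           DMDA_STENCIL_STAR, 7, 7, 7, /* -da_grid_x/y/z 33 changes these to 33 */
                           PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1, 0, 0, 0, &da));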

On 2/28/23 18:43, Barry Smith wrote:
>
>    I am sorry, I cannot reproduce what you describe. I am using 
> src/ksp/ksp/tutorials/ex45.c in the main branch (should be same as 
> release for this purpose).
>
>    With no change to the code, I get
>
> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 
> -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson 
> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits 
> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson 
> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
> -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
>
>   0 KSP preconditioned resid norm 1.851257578045e+01 true resid norm 1.476491378857e+01 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 3.720545622095e-01 true resid norm 5.171053311198e-02 ||r(i)||/||b|| 3.502257707188e-03
>   2 KSP preconditioned resid norm 1.339047557616e-02 true resid norm 1.866765310863e-03 ||r(i)||/||b|| 1.264325235890e-04
>   3 KSP preconditioned resid norm 4.833887599029e-04 true resid norm 6.867629264754e-05 ||r(i)||/||b|| 4.651316873974e-06
>   4 KSP preconditioned resid norm 1.748167886388e-05 true resid norm 3.398334857479e-06 ||r(i)||/||b|| 2.301628648933e-07
>   5 KSP preconditioned resid norm 6.570567424652e-07 true resid norm 4.304483984231e-07 ||r(i)||/||b|| 2.915346507180e-08
>   6 KSP preconditioned resid norm 4.013427896557e-08 true resid norm 7.502068698790e-08 ||r(i)||/||b|| 5.081010838410e-09
>   7 KSP preconditioned resid norm 5.934811016347e-09 true resid norm 1.333884145638e-08 ||r(i)||/||b|| 9.034147877457e-10
> Linear solve converged due to CONVERGED_RTOL iterations 7
> KSP Object: 1 MPI process
>   type: richardson
>     damping factor=1.
>   maximum iterations=10000, nonzero initial guess
>   tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI process
>   type: mg
>     type is MULTIPLICATIVE, levels=3 cycles=v
>       Cycles per PCApply=1
>       Not using Galerkin computed coarse grid matrices
>   Coarse grid solver -- level 0 -------------------------------
>     KSP Object: (mg_coarse_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using PRECONDITIONED norm type for convergence test
>     PC Object: (mg_coarse_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=8, cols=8
>         total: nonzeros=32, allocated nonzeros=32
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Down solver (pre-smoother) on level 1 -------------------------------
>     KSP Object: (mg_levels_1_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_1_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=64, cols=64
>         total: nonzeros=352, allocated nonzeros=352
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 2 -------------------------------
>     KSP Object: (mg_levels_2_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_2_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=343, cols=343
>         total: nonzeros=2107, allocated nonzeros=2107
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI process
>     type: seqaij
>     rows=343, cols=343
>     total: nonzeros=2107, allocated nonzeros=2107
>     total number of mallocs used during MatSetValues calls=0
>       not using I-node routines
> Residual norm 1.33388e-08
> ~/Src/petsc/src/ksp/ksp/tutorials (main=) arch-main
> $
>
>
>
>    Now I change the code to
>
>         if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>           barray[k][j][i] = 0; // 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
>         } else {
>           barray[k][j][i] = 1; // Hx * Hy * Hz;
>         }
>
>   I do not understand where I am supposed to change the dimension to 33,
> so I ignore that statement. The same command line with the change above gives
>
> $ ./ex45 -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 
> -pc_type mg -pc_mg_levels 3 -mg_levels_ksp_type richardson 
> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits 
> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson 
> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
> -mg_coarse_pc_type jacobi -ksp_monitor_true_residual -ksp_view
>
>   0 KSP preconditioned resid norm 7.292257119299e+01 true resid norm 1.118033988750e+01 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 2.534913491362e+00 true resid norm 3.528425353826e-01 ||r(i)||/||b|| 3.155919577875e-02
>   2 KSP preconditioned resid norm 9.145057509152e-02 true resid norm 1.279725352471e-02 ||r(i)||/||b|| 1.144621152262e-03
>   3 KSP preconditioned resid norm 3.302446009474e-03 true resid norm 5.122622088691e-04 ||r(i)||/||b|| 4.581812485342e-05
>   4 KSP preconditioned resid norm 1.204504429329e-04 true resid norm 4.370692051248e-05 ||r(i)||/||b|| 3.909265814124e-06
>   5 KSP preconditioned resid norm 5.339971695523e-06 true resid norm 7.229991776815e-06 ||r(i)||/||b|| 6.466701235889e-07
>   6 KSP preconditioned resid norm 5.856425044706e-07 true resid norm 1.282860114273e-06 ||r(i)||/||b|| 1.147424968455e-07
>   7 KSP preconditioned resid norm 1.007137752126e-07 true resid norm 2.283009757390e-07 ||r(i)||/||b|| 2.041986004328e-08
>   8 KSP preconditioned resid norm 1.790021892548e-08 true resid norm 4.063263596129e-08 ||r(i)||/||b|| 3.634293444578e-09
> Linear solve converged due to CONVERGED_RTOL iterations 8
> KSP Object: 1 MPI process
>   type: richardson
>     damping factor=1.
>   maximum iterations=10000, nonzero initial guess
>   tolerances:  relative=1e-09, absolute=1e-50, divergence=10000.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI process
>   type: mg
>     type is MULTIPLICATIVE, levels=3 cycles=v
>       Cycles per PCApply=1
>       Not using Galerkin computed coarse grid matrices
>   Coarse grid solver -- level 0 -------------------------------
>     KSP Object: (mg_coarse_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using PRECONDITIONED norm type for convergence test
>     PC Object: (mg_coarse_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=8, cols=8
>         total: nonzeros=32, allocated nonzeros=32
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Down solver (pre-smoother) on level 1 -------------------------------
>     KSP Object: (mg_levels_1_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_1_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=64, cols=64
>         total: nonzeros=352, allocated nonzeros=352
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 2 -------------------------------
>     KSP Object: (mg_levels_2_) 1 MPI process
>       type: richardson
>         damping factor=1.
>       maximum iterations=6, nonzero initial guess
>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_2_) 1 MPI process
>       type: jacobi
>         type DIAGONAL
>       linear system matrix = precond matrix:
>       Mat Object: 1 MPI process
>         type: seqaij
>         rows=343, cols=343
>         total: nonzeros=2107, allocated nonzeros=2107
>         total number of mallocs used during MatSetValues calls=0
>           not using I-node routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI process
>     type: seqaij
>     rows=343, cols=343
>     total: nonzeros=2107, allocated nonzeros=2107
>     total number of mallocs used during MatSetValues calls=0
>       not using I-node routines
> Residual norm 4.06326e-08
> ~/Src/petsc/src/ksp/ksp/tutorials (main *=) arch-main
> $
>
>
> In neither case is it taking 25 iterations. What am I doing wrong?
>
>
> Normally one expects only trivial changes in the convergence of
> multigrid methods when one changes values in the right-hand side, as
> in the run above.
>
>
> Barry
>
>
>
>
>> On Feb 27, 2023, at 7:16 PM, Paul Grosse-Bley 
>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>
>> The scaling might be the problem, especially since I don't know what 
>> you mean by scaling it according to FE.
>>
>> To reproduce the issue with a smaller problem, change the ComputeRHS
>> function in ex45.c:
>>
>> if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>>   barray[k][j][i] = 0.0;
>> } else {
>>   barray[k][j][i] = 1.0;
>> }
>>
>> Change the dimensions to e.g. 33 instead of 7 (I scaled it down so it
>> runs quickly without a GPU) and then run with
>>
>> -ksp_converged_reason -ksp_type richardson -ksp_rtol 1e-09 -pc_type 
>> mg -pc_mg_levels 3 -mg_levels_ksp_type richardson 
>> -mg_levels_ksp_max_it 6 -mg_levels_ksp_converged_maxits 
>> -mg_levels_pc_type jacobi -mg_coarse_ksp_type richardson 
>> -mg_coarse_ksp_max_it 6 -mg_coarse_ksp_converged_maxits 
>> -mg_coarse_pc_type jacobi
>>
>> You will find that it takes 145 iterations instead of 25 for the 
>> original ex45 RHS. My hpgmg-cuda implementation (using 32^3) takes 41 
>> iterations.
>>
>> What do I have to change the diagonal entries of the matrix at the
>> boundary to, according to FE? Right now the diagonal is completely
>> constant.
>>
>> Paul
>>
>> On Tuesday, February 28, 2023 00:23 CET, Barry Smith 
>> <bsmith at petsc.dev> wrote:
>>>
>>> I have not seen explicitly including, or excluding, the Dirichlet
>>> boundary values in the system having a significant effect on the
>>> convergence, so long as you SCALE the diagonal rows (of those
>>> Dirichlet points) by a value similar to the other entries along the
>>> diagonal. If they are scaled completely differently, that can screw
>>> up the convergence. For src/ksp/ksp/ex45.c I see that the
>>> appropriate scaling is used (note the scaling should come from a
>>> finite element view of the discretization even if the discretization
>>> is finite differences, as is done in ex45.c).
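>>>
>>> For concreteness, the relevant ComputeMatrix logic in ex45.c looks roughly
>>> like the following (paraphrased from memory, so check the source); the point
>>> is that the Dirichlet rows get the same diagonal value as the interior rows,
>>> rather than, say, 1.0:
>>>
>>>   if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1) {
>>>     /* Dirichlet boundary row: diagonal only, scaled like an interior diagonal */
>>>     v[0] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx);
>>>     PetscCall(MatSetValuesStencil(B, 1, &row, 1, &row, v, INSERT_VALUES));
>>>   } else {
>>>     /* interior row: standard 7-point stencil */
>>>     v[0] = -HxHydHz;                            col[0].i = i;     col[0].j = j;     col[0].k = k - 1;
>>>     v[1] = -HxHzdHy;                            col[1].i = i;     col[1].j = j - 1; col[1].k = k;
>>>     v[2] = -HyHzdHx;                            col[2].i = i - 1; col[2].j = j;     col[2].k = k;
>>>     v[3] = 2.0 * (HxHydHz + HxHzdHy + HyHzdHx); col[3].i = i;     col[3].j = j;     col[3].k = k;
>>>     v[4] = -HyHzdHx;                            col[4].i = i + 1; col[4].j = j;     col[4].k = k;
>>>     v[5] = -HxHzdHy;                            col[5].i = i;     col[5].j = j + 1; col[5].k = k;
>>>     v[6] = -HxHydHz;                            col[6].i = i;     col[6].j = j;     col[6].k = k + 1;
>>>     PetscCall(MatSetValuesStencil(B, 1, &row, 7, col, v, INSERT_VALUES));
>>>   }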
>>>
>>> Are you willing to share the two codes so we can take a look with 
>>> experienced eyes to try to figure out the difference?
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>> > On Feb 27, 2023, at 5:48 PM, Paul Grosse-Bley 
>>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > the reason why I wanted to change to ghosted boundaries is that I was
>>> > worried about the effect of PCMG's coarsening on these boundary values.
>>> >
>>> > As mentioned before, I am trying to reproduce results from the
>>> > hpgmg-cuda benchmark (a modified version of it, e.g. using 2nd order
>>> > instead of 4th etc.).
>>> > I am trying to solve the Poisson equation -\nabla^2 u = 1 with u = 0
>>> > on the boundary, with rtol=1e-9. My MG solver implemented in hpgmg
>>> > solves this in 40 V-cycles (I weakened it a lot by only doing smooths
>>> > at the coarse level instead of CG). When I run the "same" MG solver
>>> > built in PETSc on this problem, it starts out reducing the residual
>>> > norm as fast or even faster for the first 20-30 iterations, but for
>>> > the last order of magnitude in the residual norm it needs more than
>>> > 300 V-cycles, i.e. it gets very slow. At this point I am pretty much
>>> > out of ideas about the cause, especially since e.g. adding back CG at
>>> > the coarsest level doesn't seem to change the number of iterations at
>>> > all. Therefore I suspect the discretization to be the problem. HPGMG
>>> > uses an even number of points per dimension (e.g. 256), while PCMG
>>> > wants an odd number (e.g. 257). So I also tried adding another layer
>>> > of boundary values so that the discretization effectively uses only
>>> > 254 points per dimension. This made the solver even slightly worse.
>>> >
>>> > So can the explicit boundary values screw with the coarsening,
>>> > especially when they are zero? Because with the problem as stated in
>>> > ex45, with finite (i.e. non-zero) boundary values, the MG solver
>>> > takes only 18 V-cycles.
>>> >
>>> > Best,
>>> > Paul
>>> >
>>> >
>>> >
>>> > On Monday, February 27, 2023 18:17 CET, Barry Smith 
>>> <bsmith at petsc.dev> wrote:
>>> >
>>> >>
>>> >> Paul,
>>> >>
>>> >> DM_BOUNDARY_GHOSTED results in extra ghost locations in the local
>>> >> vectors (obtained with DMCreateLocalVector()), but they will not
>>> >> appear in the global vectors obtained with DMCreateGlobalVector();
>>> >> perhaps this is the issue? Since they do not appear in the global
>>> >> vector, they will not appear in the linear system, so there are no
>>> >> diagonal entries for you to set; those rows/columns simply do not
>>> >> exist in the linear system. In other words, using
>>> >> DM_BOUNDARY_GHOSTED is a way to avoid needing to put the Dirichlet
>>> >> values explicitly into the system being solved; DM_BOUNDARY_GHOSTED
>>> >> is generally more helpful for nonlinear systems than for linear
>>> >> systems.
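>>> >>
>>> >> A minimal sketch of the distinction (untested, grid size chosen
>>> >> arbitrarily); only the local vector is padded with the ghost layer,
>>> >> and KSP/PCMG only ever see the global vector:
>>> >>
>>> >>   DM  da;
>>> >>   Vec global, local;
>>> >>   PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED, DM_BOUNDARY_GHOSTED,
>>> >>                          DMDA_STENCIL_STAR, 33, 33, 33, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
>>> >>                          1, 1, NULL, NULL, NULL, &da));
>>> >>   PetscCall(DMSetFromOptions(da));
>>> >>   PetscCall(DMSetUp(da));
>>> >>   PetscCall(DMCreateGlobalVector(da, &global)); /* 33^3 entries: what the solver works with */
>>> >>   PetscCall(DMCreateLocalVector(da, &local));   /* padded by the ghost/boundary layer */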
>>> >>
>>> >> Barry
>>> >>
>>> >> > On Feb 27, 2023, at 12:08 PM, Paul Grosse-Bley 
>>> <paul.grosse-bley at ziti.uni-heidelberg.de> wrote:
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > I would like to modify src/ksp/ksp/tutorials/ex45.c to implement
>>> >> > Dirichlet boundary conditions using DM_BOUNDARY_GHOSTED instead of
>>> >> > using DM_BOUNDARY_NONE and explicitly implementing the boundary by
>>> >> > adding diagonal-only rows.
>>> >> >
>>> >> > My assumption was that with DM_BOUNDARY_GHOSTED all vectors from
>>> >> > that DM have the extra memory for the ghost entries and that I can
>>> >> > basically use DMDAGetGhostCorners instead of DMDAGetCorners to
>>> >> > access the array obtained via DMDAVecGetArray. But when I access
>>> >> > (gxs, gys, gzs) = (-1, -1, -1) I get a segmentation fault. Looking
>>> >> > at the implementation of DMDAVecGetArray, it seemed to me that
>>> >> > accessing (-1, -1, -1) should work, since DMDAVecGetArray passes
>>> >> > the ghost corners to VecGetArray3d, which then adds the right
>>> >> > offsets.
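>>> >> >
>>> >> > Stripped down, what I am doing looks roughly like this (reconstructed
>>> >> > for illustration, so take the exact calls with a grain of salt; b is
>>> >> > the global vector handed to ComputeRHS):
>>> >> >
>>> >> >   PetscInt    gxs, gys, gzs, gxm, gym, gzm;
>>> >> >   PetscScalar ***barray;
>>> >> >   PetscCall(DMDAGetGhostCorners(da, &gxs, &gys, &gzs, &gxm, &gym, &gzm));
>>> >> >   PetscCall(DMDAVecGetArray(da, b, &barray));
>>> >> >   barray[gzs][gys][gxs] = 0.0; /* (gxs, gys, gzs) = (-1, -1, -1): this access segfaults */
>>> >> >   PetscCall(DMDAVecRestoreArray(da, b, &barray));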
>>> >> >
>>> >> > I could not find any example that uses DM_BOUNDARY_GHOSTED and then
>>> >> > actually accesses the ghost/boundary elements. Can I assume that they
>>> >> > are set to zero for the solution vector, i.e. that u = 0 on
>>> >> > \partial\Omega, and that I do not need to access them at all?
>>> >> >
>>> >> > Best,
>>> >> > Paul Große-Bley
>>> >>
>