[petsc-dev] Block system

Pierre Jolivet pierre.jolivet at enseeiht.fr
Mon Feb 13 13:28:12 CST 2017


> On Feb 13, 2017, at 5:46 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> Pierre Jolivet <Pierre.Jolivet at enseeiht.fr> writes:
> 
>> On Mon, 13 Feb 2017 17:07:21 +0100, Jed Brown wrote:
>>> Pierre Jolivet <Pierre.Jolivet at enseeiht.fr> writes:
>>> 
>>>> Hello,
>>>> Given this block matrix:
>>>> A = [A11,A12,A13,A14;
>>>>      A21,A22,A23,A24;
>>>>      A31,A32,A33,A34;
>>>>      A41,A42,A43,A44];
>>>> It is trivial to precondition Ax = b with
>>>> M^-1 = diag(A11^-1, A22^-1, A33^-1, A44^-1);
>>>> My application requires a slightly fancier preconditioner, which should be
>>>> M^-1 = diag(inv([A11,A12;A21,A22]), inv([A33,A34;A43,A44]));
>>>> I'm not sure what the right tool for this is.
>>>> I've stopped at a 4x4 block matrix, but at scale I have a matrix with a
>>>> few thousand x a few thousand blocks (still with the nested 2 x 2 block
>>>> structure).
>>> 
>>> Are all of these blocks distributed on your communicator or do they have
>>> some locality?  PCFieldSplit is intended for problems where the blocks
>> 
>> All the blocks are distributed indeed.
> 
> Do you mean that each block is distributed?

Yes, it is a little odd. I do indeed have a 2D distribution, which I redistribute into a 1D row distribution (MATMPIAIJ). But maybe there are smarter ways to deal with 2D-distributed sparse matrices in PETSc?
Because of this redistribution, the diagonal blocks of standard block Jacobi are tiny and lead to a "bad" preconditioner.
PCBJacobiSetTotalBlocks seems to fix this just perfectly, thank you very much!
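For the archives, here is a minimal sketch of the kind of setup this gives, assuming the simplest call sequence; the tridiagonal test matrix and the total block count of 2 are placeholders, not my actual operator:

/* A minimal sketch (not my actual code): assemble a small 1D row-distributed
 * MATMPIAIJ system and apply block Jacobi with a *total* number of blocks,
 * so that each block can span several processes.  Equivalent command-line
 * option: -pc_bjacobi_blocks 2. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  PetscInt i, rstart, rend, n = 16;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* stand-in for the redistributed operator */
  MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 3, NULL, &A);
  MatGetOwnershipRange(A, &rstart, &rend);
  for (i = rstart; i < rend; i++) {
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCBJACOBI);
  /* two blocks in total, regardless of the number of ranks; with 4 ranks
   * each block (and its sub-KSP) lives on a 2-process sub-communicator */
  PCBJacobiSetTotalBlocks(pc, 2, NULL);
  KSPSetFromOptions(ksp); /* allows -sub_ksp_type / -sub_pc_type overrides */
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

Running with e.g. mpiexec -n 4 and -ksp_view lets one check that each block then spans two processes, as in your -snes_view output quoted below.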

By the way, here are two typos in the PETSc output:
"ICNTL(8) (scalling strategy):" => "scalling" should be spelled "scaling"
"ICNTL(19) (Shur complement info):" => "Shur" should be spelled "Schur"

Thanks.

>>> are all distributed and solving them sequentially is acceptable.  The
>>> other limiting case for an additive preconditioner like you have above
>>> is block Jacobi (perhaps with multi-process subdomains or multiple
>>> subdomains per process; such decompositions are supported).
>> 
>> Yes, that is basically what I need: block Jacobi with subdomains defined
>> as aggregations of multiple processes. But I don't know how to do this and
>> thought of using an additive FieldSplit. Could you give me a pointer to
>> such a distribution, please?
> 
> The easiest is PCBJacobiSetTotalBlocks (-pc_bjacobi_blocks).  That gets
> you this kind of method
> 
> $ mpiexec -n 4 ./ex5 -pc_bjacobi_blocks 2 -sub_pc_type jacobi -snes_view
> SNES Object: 4 MPI processes
>  type: newtonls
>  maximum iterations=50, maximum function evaluations=10000
>  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>  total number of linear solver iterations=4
>  total number of function evaluations=5
>  norm schedule ALWAYS
>  SNESLineSearch Object: 4 MPI processes
>    type: bt
>      interpolation: cubic
>      alpha=1.000000e-04
>    maxstep=1.000000e+08, minlambda=1.000000e-12
>    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>    maximum iterations=40
>  KSP Object: 4 MPI processes
>    type: gmres
>      GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>      GMRES: happy breakdown tolerance 1e-30
>    maximum iterations=10000, initial guess is zero
>    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>    left preconditioning
>    using PRECONDITIONED norm type for convergence test
>  PC Object: 4 MPI processes
>    type: bjacobi
>      block Jacobi: number of blocks = 2
>      Local solve is same for all blocks, in the following KSP and PC objects:
>    KSP Object: (sub_) 2 MPI processes
>      type: gmres
>        GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>        GMRES: happy breakdown tolerance 1e-30
>      maximum iterations=10000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>      left preconditioning
>      using PRECONDITIONED norm type for convergence test
>    PC Object: (sub_) 2 MPI processes
>      type: jacobi
>      linear system matrix = precond matrix:
>      Mat Object: 2 MPI processes
>        type: mpiaij
>        rows=8, cols=8
>        total: nonzeros=28, allocated nonzeros=28
>        total number of mallocs used during MatSetValues calls =0
>          not using I-node (on process 0) routines
>    linear system matrix = precond matrix:
>    Mat Object: 4 MPI processes
>      type: mpiaij
>      rows=16, cols=16
>      total: nonzeros=64, allocated nonzeros=64
>      total number of mallocs used during MatSetValues calls =0



