[petsc-users] Multigrid coarse grid solver

Wed Apr 26 18:30:30 CDT 2017

  Yes, you asked for LU so it used LU! 

   Of course, for small coarse grids and large numbers of processes, a parallel LU like this is very inefficient. 

   The default behavior for GAMG is probably what you want. In that case it is equivalent to 
-mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG tries hard to put all the coarse grid degrees 
of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. 
Try -ksp_view in that case.
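
   For example, a sketch that reuses the ex56 arguments from the message below and simply drops the two -mg_coarse_* options, so that GAMG picks its default coarse solver:

```
mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
```

   The "Coarse grid solver" part of the -ksp_view output is where to check which coarse solver was actually used.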

   There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process, and each process does its own factorization and solve. This saves one phase of communication for each V cycle: since every process has the entire solution, it just grabs the values it needs from itself without communication.
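
   As a sketch, again assuming the same ex56 arguments as in the message below, the redundant variant would be run as:

```
mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu
```

   Only the two -mg_coarse_* options differ from the original command; everything else is unchanged.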

> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
> 
> I'm a bit confused by the selection of the coarse grid solver for
> multigrid. For the demo ksp/ex56, if I do:
> 
>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> 
> I see
> 
>  Coarse grid solver -- level -------------------------------
>    KSP Object: (mg_coarse_) 1 MPI processes
>      type: preonly
>      maximum iterations=10000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>      left preconditioning
>      using NONE norm type for convergence test
>    PC Object: (mg_coarse_) 1 MPI processes
>      type: lu
>        out-of-place factorization
>        tolerance for zero pivot 2.22045e-14
>        matrix ordering: nd
>        factor fill ratio given 5., needed 1.
>          Factored matrix follows:
>            Mat Object: 1 MPI processes
>              type: seqaij
>              rows=6, cols=6, bs=6
>              package used to perform factorization: petsc
>              total: nonzeros=36, allocated nonzeros=36
>              total number of mallocs used during MatSetValues calls =0
>                using I-node routines: found 2 nodes, limit used is 5
>      linear system matrix = precond matrix:
>      Mat Object: 1 MPI processes
>        type: seqaij
>        rows=6, cols=6, bs=6
>        total: nonzeros=36, allocated nonzeros=36
>        total number of mallocs used during MatSetValues calls =0
>          using I-node routines: found 2 nodes, limit used is 5
> 
> which is what I expect. Increasing from 1 to 2 processes:
> 
>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> 
> I see
> 
>  Coarse grid solver -- level -------------------------------
>    KSP Object: (mg_coarse_) 2 MPI processes
>      type: preonly
>      maximum iterations=10000, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>      left preconditioning
>      using NONE norm type for convergence test
>    PC Object: (mg_coarse_) 2 MPI processes
>      type: lu
>        out-of-place factorization
>        tolerance for zero pivot 2.22045e-14
>        matrix ordering: natural
>        factor fill ratio given 0., needed 0.
>          Factored matrix follows:
>            Mat Object: 2 MPI processes
>              type: superlu_dist
>              rows=6, cols=6
>              package used to perform factorization: superlu_dist
>              total: nonzeros=0, allocated nonzeros=0
>              total number of mallocs used during MatSetValues calls =0
>                SuperLU_DIST run parameters:
>                  Process grid nprow 2 x npcol 1
>                  Equilibrate matrix TRUE
>                  Matrix input mode 1
>                  Replace tiny pivots FALSE
>                  Use iterative refinement FALSE
>                  Processors in row 2 col partition 1
>                  Row permutation LargeDiag
>                  Column permutation METIS_AT_PLUS_A
>                  Parallel symbolic factorization FALSE
>                  Repeated factorization SamePattern
>      linear system matrix = precond matrix:
>      Mat Object: 2 MPI processes
>        type: mpiaij
>        rows=6, cols=6, bs=6
>        total: nonzeros=36, allocated nonzeros=36
>        total number of mallocs used during MatSetValues calls =0
>          using I-node (on process 0) routines: found 2 nodes, limit used is 5
> 
> Note that the coarse grid is now using superlu_dist. Is the coarse
> grid being solved in parallel?
> 
> Garth