[petsc-users] Multigrid coarse grid solver

Thu Apr 27 10:37:53 CDT 2017

> On Apr 27, 2017, at 7:45 AM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> Barry, we seem to get an error when you explicitly set this. 

  Of course you get an error, you are asking PETSc to do a parallel LU; PETSc does NOT have a parallel LU as you well know. How could you possibly think this would work?

   Note if you have for example superlu_dist or mumps installed and do NOT list -mg_coarse_pc_factor_mat_solver_package petsc then it will default to use one of the parallel solvers automatically (it just looks through the list of installed solvers for that case and picks the first one).

> 
> Garth, Maybe to set the default explicitly you need to use pc_type asm -sub_pc_type lu. That is the true default.
> 
> More below but this is the error message:
> 
> 17:46 knepley/feature-plasma-example *= ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package petsc
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers
> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da  GIT Date: 2017-04-25 12:56:20 -0500
> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017
> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
> 
> 
> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
> On 27 April 2017 at 00:30, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> >   Yes, you asked for LU so it used LU!
> >
> >    Of course for smaller coarse grids and large numbers of processes this is very inefficient.
> >
> >    The default behavior for GAMG is probably what you want. In that case it is equivalent to
> > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu.  But GAMG tries hard to put all the coarse grid degrees
> > of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver.
> > Try -ksp_view in that case.
> >
> 
> Thanks, Barry.
> 
> I'm struggling a little to understand the matrix data structure for
> the coarse grid. Is it just a mpiaji matrix, with all entries
> (usually) on one process?
> 
> Yes.
>  
> 
> Is there an options key prefix for the matrix on different levels?
> E.g., to turn on a viewer?
> 
> something like -mg_level_1_ksp_view should work (run with -help to get the correct syntax).
>  
> 
> If I get GAMG to use more than one process for the coarse grid (a GAMG
> setting), can I get a parallel LU (exact) solver to solve it using
> only the processes that store parts of the coarse grid matrix?
> 
> No, we should make a sub communicator for the active processes only, but I am not too motivated to do this because the only reason that this matters is if 1) a solver (ie, the parallel direct solver) is lazy and puts reductions everywhere for not good reason, or 2) you use a Krylov solver (very uncommon). All of the communication in a non-krylov solver in point to point and there is no win that I know of with a sub communicator. 
> 
> Note, the redundant coarse grid solver does use a subcommuncator, obviously, but I think it is hardwired to PETSC_COMM_SELF, but maybe not?
>  
> 
> Related to all this, do the parallel LU solvers internally
> re-distribute a matrix over the whole MPI communicator as part of
> their re-ordering phase?
> 
> They better not!
> 
> I doubt any solver would be that eager by default.
>  
> 
> Garth
> 
> >    There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve. This saves one phase of the communication for each V cycle since every process has the entire solution it just grabs from itself the values it needs without communication.
> >
> >
> >
> >
> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
> >>
> >> I'm a bit confused by the selection of the coarse grid solver for
> >> multigrid. For the demo ksp/ex56, if I do:
> >>
> >>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>  Coarse grid solver -- level -------------------------------
> >>    KSP Object: (mg_coarse_) 1 MPI processes
> >>      type: preonly
> >>      maximum iterations=10000, initial guess is zero
> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>      left preconditioning
> >>      using NONE norm type for convergence test
> >>    PC Object: (mg_coarse_) 1 MPI processes
> >>      type: lu
> >>        out-of-place factorization
> >>        tolerance for zero pivot 2.22045e-14
> >>        matrix ordering: nd
> >>        factor fill ratio given 5., needed 1.
> >>          Factored matrix follows:
> >>            Mat Object: 1 MPI processes
> >>              type: seqaij
> >>              rows=6, cols=6, bs=6
> >>              package used to perform factorization: petsc
> >>              total: nonzeros=36, allocated nonzeros=36
> >>              total number of mallocs used during MatSetValues calls =0
> >>                using I-node routines: found 2 nodes, limit used is 5
> >>      linear system matrix = precond matrix:
> >>      Mat Object: 1 MPI processes
> >>        type: seqaij
> >>        rows=6, cols=6, bs=6
> >>        total: nonzeros=36, allocated nonzeros=36
> >>        total number of mallocs used during MatSetValues calls =0
> >>          using I-node routines: found 2 nodes, limit used is 5
> >>
> >> which is what I expect. Increasing from 1 to 2 processes:
> >>
> >>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>  Coarse grid solver -- level -------------------------------
> >>    KSP Object: (mg_coarse_) 2 MPI processes
> >>      type: preonly
> >>      maximum iterations=10000, initial guess is zero
> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>      left preconditioning
> >>      using NONE norm type for convergence test
> >>    PC Object: (mg_coarse_) 2 MPI processes
> >>      type: lu
> >>        out-of-place factorization
> >>        tolerance for zero pivot 2.22045e-14
> >>        matrix ordering: natural
> >>        factor fill ratio given 0., needed 0.
> >>          Factored matrix follows:
> >>            Mat Object: 2 MPI processes
> >>              type: superlu_dist
> >>              rows=6, cols=6
> >>              package used to perform factorization: superlu_dist
> >>              total: nonzeros=0, allocated nonzeros=0
> >>              total number of mallocs used during MatSetValues calls =0
> >>                SuperLU_DIST run parameters:
> >>                  Process grid nprow 2 x npcol 1
> >>                  Equilibrate matrix TRUE
> >>                  Matrix input mode 1
> >>                  Replace tiny pivots FALSE
> >>                  Use iterative refinement FALSE
> >>                  Processors in row 2 col partition 1
> >>                  Row permutation LargeDiag
> >>                  Column permutation METIS_AT_PLUS_A
> >>                  Parallel symbolic factorization FALSE
> >>                  Repeated factorization SamePattern
> >>      linear system matrix = precond matrix:
> >>      Mat Object: 2 MPI processes
> >>        type: mpiaij
> >>        rows=6, cols=6, bs=6
> >>        total: nonzeros=36, allocated nonzeros=36
> >>        total number of mallocs used during MatSetValues calls =0
> >>          using I-node (on process 0) routines: found 2 nodes, limit used is 5
> >>
> >> Note that the coarse grid is now using superlu_dist. Is the coarse
> >> grid being solved in parallel?
> >>
> >> Garth
> >
>