[petsc-users] Multigrid coarse grid solver

Garth N. Wells gnw20 at cam.ac.uk
Thu Apr 27 08:27:47 CDT 2017


On 27 April 2017 at 13:45, Mark Adams <mfadams at lbl.gov> wrote:
> Barry, we seem to get an error when you explicitly set this.
>
> Garth, maybe to set the default explicitly you need to use -pc_type asm
> -sub_pc_type lu. That is the true default.
>
> More below, but this is the error message:
>
> 17:46 knepley/feature-plasma-example *=
> ~/Codes/petsc/src/ksp/ksp/examples/tutorials$
> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne
> 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> -mg_coarse_pc_factor_mat_solver_package petsc
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: See
> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for
> possible LU and Cholesky solvers
> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da  GIT
> Date: 2017-04-25 12:56:20 -0500
> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by
> markadams Wed Apr 26 17:46:28 2017
> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++
> COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
>
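> (PETSc's built-in LU factorization is sequential, so with the coarse matrix
> stored as mpiaij you presumably need to select a parallel package instead,
> e.g.
>
>    -mg_coarse_pc_factor_mat_solver_package superlu_dist
>
> which is what the 2-process -ksp_view output further down ends up with.)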
>
> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>>
>> On 27 April 2017 at 00:30, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> >   Yes, you asked for LU so it used LU!
>> >
>> >    Of course for smaller coarse grids and large numbers of processes
>> > this is very inefficient.
>> >
>> >    The default behavior for GAMG is probably what you want. In that case
>> > it is equivalent to
>> > -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG tries
>> > hard to put all the coarse grid degrees
>> > of freedom on the first process and none on the rest, so you do end up
>> > with the exact equivalent of a direct solver.
>> > Try -ksp_view in that case.
>> >
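>> >    For example (untested, but it should spell out that default explicitly):
>> >
>> >       mpirun -np 2 ./ex56 -ne 16 -pc_type gamg \
>> >           -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu -ksp_view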
>>
>> Thanks, Barry.
>>
>> I'm struggling a little to understand the matrix data structure for
>> the coarse grid. Is it just an mpiaij matrix, with all entries
>> (usually) on one process?
>
>
> Yes.
>
>>
>>
>> Is there an options key prefix for the matrix on different levels?
>> E.g., to turn on a viewer?
>
>
> Something like -mg_levels_1_ksp_view should work (run with -help to get the
> correct syntax).
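> For example, something like
>
>    ./ex56 -ne 16 -pc_type gamg -help | grep mg_
>
> should list the per-level option prefixes that are actually registered.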
>

Do the matrix operator(s) associated with the KSP have an options prefix?
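
To make the question concrete, here is a rough, untested sketch of what I
have in mind (assuming the GAMG PC can be treated as a PCMG and queried
through that interface):

    #include <petscksp.h>

    /* Print the options prefix (if any) of the coarse-grid operator. */
    PetscErrorCode PrintCoarseMatPrefix(KSP outer)
    {
      PC             pc;
      KSP            coarse;
      Mat            Acoarse;
      const char    *prefix;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetPC(outer, &pc);CHKERRQ(ierr);
      ierr = PCMGGetCoarseSolve(pc, &coarse);CHKERRQ(ierr); /* coarse-level KSP (mg_coarse_) */
      ierr = KSPGetOperators(coarse, &Acoarse, NULL);CHKERRQ(ierr);
      ierr = MatGetOptionsPrefix(Acoarse, &prefix);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "coarse Mat prefix: %s\n",
                         prefix ? prefix : "(none)");CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }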

>>
>>
>> If I get GAMG to use more than one process for the coarse grid (a GAMG
>> setting), can I get a parallel LU (exact) solver to solve it using
>> only the processes that store parts of the coarse grid matrix?
>
>
> No, we should make a subcommunicator for the active processes only, but I
> am not too motivated to do this, because the only reason it matters is if
> 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions
> everywhere for no good reason, or 2) you use a Krylov solver (very
> uncommon). All of the communication in a non-Krylov solver is point-to-point,
> and there is no win that I know of with a subcommunicator.
>
> Note, the redundant coarse grid solver does use a subcommunicator, obviously,
> though I think it is hardwired to PETSC_COMM_SELF (but maybe not?).
>
>>
>>
>> Related to all this, do the parallel LU solvers internally
>> re-distribute a matrix over the whole MPI communicator as part of
>> their re-ordering phase?
>
>
> They better not!
>

I did a test with MUMPS, and from the MUMPS diagnostics (memory use
per process) it appears that it does split the matrix across all
processes.
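
(For reference, the coarse solve can presumably be switched to MUMPS with
something like

    mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
        -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
        -mg_coarse_pc_factor_mat_solver_package mumps

analogous to the superlu_dist case further down.)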

Garth

> I doubt any solver would be that eager by default.
>
>>
>>
>> Garth
>>
>> >    There is also -mg_coarse_pc_type redundant
>> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse
>> > matrix on EACH process and each process does its own factorization and
>> > solve. This saves one phase of communication for each V cycle, since
>> > every process has the entire solution; it just grabs the values it needs
>> > from itself without communication.
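>> >
>> > For example (untested):
>> >
>> >    mpirun -np 2 ./ex56 -ne 16 -pc_type gamg \
>> >        -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu -ksp_view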
>> >
>> >
>> >
>> >
>> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>> >>
>> >> I'm a bit confused by the selection of the coarse grid solver for
>> >> multigrid. For the demo ksp/ex56, if I do:
>> >>
>> >>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >>
>> >> I see
>> >>
>> >>  Coarse grid solver -- level -------------------------------
>> >>    KSP Object: (mg_coarse_) 1 MPI processes
>> >>      type: preonly
>> >>      maximum iterations=10000, initial guess is zero
>> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>      left preconditioning
>> >>      using NONE norm type for convergence test
>> >>    PC Object: (mg_coarse_) 1 MPI processes
>> >>      type: lu
>> >>        out-of-place factorization
>> >>        tolerance for zero pivot 2.22045e-14
>> >>        matrix ordering: nd
>> >>        factor fill ratio given 5., needed 1.
>> >>          Factored matrix follows:
>> >>            Mat Object: 1 MPI processes
>> >>              type: seqaij
>> >>              rows=6, cols=6, bs=6
>> >>              package used to perform factorization: petsc
>> >>              total: nonzeros=36, allocated nonzeros=36
>> >>              total number of mallocs used during MatSetValues calls =0
>> >>                using I-node routines: found 2 nodes, limit used is 5
>> >>      linear system matrix = precond matrix:
>> >>      Mat Object: 1 MPI processes
>> >>        type: seqaij
>> >>        rows=6, cols=6, bs=6
>> >>        total: nonzeros=36, allocated nonzeros=36
>> >>        total number of mallocs used during MatSetValues calls =0
>> >>          using I-node routines: found 2 nodes, limit used is 5
>> >>
>> >> which is what I expect. Increasing from 1 to 2 processes:
>> >>
>> >>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >>
>> >> I see
>> >>
>> >>  Coarse grid solver -- level -------------------------------
>> >>    KSP Object: (mg_coarse_) 2 MPI processes
>> >>      type: preonly
>> >>      maximum iterations=10000, initial guess is zero
>> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>      left preconditioning
>> >>      using NONE norm type for convergence test
>> >>    PC Object: (mg_coarse_) 2 MPI processes
>> >>      type: lu
>> >>        out-of-place factorization
>> >>        tolerance for zero pivot 2.22045e-14
>> >>        matrix ordering: natural
>> >>        factor fill ratio given 0., needed 0.
>> >>          Factored matrix follows:
>> >>            Mat Object: 2 MPI processes
>> >>              type: superlu_dist
>> >>              rows=6, cols=6
>> >>              package used to perform factorization: superlu_dist
>> >>              total: nonzeros=0, allocated nonzeros=0
>> >>              total number of mallocs used during MatSetValues calls =0
>> >>                SuperLU_DIST run parameters:
>> >>                  Process grid nprow 2 x npcol 1
>> >>                  Equilibrate matrix TRUE
>> >>                  Matrix input mode 1
>> >>                  Replace tiny pivots FALSE
>> >>                  Use iterative refinement FALSE
>> >>                  Processors in row 2 col partition 1
>> >>                  Row permutation LargeDiag
>> >>                  Column permutation METIS_AT_PLUS_A
>> >>                  Parallel symbolic factorization FALSE
>> >>                  Repeated factorization SamePattern
>> >>      linear system matrix = precond matrix:
>> >>      Mat Object: 2 MPI processes
>> >>        type: mpiaij
>> >>        rows=6, cols=6, bs=6
>> >>        total: nonzeros=36, allocated nonzeros=36
>> >>        total number of mallocs used during MatSetValues calls =0
>> >>          using I-node (on process 0) routines: found 2 nodes, limit
>> >> used is 5
>> >>
>> >> Note that the coarse grid is now using superlu_dist. Is the coarse
>> >> grid being solved in parallel?
>> >>
>> >> Garth
>> >
>
>

