[petsc-users] Multigrid coarse grid solver

Mark Adams mfadams at lbl.gov
Thu Apr 27 09:07:02 CDT 2017


>
> Does the matrix operator(s) associated with the ksp have an options prefix?
>
>
I don't think so. Run with -help to check.
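
For example, something like this (a sketch, using the ex56 example quoted
below; the exact prefixes depend on how the application sets them up) will
list the coarse-level option names that are actually registered:

    mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -help | grep mg_coarse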


> >>
> >>
> >> If I get GAMG to use more than one process for the coarse grid (a GAMG
> >> setting), can I get a parallel LU (exact) solver to solve it using
> >> only the processes that store parts of the coarse grid matrix?
> >
> >
> > No, we should make a subcommunicator for the active processes only, but
> > I am not too motivated to do this, because the only reason this matters
> > is if 1) a solver (i.e., the parallel direct solver) is lazy and puts
> > reductions everywhere for no good reason, or 2) you use a Krylov solver
> > (very uncommon). All of the communication in a non-Krylov solver is
> > point to point, and there is no win that I know of with a
> > subcommunicator.
> >
> > Note, the redundant coarse grid solver obviously does use a
> > subcommunicator, but I think it is hardwired to PETSC_COMM_SELF (maybe
> > not?).
> >
> >>
> >>
> >> Related to all this, do the parallel LU solvers internally
> >> re-distribute a matrix over the whole MPI communicator as part of
> >> their re-ordering phase?
> >
> >
> > They better not!
> >
>
> I did a test with MUMPS, and from the MUMPS diagnostics (memory use
> per process) it appears that it does split the matrix across all
> processes.
>

Yikes! That is your strong-scaling problem. Use SuperLU_DIST instead.
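
Something like the following should force SuperLU_DIST for the coarse-grid
factorization (a sketch; depending on the PETSc version the option may be
spelled -pc_factor_mat_solver_type rather than -pc_factor_mat_solver_package):

    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
        -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
        -mg_coarse_pc_factor_mat_solver_package superlu_dist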

I think making a subcommunicator for the coarse grid in GAMG would wreak
havoc.

Could we turn that option off in MUMPS from GAMG?  Or just turn it off by
default? PETSc does not usually get that eager about partitioning.


>
> Garth
>
> > I doubt any solver would be that eager by default.
> >
> >>
> >>
> >> Garth
> >>
> >> >    There is also -mg_coarse_pc_type redundant
> >> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the
> >> > coarse matrix on EACH process and each process does its own
> >> > factorization and solve. This saves one phase of communication for
> >> > each V cycle, since every process has the entire solution; it just
> >> > grabs the values it needs from itself without communication.
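
Concretely, for the ex56 runs quoted below, that would look something like
this (untested sketch):

    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
        -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu
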
> >> >
> >> >
> >> >
> >> >
> >> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
> >> >>
> >> >> I'm a bit confused by the selection of the coarse grid solver for
> >> >> multigrid. For the demo ksp/ex56, if I do:
> >> >>
> >> >>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >> >>
> >> >> I see
> >> >>
> >> >>  Coarse grid solver -- level -------------------------------
> >> >>    KSP Object: (mg_coarse_) 1 MPI processes
> >> >>      type: preonly
> >> >>      maximum iterations=10000, initial guess is zero
> >> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >> >>      left preconditioning
> >> >>      using NONE norm type for convergence test
> >> >>    PC Object: (mg_coarse_) 1 MPI processes
> >> >>      type: lu
> >> >>        out-of-place factorization
> >> >>        tolerance for zero pivot 2.22045e-14
> >> >>        matrix ordering: nd
> >> >>        factor fill ratio given 5., needed 1.
> >> >>          Factored matrix follows:
> >> >>            Mat Object: 1 MPI processes
> >> >>              type: seqaij
> >> >>              rows=6, cols=6, bs=6
> >> >>              package used to perform factorization: petsc
> >> >>              total: nonzeros=36, allocated nonzeros=36
> >> >>              total number of mallocs used during MatSetValues calls =0
> >> >>                using I-node routines: found 2 nodes, limit used is 5
> >> >>      linear system matrix = precond matrix:
> >> >>      Mat Object: 1 MPI processes
> >> >>        type: seqaij
> >> >>        rows=6, cols=6, bs=6
> >> >>        total: nonzeros=36, allocated nonzeros=36
> >> >>        total number of mallocs used during MatSetValues calls =0
> >> >>          using I-node routines: found 2 nodes, limit used is 5
> >> >>
> >> >> which is what I expect. Increasing from 1 to 2 processes:
> >> >>
> >> >>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >> >>
> >> >> I see
> >> >>
> >> >>  Coarse grid solver -- level -------------------------------
> >> >>    KSP Object: (mg_coarse_) 2 MPI processes
> >> >>      type: preonly
> >> >>      maximum iterations=10000, initial guess is zero
> >> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >> >>      left preconditioning
> >> >>      using NONE norm type for convergence test
> >> >>    PC Object: (mg_coarse_) 2 MPI processes
> >> >>      type: lu
> >> >>        out-of-place factorization
> >> >>        tolerance for zero pivot 2.22045e-14
> >> >>        matrix ordering: natural
> >> >>        factor fill ratio given 0., needed 0.
> >> >>          Factored matrix follows:
> >> >>            Mat Object: 2 MPI processes
> >> >>              type: superlu_dist
> >> >>              rows=6, cols=6
> >> >>              package used to perform factorization: superlu_dist
> >> >>              total: nonzeros=0, allocated nonzeros=0
> >> >>              total number of mallocs used during MatSetValues calls =0
> >> >>                SuperLU_DIST run parameters:
> >> >>                  Process grid nprow 2 x npcol 1
> >> >>                  Equilibrate matrix TRUE
> >> >>                  Matrix input mode 1
> >> >>                  Replace tiny pivots FALSE
> >> >>                  Use iterative refinement FALSE
> >> >>                  Processors in row 2 col partition 1
> >> >>                  Row permutation LargeDiag
> >> >>                  Column permutation METIS_AT_PLUS_A
> >> >>                  Parallel symbolic factorization FALSE
> >> >>                  Repeated factorization SamePattern
> >> >>      linear system matrix = precond matrix:
> >> >>      Mat Object: 2 MPI processes
> >> >>        type: mpiaij
> >> >>        rows=6, cols=6, bs=6
> >> >>        total: nonzeros=36, allocated nonzeros=36
> >> >>        total number of mallocs used during MatSetValues calls =0
> >> >>          using I-node (on process 0) routines: found 2 nodes, limit used is 5
> >> >>
> >> >> Note that the coarse grid is now using superlu_dist. Is the coarse
> >> >> grid being solved in parallel?
> >> >>
> >> >> Garth
> >> >
> >
> >
>