[petsc-users] Multigrid coarse grid solver

Matthew Knepley knepley at gmail.com
Thu Apr 27 09:13:49 CDT 2017


On Thu, Apr 27, 2017 at 9:07 AM, Mark Adams <mfadams at lbl.gov> wrote:

>
>
>> Does the matrix operator(s) associated with the ksp have an options
>> prefix?
>>
>>
> I don't think so. Run with -help to check.
>
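A minimal sketch of that check, assuming a generic PETSc application binary ./myapp (the binary name is illustrative): list the registered options with -help and filter for the coarse-level prefix, e.g.

    ./myapp -help | grep mg_coarse

If the coarse-level KSP/PC or its operators carry an options prefix, it should show up in the option names printed there.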
>
>> >>
>> >>
>> >> If I get GAMG to use more than one process for the coarse grid (a GAMG
>> >> setting), can I get a parallel LU (exact) solver to solve it using
>> >> only the processes that store parts of the coarse grid matrix?
>> >
>> >
>> > No, we should make a subcommunicator for the active processes only, but
>> > I am not too motivated to do this, because the only reason it matters is
>> > if 1) a solver (i.e., the parallel direct solver) is lazy and puts
>> > reductions everywhere for no good reason, or 2) you use a Krylov solver
>> > (very uncommon). All of the communication in a non-Krylov solver is
>> > point to point, and there is no win that I know of with a subcommunicator.
>> >
>> > Note, the redundant coarse grid solver obviously does use a
>> > subcommunicator, but I think it is hardwired to PETSC_COMM_SELF, though
>> > maybe not?
>> >
>> >>
>> >>
>> >> Related to all this, do the parallel LU solvers internally
>> >> re-distribute a matrix over the whole MPI communicator as part of
>> >> their re-ordering phase?
>> >
>> >
>> > They better not!
>> >
>>
>> I did a test with MUMPS, and from the MUMPS diagnostics (memory use
>> per process) it appears that it does split the matrix across all
>> processes.
>
>
1) Can we motivate why you would ever want a parallel coarse grid? I cannot
think of a reason.


> Yikes!  That is your problem with strong speedup.  Use SuperLU.
>
> I think making a subcommunicator for the coarse grid in GAMG would wreak
> havoc.
>

2) I do not see why a subcommunicator is a problem. In fact, this is
exactly what PCTELESCOPE is designed to do.
    GAMG does a good job of reducing, but if you want completely custom
reductions, TELESCOPE is for that.
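A minimal sketch of that, applied to the ex56 run from this thread and assuming PCTELESCOPE's usual option prefixing (the reduction factor and the inner LU choice are illustrative, not taken from this thread):

    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
        -mg_coarse_pc_type telescope \
        -mg_coarse_pc_telescope_reduction_factor 2 \
        -mg_coarse_telescope_pc_type lu

This gathers the coarse-level operator onto a subcommunicator with half the ranks and solves it there with LU.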

    Matt


> Could we turn that option off in MUMPS from GAMG?  Or just turn it off by
> default? PETSc does not usually get that eager about partitioning.
>
>
>>
>> Garth
>>
>> > I doubt any solver would be that eager by default.
>> >
>> >>
>> >>
>> >> Garth
>> >>
>> >> >    There is also -mg_coarse_pc_type redundant
>> >> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the
>> >> > coarse matrix on EACH process, and each process does its own
>> >> > factorization and solve. This saves one phase of communication for
>> >> > each V cycle, since every process has the entire solution; it just
>> >> > grabs the values it needs from itself without communication.
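A minimal sketch of that variant, applied to the ex56 run from this thread (option names as quoted above):

    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
        -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu

With this, -ksp_view should report a redundant coarse PC with a sequential LU factorization on every process rather than a parallel factorization.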
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>> >> >>
>> >> >> I'm a bit confused by the selection of the coarse grid solver for
>> >> >> multigrid. For the demo ksp/ex56, if I do:
>> >> >>
>> >> >>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >> >>
>> >> >> I see
>> >> >>
>> >> >>  Coarse grid solver -- level -------------------------------
>> >> >>    KSP Object: (mg_coarse_) 1 MPI processes
>> >> >>      type: preonly
>> >> >>      maximum iterations=10000, initial guess is zero
>> >> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >> >>      left preconditioning
>> >> >>      using NONE norm type for convergence test
>> >> >>    PC Object: (mg_coarse_) 1 MPI processes
>> >> >>      type: lu
>> >> >>        out-of-place factorization
>> >> >>        tolerance for zero pivot 2.22045e-14
>> >> >>        matrix ordering: nd
>> >> >>        factor fill ratio given 5., needed 1.
>> >> >>          Factored matrix follows:
>> >> >>            Mat Object: 1 MPI processes
>> >> >>              type: seqaij
>> >> >>              rows=6, cols=6, bs=6
>> >> >>              package used to perform factorization: petsc
>> >> >>              total: nonzeros=36, allocated nonzeros=36
>> >> >>              total number of mallocs used during MatSetValues calls =0
>> >> >>                using I-node routines: found 2 nodes, limit used is 5
>> >> >>      linear system matrix = precond matrix:
>> >> >>      Mat Object: 1 MPI processes
>> >> >>        type: seqaij
>> >> >>        rows=6, cols=6, bs=6
>> >> >>        total: nonzeros=36, allocated nonzeros=36
>> >> >>        total number of mallocs used during MatSetValues calls =0
>> >> >>          using I-node routines: found 2 nodes, limit used is 5
>> >> >>
>> >> >> which is what I expect. Increasing from 1 to 2 processes:
>> >> >>
>> >> >>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >> >>
>> >> >> I see
>> >> >>
>> >> >>  Coarse grid solver -- level -------------------------------
>> >> >>    KSP Object: (mg_coarse_) 2 MPI processes
>> >> >>      type: preonly
>> >> >>      maximum iterations=10000, initial guess is zero
>> >> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >> >>      left preconditioning
>> >> >>      using NONE norm type for convergence test
>> >> >>    PC Object: (mg_coarse_) 2 MPI processes
>> >> >>      type: lu
>> >> >>        out-of-place factorization
>> >> >>        tolerance for zero pivot 2.22045e-14
>> >> >>        matrix ordering: natural
>> >> >>        factor fill ratio given 0., needed 0.
>> >> >>          Factored matrix follows:
>> >> >>            Mat Object: 2 MPI processes
>> >> >>              type: superlu_dist
>> >> >>              rows=6, cols=6
>> >> >>              package used to perform factorization: superlu_dist
>> >> >>              total: nonzeros=0, allocated nonzeros=0
>> >> >>              total number of mallocs used during MatSetValues calls =0
>> >> >>                SuperLU_DIST run parameters:
>> >> >>                  Process grid nprow 2 x npcol 1
>> >> >>                  Equilibrate matrix TRUE
>> >> >>                  Matrix input mode 1
>> >> >>                  Replace tiny pivots FALSE
>> >> >>                  Use iterative refinement FALSE
>> >> >>                  Processors in row 2 col partition 1
>> >> >>                  Row permutation LargeDiag
>> >> >>                  Column permutation METIS_AT_PLUS_A
>> >> >>                  Parallel symbolic factorization FALSE
>> >> >>                  Repeated factorization SamePattern
>> >> >>      linear system matrix = precond matrix:
>> >> >>      Mat Object: 2 MPI processes
>> >> >>        type: mpiaij
>> >> >>        rows=6, cols=6, bs=6
>> >> >>        total: nonzeros=36, allocated nonzeros=36
>> >> >>        total number of mallocs used during MatSetValues calls =0
>> >> >>          using I-node (on process 0) routines: found 2 nodes, limit used is 5
>> >> >>
>> >> >> Note that the coarse grid is now using superlu_dist. Is the coarse
>> >> >> grid being solved in parallel?
>> >> >>
>> >> >> Garth
>> >> >
>> >
>> >
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener