[petsc-users] Multigrid coarse grid solver

Barry Smith bsmith at mcs.anl.gov
Thu Apr 27 10:46:30 CDT 2017


> On Apr 27, 2017, at 12:59 AM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
> 
> On 27 April 2017 at 00:30, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> 
>>  Yes, you asked for LU so it used LU!
>> 
>>   Of course for smaller coarse grids and large numbers of processes this is very inefficient.
>> 
>>   The default behavior for GAMG is probably what you want. In that case it is equivalent to
>> -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu.  But GAMG tries hard to put all the coarse grid degrees
>> of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver.
>> Try -ksp_view in that case.
>> 
> 
> Thanks, Barry.
> 
> I'm struggling a little to understand the matrix data structure for
> the coarse grid. Is it just an mpiaij matrix, with all entries
> (usually) on one process?

   Yes, when using GAMG
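
   For example, something like the following (just a sketch; I'm assuming the generic -ksp_view_mat option can take the mg_coarse_ prefix that shows up in the -ksp_view output) should print the type and layout of the coarse operator:

      mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -mg_coarse_ksp_view_mat ::ascii_info

   The finer levels should follow the same pattern with the mg_levels_<k>_ prefix.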

> 
> Is there an options key prefix for the matrix on different levels?
> E.g., to turn on a viewer?
> 
> If I get GAMG to use more than one process for the coarse grid (a GAMG
> setting), can I get a parallel LU (exact) solver to solve it using
> only the processes that store parts of the coarse grid matrix?

    See below
> 
> Related to all this, do the parallel LU solvers internally
> re-distribute a matrix over the whole MPI communicator as part of
> their re-ordering phase?

   This is up to each package (superlu_dist, mumps, etc.). They could certainly have logic based on matrix size, number of processes, available memory, and switch bandwidth to pick an optimal number of processes for each factorization, but that is complicated; I suspect they just use all the resources they are given, and so perform very poorly for many processes and small problems.
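
   For superlu_dist specifically, the process grid it picked shows up in the -ksp_view output you quoted (Process grid nprow 2 x npcol 1), and I believe it can be controlled with the -mat_superlu_dist_r / -mat_superlu_dist_c options (possibly needing the mg_coarse_ prefix in this setup), e.g. something like

      -mat_superlu_dist_r 1 -mat_superlu_dist_c 1

   to keep the factorization on a 1 x 1 grid; check -help output for the exact form.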

   What is wrong with just leaving the default, which uses a single process for the coarse grid solve? The only time you don't want that is when the coarse problem is very large; then you can use PCREDUNDANT (which allows redundant solves on subsets of processes) or PCTELESCOPE, which is more general.
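
   If you do go that route, a rough sketch of the options (names from memory, so verify with -help; the reduction counts are just examples) is

      -mg_coarse_pc_type redundant -mg_coarse_pc_redundant_number 4 -mg_coarse_redundant_pc_type lu

   or

      -mg_coarse_pc_type telescope -mg_coarse_pc_telescope_reduction_factor 4 -mg_coarse_telescope_pc_type lu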

   Barry



> 
> Garth
> 
>>   There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve. This saves one phase of communication for each V cycle: since every process has the entire solution, it just grabs the values it needs from itself without communication.
>> 
>> 
>> 
>> 
>>> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>>> 
>>> I'm a bit confused by the selection of the coarse grid solver for
>>> multigrid. For the demo ksp/ex56, if I do:
>>> 
>>>   mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>> 
>>> I see
>>> 
>>> Coarse grid solver -- level -------------------------------
>>>   KSP Object: (mg_coarse_) 1 MPI processes
>>>     type: preonly
>>>     maximum iterations=10000, initial guess is zero
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>     left preconditioning
>>>     using NONE norm type for convergence test
>>>   PC Object: (mg_coarse_) 1 MPI processes
>>>     type: lu
>>>       out-of-place factorization
>>>       tolerance for zero pivot 2.22045e-14
>>>       matrix ordering: nd
>>>       factor fill ratio given 5., needed 1.
>>>         Factored matrix follows:
>>>           Mat Object: 1 MPI processes
>>>             type: seqaij
>>>             rows=6, cols=6, bs=6
>>>             package used to perform factorization: petsc
>>>             total: nonzeros=36, allocated nonzeros=36
>>>             total number of mallocs used during MatSetValues calls =0
>>>               using I-node routines: found 2 nodes, limit used is 5
>>>     linear system matrix = precond matrix:
>>>     Mat Object: 1 MPI processes
>>>       type: seqaij
>>>       rows=6, cols=6, bs=6
>>>       total: nonzeros=36, allocated nonzeros=36
>>>       total number of mallocs used during MatSetValues calls =0
>>>         using I-node routines: found 2 nodes, limit used is 5
>>> 
>>> which is what I expect. Increasing from 1 to 2 processes:
>>> 
>>>   mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>> 
>>> I see
>>> 
>>> Coarse grid solver -- level -------------------------------
>>>   KSP Object: (mg_coarse_) 2 MPI processes
>>>     type: preonly
>>>     maximum iterations=10000, initial guess is zero
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>     left preconditioning
>>>     using NONE norm type for convergence test
>>>   PC Object: (mg_coarse_) 2 MPI processes
>>>     type: lu
>>>       out-of-place factorization
>>>       tolerance for zero pivot 2.22045e-14
>>>       matrix ordering: natural
>>>       factor fill ratio given 0., needed 0.
>>>         Factored matrix follows:
>>>           Mat Object: 2 MPI processes
>>>             type: superlu_dist
>>>             rows=6, cols=6
>>>             package used to perform factorization: superlu_dist
>>>             total: nonzeros=0, allocated nonzeros=0
>>>             total number of mallocs used during MatSetValues calls =0
>>>               SuperLU_DIST run parameters:
>>>                 Process grid nprow 2 x npcol 1
>>>                 Equilibrate matrix TRUE
>>>                 Matrix input mode 1
>>>                 Replace tiny pivots FALSE
>>>                 Use iterative refinement FALSE
>>>                 Processors in row 2 col partition 1
>>>                 Row permutation LargeDiag
>>>                 Column permutation METIS_AT_PLUS_A
>>>                 Parallel symbolic factorization FALSE
>>>                 Repeated factorization SamePattern
>>>     linear system matrix = precond matrix:
>>>     Mat Object: 2 MPI processes
>>>       type: mpiaij
>>>       rows=6, cols=6, bs=6
>>>       total: nonzeros=36, allocated nonzeros=36
>>>       total number of mallocs used during MatSetValues calls =0
>>>         using I-node (on process 0) routines: found 2 nodes, limit used is 5
>>> 
>>> Note that the coarse grid is now using superlu_dist. Is the coarse
>>> grid being solved in parallel?
>>> 
>>> Garth
>> 


