[petsc-users] Multigrid coarse grid solver
Barry Smith
bsmith at mcs.anl.gov
Thu Apr 27 10:41:22 CDT 2017
> On Apr 27, 2017, at 8:27 AM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>
> On 27 April 2017 at 13:45, Mark Adams <mfadams at lbl.gov> wrote:
>> Barry, we seem to get an error when you explicitly set this.
>>
>> Garth, maybe to set the default explicitly you need to use -pc_type asm
>> -sub_pc_type lu. That is the true default.
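>>
>> For example, spelled out with the coarse-level prefix (a sketch; assuming
>> the mg_coarse_ prefix is what is wanted here):
>>
>>   mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
>>     -mg_coarse_pc_type asm -mg_coarse_sub_pc_type lu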
>>
>> More below, but this is the error message:
>>
>> 17:46 knepley/feature-plasma-example *=
>> ~/Codes/petsc/src/ksp/ksp/examples/tutorials$
>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne
>> 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> -mg_coarse_pc_factor_mat_solver_package petsc
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: See
>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for
>> possible LU and Cholesky solvers
>> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT
>> Date: 2017-04-25 12:56:20 -0500
>> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by
>> markadams Wed Apr 26 17:46:28 2017
>> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++
>> COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
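>>
>> (The error just says that PETSc's built-in LU cannot factor a parallel
>> mpiaij matrix. One way around it, assuming a parallel direct solver such
>> as SuperLU_DIST was configured, is to request that package instead, e.g.
>>
>>   -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package superlu_dist
>>
>> or simply leave GAMG's default coarse solver alone.)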
>>
>>
>> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>>>
>>> On 27 April 2017 at 00:30, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>> Yes, you asked for LU so it used LU!
>>>>
>>>> Of course, for small coarse grids and large numbers of processes
>>>> this is very inefficient.
>>>>
>>>> The default behavior for GAMG is probably what you want. In that case
>>>> it is equivalent to
>>>> -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG tries
>>>> hard to put all the coarse grid degrees
>>>> of freedom on the first process and none on the rest, so you do end up
>>>> with the exact equivalent of a direct solver.
>>>> Try -ksp_view in that case.
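>>>>
>>>> For example, the explicit equivalent on the ex56 run from this thread
>>>> would be something like (a sketch):
>>>>
>>>>   mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
>>>>     -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu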
>>>>
>>>
>>> Thanks, Barry.
>>>
>>> I'm struggling a little to understand the matrix data structure for
>>> the coarse grid. Is it just an mpiaij matrix, with all entries
>>> (usually) on one process?
>>
>>
>> Yes.
>>
>>>
>>>
>>> Is there an options key prefix for the matrix on different levels?
>>> E.g., to turn on a viewer?
>>
>>
>> Something like -mg_levels_1_ksp_view should work (run with -help to get the
>> correct syntax).
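>>
>> For example, to discover the exact prefixes (a sketch; mg_levels is an
>> assumption to grep for, -help lists whatever is actually registered):
>>
>>   ./ex56 -ne 16 -pc_type gamg -help | grep mg_levels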
>>
>
> Do the matrix operator(s) associated with the KSP have an options prefix?
No, because the matrices are created independently of the KSP/PC infrastructure.
You can use -mg_coarse_ksp_view_pmat to print the matrix for just the coarse level, and options like -mg_coarse_ksp_view_pmat ::ascii_info to display summary information about the matrix.
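For example, on the ex56 run from this thread (a sketch combining the options above):

  mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -mg_coarse_ksp_view_pmat ::ascii_info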
>
>>>
>>>
>>> If I get GAMG to use more than one process for the coarse grid (a GAMG
>>> setting), can I get a parallel LU (exact) solver to solve it using
>>> only the processes that store parts of the coarse grid matrix?
>>
>>
>> No, we should make a subcommunicator for the active processes only, but I
>> am not too motivated to do this because the only reason that this matters is
>> if 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions
>> everywhere for no good reason, or 2) you use a Krylov solver (very
>> uncommon). All of the communication in a non-Krylov solver is point-to-point,
>> and there is no win that I know of with a subcommunicator.
>>
>> Note that the redundant coarse grid solver does use a subcommunicator,
>> obviously, though I think it is hardwired to PETSC_COMM_SELF, but maybe not?
>>
>>>
>>>
>>> Related to all this, do the parallel LU solvers internally
>>> re-distribute a matrix over the whole MPI communicator as part of
>>> their re-ordering phase?
>>
>>
>> They better not!
>>
>
> I did a test with MUMPS, and from the MUMPS diagnostics (memory use
> per process) it appears that it does split the matrix across all
> processes.
>
> Garth
>
>> I doubt any solver would be that eager by default.
>>
>>>
>>>
>>> Garth
>>>
>>>> There is also -mg_coarse_pc_type redundant
>>>> -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse
>>>> matrix on EACH process, and each process does its own factorization and
>>>> solve. This saves one phase of the communication for each V cycle, since
>>>> every process has the entire solution; it just grabs from itself the values
>>>> it needs without communication.
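>>>>
>>>> For example, on the same ex56 problem (a sketch):
>>>>
>>>>   mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
>>>>     -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu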
>>>>
>>>>
>>>>
>>>>
>>>>> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <gnw20 at cam.ac.uk> wrote:
>>>>>
>>>>> I'm a bit confused by the selection of the coarse grid solver for
>>>>> multigrid. For the demo ksp/ex56, if I do:
>>>>>
>>>>> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
>>>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>>>>
>>>>> I see
>>>>>
>>>>> Coarse grid solver -- level -------------------------------
>>>>> KSP Object: (mg_coarse_) 1 MPI processes
>>>>> type: preonly
>>>>> maximum iterations=10000, initial guess is zero
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>> left preconditioning
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_coarse_) 1 MPI processes
>>>>> type: lu
>>>>> out-of-place factorization
>>>>> tolerance for zero pivot 2.22045e-14
>>>>> matrix ordering: nd
>>>>> factor fill ratio given 5., needed 1.
>>>>> Factored matrix follows:
>>>>> Mat Object: 1 MPI processes
>>>>> type: seqaij
>>>>> rows=6, cols=6, bs=6
>>>>> package used to perform factorization: petsc
>>>>> total: nonzeros=36, allocated nonzeros=36
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> using I-node routines: found 2 nodes, limit used is 5
>>>>> linear system matrix = precond matrix:
>>>>> Mat Object: 1 MPI processes
>>>>> type: seqaij
>>>>> rows=6, cols=6, bs=6
>>>>> total: nonzeros=36, allocated nonzeros=36
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> using I-node routines: found 2 nodes, limit used is 5
>>>>>
>>>>> which is what I expect. Increasing from 1 to 2 processes:
>>>>>
>>>>> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
>>>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>>>>
>>>>> I see
>>>>>
>>>>> Coarse grid solver -- level -------------------------------
>>>>> KSP Object: (mg_coarse_) 2 MPI processes
>>>>> type: preonly
>>>>> maximum iterations=10000, initial guess is zero
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>> left preconditioning
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_coarse_) 2 MPI processes
>>>>> type: lu
>>>>> out-of-place factorization
>>>>> tolerance for zero pivot 2.22045e-14
>>>>> matrix ordering: natural
>>>>> factor fill ratio given 0., needed 0.
>>>>> Factored matrix follows:
>>>>> Mat Object: 2 MPI processes
>>>>> type: superlu_dist
>>>>> rows=6, cols=6
>>>>> package used to perform factorization: superlu_dist
>>>>> total: nonzeros=0, allocated nonzeros=0
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> SuperLU_DIST run parameters:
>>>>> Process grid nprow 2 x npcol 1
>>>>> Equilibrate matrix TRUE
>>>>> Matrix input mode 1
>>>>> Replace tiny pivots FALSE
>>>>> Use iterative refinement FALSE
>>>>> Processors in row 2 col partition 1
>>>>> Row permutation LargeDiag
>>>>> Column permutation METIS_AT_PLUS_A
>>>>> Parallel symbolic factorization FALSE
>>>>> Repeated factorization SamePattern
>>>>> linear system matrix = precond matrix:
>>>>> Mat Object: 2 MPI processes
>>>>> type: mpiaij
>>>>> rows=6, cols=6, bs=6
>>>>> total: nonzeros=36, allocated nonzeros=36
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> using I-node (on process 0) routines: found 2 nodes, limit
>>>>> used is 5
>>>>>
>>>>> Note that the coarse grid is now using superlu_dist. Is the coarse
>>>>> grid being solved in parallel?
>>>>>
>>>>> Garth
>>>>
>>
>>