[petsc-users] GAMG for the unsymmetrical matrix

Mark Adams mfadams at lbl.gov
Thu Apr 13 10:12:50 CDT 2017


On Wed, Apr 12, 2017 at 1:31 PM, Kong, Fande <fande.kong at inl.gov> wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
>> Not your fault, it looks like the GAMG logic is faulty, in your version at
>> least.
>>
>
> What I want is that GAMG coarsens the fine matrix once and then stops
> doing anything.  I did not see any benefit to having more levels when the
> number of processors is small.
>

The number of levels is a math issue and has nothing to do with
parallelism. If you use just one coarse level, your coarse grid is very
large and expensive to solve, so you want to keep coarsening. There is
rarely a need to set -pc_mg_levels.
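
For example, something like this (a sketch only -- I have not run it, and
"./your_app" stands in for your executable and its other options) lets GAMG
choose the number of levels itself:

  mpiexec -n 384 ./your_app -pc_type gamg -pc_gamg_sym_graph true

It will keep coarsening until the coarsest grid is small enough to solve
cheaply.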


>
>
>>
>> GAMG will force the coarsest grid to one processor by default, in newer
>> versions. You can override the default with:
>>
>> -pc_gamg_use_parallel_coarse_grid_solver
>>
>> Your coarse grid solver is ASM with 37 equations per process and 512
>> processes. That is bad.
>>
>
> Why is this bad? Is the subdomain problem too small?
>

Because ASM with 512 blocks is a weak solver. You want the coarse grid to
be solved exactly.
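
If you do want to keep the coarse grid distributed, one way (a sketch,
assuming SuperLU_DIST is installed; the factor option name may differ in
your PETSc version) is a parallel direct solve on the coarse level:

  -pc_gamg_use_parallel_coarse_grid_solver
  -mg_coarse_ksp_type preonly
  -mg_coarse_pc_type lu
  -mg_coarse_pc_factor_mat_solver_package superlu_dist

With the default behavior the coarse grid is instead moved to one process
and factored there, which is usually what you want.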


>
>
>> Note, you could run this on one process to see the proper convergence
>> rate.
>>
>
> Convergence rate for which part? The coarse solver, or the subdomain solver?
>

The overall convergence rate.


>
>
>> You can fix this with parameters:
>>
>> >   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
>> per process on coarse grids (PCGAMGSetProcEqLim)
>> >   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
>> coarse grid (PCGAMGSetCoarseEqLim)
>>
>> If you really want two levels then set something like
>> -pc_gamg_process_eq_limit 18145 (or higher) and -pc_gamg_coarse_eq_limit
>> 18145 (or higher).
>>
>
>
> Maybe have something like: make the coarse problem 1/8 as large as the
> original problem? Otherwise, this number is just problem dependent.
>

GAMG will stop coarsening automatically, so you do not need
problem-dependent parameters.
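
The defaults quoted above amount to roughly

  -pc_gamg_process_eq_limit 50
  -pc_gamg_coarse_eq_limit 50

(the exact defaults may differ in your version): coarsening stops once the
coarse grid is down to on the order of 50 equations, and processes are
dropped on coarse grids so that each keeps at least about that many. To
check what GAMG actually did, something like

  mpiexec -n 384 ./your_app <options> -info 2>&1 | grep GAMG

(again, "./your_app" and <options> are placeholders) shows the size and
process count of each level.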


>
>
>
>> You can run with -info and grep on GAMG, and you will see metadata for
>> each level. You should see "npe=1" for the coarsest (last) grid. Or use a
>> parallel direct solver.
>>
>
> I will try.
>
>
>>
>> Note, you should not see much degradation as you increase the number of
>> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
>> aim for about 3000.
>>
>
> It should be fine as long as the coarse problem is solved by a parallel
> solver.
>

>
> Fande,
>
>
>>
>>
>> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>
>>>
>>>
>>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>>> the coarse grid.
>>>
>>>
>>> 37 is on the subdomain.
>>>
>>>  rows=18145, cols=18145 on the entire coarse grid.
>>>
>>>
>>>
>>>
>>>
>>>> I don't understand that.
>>>>
>>>> You are also calling the AMG setup a lot, but not spending much time
>>>> in it. Try running with -info and grep on "GAMG".
>>>>
>>>>
>>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>>> > Thanks, Barry.
>>>> >
>>>> > It works.
>>>> >
>>>> > GAMG is three times better than ASM in terms of the number of linear
>>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>>> improve
>>>> > the performance of GAMG? Log files are attached.
>>>> >
>>>> > Fande,
>>>> >
>>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>> wrote:
>>>> >>
>>>> >>
>>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov>
>>>> wrote:
>>>> >> >
>>>> >> > Thanks, Mark and Barry,
>>>> >> >
>>>> >> > It works pretty well in terms of the number of linear iterations
>>>> (using
>>>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute
>>>> time. I am
>>>> >> > using the two-level method via "-pc_mg_levels 2". The reason why
>>>> the compute
>>>> >> > time is larger than with other preconditioning options is that a
>>>> matrix-free
>>>> >> > method is used on the fine level and in my particular problem the
>>>> function
>>>> >> > evaluation is expensive.
>>>> >> >
>>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
>>>> Newton,
>>>> >> > but I do not think I want to make the preconditioning part
>>>> matrix-free.  Do
>>>> >> > you guys know how to turn off the matrix-free method for GAMG?
>>>> >>
>>>> >>    -pc_use_amat false
>>>> >>
>>>> >> >
>>>> >> > Here is the detailed solver:
>>>> >> >
>>>> >> > SNES Object: 384 MPI processes
>>>> >> >   type: newtonls
>>>> >> >   maximum iterations=200, maximum function evaluations=10000
>>>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>>> >> >   total number of linear solver iterations=20
>>>> >> >   total number of function evaluations=166
>>>> >> >   norm schedule ALWAYS
>>>> >> >   SNESLineSearch Object:   384 MPI processes
>>>> >> >     type: bt
>>>> >> >       interpolation: cubic
>>>> >> >       alpha=1.000000e-04
>>>> >> >     maxstep=1.000000e+08, minlambda=1.000000e-12
>>>> >> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>>> >> > lambda=1.000000e-08
>>>> >> >     maximum iterations=40
>>>> >> >   KSP Object:   384 MPI processes
>>>> >> >     type: gmres
>>>> >> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>>>> >> > Orthogonalization with no iterative refinement
>>>> >> >       GMRES: happy breakdown tolerance 1e-30
>>>> >> >     maximum iterations=100, initial guess is zero
>>>> >> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.
>>>> >> >     right preconditioning
>>>> >> >     using UNPRECONDITIONED norm type for convergence test
>>>> >> >   PC Object:   384 MPI processes
>>>> >> >     type: gamg
>>>> >> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>>> >> >         Cycles per PCApply=1
>>>> >> >         Using Galerkin computed coarse grid matrices
>>>> >> >         GAMG specific options
>>>> >> >           Threshold for dropping small values from graph 0.
>>>> >> >           AGG specific options
>>>> >> >             Symmetric graph true
>>>> >> >     Coarse grid solver -- level -------------------------------
>>>> >> >       KSP Object:      (mg_coarse_)       384 MPI processes
>>>> >> >         type: preonly
>>>> >> >         maximum iterations=10000, initial guess is zero
>>>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>>>> divergence=10000.
>>>> >> >         left preconditioning
>>>> >> >         using NONE norm type for convergence test
>>>> >> >       PC Object:      (mg_coarse_)       384 MPI processes
>>>> >> >         type: bjacobi
>>>> >> >           block Jacobi: number of blocks = 384
>>>> >> >           Local solve is same for all blocks, in the following KSP
>>>> and
>>>> >> > PC objects:
>>>> >> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes
>>>> >> >           type: preonly
>>>> >> >           maximum iterations=1, initial guess is zero
>>>> >> >           tolerances:  relative=1e-05, absolute=1e-50,
>>>> divergence=10000.
>>>> >> >           left preconditioning
>>>> >> >           using NONE norm type for convergence test
>>>> >> >         PC Object:        (mg_coarse_sub_)         1 MPI processes
>>>> >> >           type: lu
>>>> >> >             LU: out-of-place factorization
>>>> >> >             tolerance for zero pivot 2.22045e-14
>>>> >> >             using diagonal shift on blocks to prevent zero pivot
>>>> >> > [INBLOCKS]
>>>> >> >             matrix ordering: nd
>>>> >> >             factor fill ratio given 5., needed 1.31367
>>>> >> >               Factored matrix follows:
>>>> >> >                 Mat Object:                 1 MPI processes
>>>> >> >                   type: seqaij
>>>> >> >                   rows=37, cols=37
>>>> >> >                   package used to perform factorization: petsc
>>>> >> >                   total: nonzeros=913, allocated nonzeros=913
>>>> >> >                   total number of mallocs used during MatSetValues
>>>> calls
>>>> >> > =0
>>>> >> >                     not using I-node routines
>>>> >> >           linear system matrix = precond matrix:
>>>> >> >           Mat Object:           1 MPI processes
>>>> >> >             type: seqaij
>>>> >> >             rows=37, cols=37
>>>> >> >             total: nonzeros=695, allocated nonzeros=695
>>>> >> >             total number of mallocs used during MatSetValues calls
>>>> =0
>>>> >> >               not using I-node routines
>>>> >> >         linear system matrix = precond matrix:
>>>> >> >         Mat Object:         384 MPI processes
>>>> >> >           type: mpiaij
>>>> >> >           rows=18145, cols=18145
>>>> >> >           total: nonzeros=1709115, allocated nonzeros=1709115
>>>> >> >           total number of mallocs used during MatSetValues calls =0
>>>> >> >             not using I-node (on process 0) routines
>>>> >> >     Down solver (pre-smoother) on level 1
>>>> >> > -------------------------------
>>>> >> >       KSP Object:      (mg_levels_1_)       384 MPI processes
>>>> >> >         type: chebyshev
>>>> >> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =
>>>> >> > 1.46673
>>>> >> >           Chebyshev: eigenvalues estimated using gmres with
>>>> translations
>>>> >> > [0. 0.1; 0. 1.1]
>>>> >> >           KSP Object:          (mg_levels_1_esteig_)           384
>>>> MPI
>>>> >> > processes
>>>> >> >             type: gmres
>>>> >> >               GMRES: restart=30, using Classical (unmodified)
>>>> >> > Gram-Schmidt Orthogonalization with no iterative refinement
>>>> >> >               GMRES: happy breakdown tolerance 1e-30
>>>> >> >             maximum iterations=10, initial guess is zero
>>>> >> >             tolerances:  relative=1e-12, absolute=1e-50,
>>>> >> > divergence=10000.
>>>> >> >             left preconditioning
>>>> >> >             using PRECONDITIONED norm type for convergence test
>>>> >> >         maximum iterations=2
>>>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>>>> divergence=10000.
>>>> >> >         left preconditioning
>>>> >> >         using nonzero initial guess
>>>> >> >         using NONE norm type for convergence test
>>>> >> >       PC Object:      (mg_levels_1_)       384 MPI processes
>>>> >> >         type: sor
>>>> >> >           SOR: type = local_symmetric, iterations = 1, local
>>>> iterations
>>>> >> > = 1, omega = 1.
>>>> >> >         linear system matrix followed by preconditioner matrix:
>>>> >> >         Mat Object:         384 MPI processes
>>>> >> >           type: mffd
>>>> >> >           rows=3020875, cols=3020875
>>>> >> >             Matrix-free approximation:
>>>> >> >               err=1.49012e-08 (relative error in function
>>>> evaluation)
>>>> >> >               Using wp compute h routine
>>>> >> >                   Does not compute normU
>>>> >> >         Mat Object:        ()         384 MPI processes
>>>> >> >           type: mpiaij
>>>> >> >           rows=3020875, cols=3020875
>>>> >> >           total: nonzeros=215671710, allocated nonzeros=241731750
>>>> >> >           total number of mallocs used during MatSetValues calls =0
>>>> >> >             not using I-node (on process 0) routines
>>>> >> >     Up solver (post-smoother) same as down solver (pre-smoother)
>>>> >> >     linear system matrix followed by preconditioner matrix:
>>>> >> >     Mat Object:     384 MPI processes
>>>> >> >       type: mffd
>>>> >> >       rows=3020875, cols=3020875
>>>> >> >         Matrix-free approximation:
>>>> >> >           err=1.49012e-08 (relative error in function evaluation)
>>>> >> >           Using wp compute h routine
>>>> >> >               Does not compute normU
>>>> >> >     Mat Object:    ()     384 MPI processes
>>>> >> >       type: mpiaij
>>>> >> >       rows=3020875, cols=3020875
>>>> >> >       total: nonzeros=215671710, allocated nonzeros=241731750
>>>> >> >       total number of mallocs used during MatSetValues calls =0
>>>> >> >         not using I-node (on process 0) routines
>>>> >> >
>>>> >> >
>>>> >> > Fande,
>>>> >> >
>>>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov>
>>>> wrote:
>>>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov>
>>>> wrote:
>>>> >> > >
>>>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>>>> >> > >
>>>> >> > >   No, it means that for a nonsymmetric nonzero structure you need
>>>> the
>>>> >> > > extra flag. So use the extra flag. The reason we don't always
>>>> use the flag
>>>> >> > > is because it adds extra cost and isn't needed if the matrix
>>>> already has a
>>>> >> > > symmetric nonzero structure.
>>>> >> >
>>>> >> > BTW, if you have symmetric non-zero structure you can just set
>>>> >> > -pc_gamg_threshold -1.0; note the "or" in the message.
>>>> >> >
>>>> >> > If you want to mess with the threshold then you need to use the
>>>> >> > symmetrized flag.
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
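
To summarize the option names that came up in this thread (not a tested
command line; the executable name and process count are placeholders):

  # unsymmetric nonzero structure: symmetrize the graph for aggregation,
  # and have GAMG use the assembled Pmat rather than the matrix-free Amat
  mpiexec -n 384 ./your_app -pc_type gamg -pc_gamg_sym_graph true -pc_use_amat false

  # symmetric nonzero structure: -pc_gamg_sym_graph is not needed as long
  # as no graph filtering is done
  mpiexec -n 384 ./your_app -pc_type gamg -pc_gamg_threshold -1.0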