[petsc-users] GAMG for the unsymmetrical matrix

Kong, Fande fande.kong at inl.gov
Wed Apr 12 12:31:57 CDT 2017


Hi Mark,

Thanks for your reply.

On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <mfadams at lbl.gov> wrote:

> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
> Not your fault; it looks like the GAMG logic is faulty, at least in your
> version.
>

What I want is for GAMG to coarsen the fine matrix once and then stop. I did
not see any benefit in having more levels when the number of processors is
small.
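
To be concrete, what I have in mind is something like the following (the
coarse-solver options here are only an illustration; the quoted output below
used preonly + block Jacobi):

    -pc_type gamg -pc_mg_levels 2 \
    -mg_coarse_ksp_type gmres -mg_coarse_pc_type bjacobi

i.e., coarsen once and then solve the 18145-equation coarse problem in
parallel, rather than building further levels.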


>
> GAMG will force the coarsest grid to one processor by default, in newer
> versions. You can override the default with:
>
> -pc_gamg_use_parallel_coarse_grid_solver
>
> Your coarse grid solver is ASM with 37 equations per process and 512
> processes. That is bad.
>

Why is this bad? Is the subdomain problem too small?
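If I do the arithmetic, 18145 coarse equations spread over 384 (or 512)
processes is only about 35-47 unknowns per subdomain, so I suppose the local
solves are essentially free and each coarse iteration is dominated by
communication?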


> Note, you could run this on one process to see the proper convergence
> rate.
>

Convergence rate for which part? The coarse solver or the subdomain solver?


> You can fix this with parameters:
>
> >   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
> per process on coarse grids (PCGAMGSetProcEqLim)
> >   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
> coarse grid (PCGAMGSetCoarseEqLim)
>
> If you really want two levels then set something like
> -pc_gamg_coarse_eq_limit 18145 (or higher).
>
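
For reference, I assume the same limits can also be set from code through the
API names mentioned above; a minimal sketch (the helper name, and the
assumption that the outer KSP already uses GAMG, are mine):

    #include <petscksp.h>

    /* Sketch: set GAMG's coarsening limits on an existing KSP whose PC is GAMG,
       so coarsening stops once the coarse grid is at or below 18145 equations. */
    PetscErrorCode SetGAMGLimits(KSP ksp)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCGAMGSetProcEqLim(pc, 50);CHKERRQ(ierr);      /* goal: ~50 equations per process on coarse grids */
      ierr = PCGAMGSetCoarseEqLim(pc, 18145);CHKERRQ(ierr); /* global size below which coarsening stops */
      PetscFunctionReturn(0);
    }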


Could we have something like: make the coarse problem 1/8 the size of the
original problem? Otherwise, this number is just problem dependent.
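To put a number on it for this case: the fine problem has 3,020,875 rows, so
1/8 of that would be roughly 378,000 coarse equations, whereas a hard-coded
-pc_gamg_coarse_eq_limit has to be re-tuned whenever the problem size changes.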



> You can run with -info and grep on GAMG and you will see meta-data for
> each level. You should see "npe=1" for the coarsest (last) grid. Or use a
> parallel direct solver.
>

I will try.
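
Something like this, I assume (the binary name is just a placeholder):

    mpiexec -n 384 ./my_app <usual solver options> -info 2>&1 | grep GAMG

and then check that the last, coarsest level reports "npe=1".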


>
> Note, you should not see much degradation as you increase the number of
> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
> aim for about 3000.
>

It should be fine as long as the coarse problem is solved by a parallel
solver.
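
For example, assuming PETSc was configured with SuperLU_DIST (and if I
remember the 3.7-era option name correctly), something like:

    -mg_coarse_ksp_type preonly \
    -mg_coarse_pc_type lu \
    -mg_coarse_pc_factor_mat_solver_package superlu_dist

should keep the coarse solve exact but parallel.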


Fande,


>
>
> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>
>>
>>
>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>> the coarse grid.
>>
>>
>> 37 is on the subdomain.
>>
>>  rows=18145, cols=18145 on the entire coarse grid.
>>
>>
>>
>>
>>
>>> I don't understand that.
>>>
>>> You are also calling the AMG setup a lot, but not spending much time
>>> in it. Try running with -info and grep on "GAMG".
>>>
>>>
>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> > Thanks, Barry.
>>> >
>>> > It works.
>>> >
>>> > GAMG is three times better than ASM in terms of the number of linear
>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>> improve
>>> > the performance of GAMG? Log files are attached.
>>> >
>>> > Fande,
>>> >
>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >>
>>> >>
>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> >> >
>>> >> > Thanks, Mark and Barry,
>>> >> >
>>> >> > It works pretty well in terms of the number of linear iterations
>>> (using
>>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>>> I am
>>> >> > using the two-level method via "-pc_mg_levels 2". The reason why
>>> the compute
>>> >> > time is larger than other preconditioning options is that a matrix
>>> free
>>> >> > method is used on the fine level and in my particular problem the
>>> function
>>> >> > evaluation is expensive.
>>> >> >
>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
>>> Newton,
>>> >> > but I do not think I want to make the preconditioning part
>>> matrix-free.  Do
>>> >> > you guys know how to turn off the matrix-free method for GAMG?
>>> >>
>>> >>    -pc_use_amat false
>>> >>
>>> >> >
>>> >> > Here is the detailed solver:
>>> >> >
>>> >> > SNES Object: 384 MPI processes
>>> >> >   type: newtonls
>>> >> >   maximum iterations=200, maximum function evaluations=10000
>>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>> >> >   total number of linear solver iterations=20
>>> >> >   total number of function evaluations=166
>>> >> >   norm schedule ALWAYS
>>> >> >   SNESLineSearch Object:   384 MPI processes
>>> >> >     type: bt
>>> >> >       interpolation: cubic
>>> >> >       alpha=1.000000e-04
>>> >> >     maxstep=1.000000e+08, minlambda=1.000000e-12
>>> >> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>> >> > lambda=1.000000e-08
>>> >> >     maximum iterations=40
>>> >> >   KSP Object:   384 MPI processes
>>> >> >     type: gmres
>>> >> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>>> >> > Orthogonalization with no iterative refinement
>>> >> >       GMRES: happy breakdown tolerance 1e-30
>>> >> >     maximum iterations=100, initial guess is zero
>>> >> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.
>>> >> >     right preconditioning
>>> >> >     using UNPRECONDITIONED norm type for convergence test
>>> >> >   PC Object:   384 MPI processes
>>> >> >     type: gamg
>>> >> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>> >> >         Cycles per PCApply=1
>>> >> >         Using Galerkin computed coarse grid matrices
>>> >> >         GAMG specific options
>>> >> >           Threshold for dropping small values from graph 0.
>>> >> >           AGG specific options
>>> >> >             Symmetric graph true
>>> >> >     Coarse grid solver -- level -------------------------------
>>> >> >       KSP Object:      (mg_coarse_)       384 MPI processes
>>> >> >         type: preonly
>>> >> >         maximum iterations=10000, initial guess is zero
>>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> >         left preconditioning
>>> >> >         using NONE norm type for convergence test
>>> >> >       PC Object:      (mg_coarse_)       384 MPI processes
>>> >> >         type: bjacobi
>>> >> >           block Jacobi: number of blocks = 384
>>> >> >           Local solve is same for all blocks, in the following KSP
>>> and
>>> >> > PC objects:
>>> >> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes
>>> >> >           type: preonly
>>> >> >           maximum iterations=1, initial guess is zero
>>> >> >           tolerances:  relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> >           left preconditioning
>>> >> >           using NONE norm type for convergence test
>>> >> >         PC Object:        (mg_coarse_sub_)         1 MPI processes
>>> >> >           type: lu
>>> >> >             LU: out-of-place factorization
>>> >> >             tolerance for zero pivot 2.22045e-14
>>> >> >             using diagonal shift on blocks to prevent zero pivot
>>> >> > [INBLOCKS]
>>> >> >             matrix ordering: nd
>>> >> >             factor fill ratio given 5., needed 1.31367
>>> >> >               Factored matrix follows:
>>> >> >                 Mat Object:                 1 MPI processes
>>> >> >                   type: seqaij
>>> >> >                   rows=37, cols=37
>>> >> >                   package used to perform factorization: petsc
>>> >> >                   total: nonzeros=913, allocated nonzeros=913
>>> >> >                   total number of mallocs used during MatSetValues
>>> calls
>>> >> > =0
>>> >> >                     not using I-node routines
>>> >> >           linear system matrix = precond matrix:
>>> >> >           Mat Object:           1 MPI processes
>>> >> >             type: seqaij
>>> >> >             rows=37, cols=37
>>> >> >             total: nonzeros=695, allocated nonzeros=695
>>> >> >             total number of mallocs used during MatSetValues calls
>>> =0
>>> >> >               not using I-node routines
>>> >> >         linear system matrix = precond matrix:
>>> >> >         Mat Object:         384 MPI processes
>>> >> >           type: mpiaij
>>> >> >           rows=18145, cols=18145
>>> >> >           total: nonzeros=1709115, allocated nonzeros=1709115
>>> >> >           total number of mallocs used during MatSetValues calls =0
>>> >> >             not using I-node (on process 0) routines
>>> >> >     Down solver (pre-smoother) on level 1
>>> >> > -------------------------------
>>> >> >       KSP Object:      (mg_levels_1_)       384 MPI processes
>>> >> >         type: chebyshev
>>> >> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =
>>> >> > 1.46673
>>> >> >           Chebyshev: eigenvalues estimated using gmres with
>>> translations
>>> >> > [0. 0.1; 0. 1.1]
>>> >> >           KSP Object:          (mg_levels_1_esteig_)           384
>>> MPI
>>> >> > processes
>>> >> >             type: gmres
>>> >> >               GMRES: restart=30, using Classical (unmodified)
>>> >> > Gram-Schmidt Orthogonalization with no iterative refinement
>>> >> >               GMRES: happy breakdown tolerance 1e-30
>>> >> >             maximum iterations=10, initial guess is zero
>>> >> >             tolerances:  relative=1e-12, absolute=1e-50,
>>> >> > divergence=10000.
>>> >> >             left preconditioning
>>> >> >             using PRECONDITIONED norm type for convergence test
>>> >> >         maximum iterations=2
>>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> >         left preconditioning
>>> >> >         using nonzero initial guess
>>> >> >         using NONE norm type for convergence test
>>> >> >       PC Object:      (mg_levels_1_)       384 MPI processes
>>> >> >         type: sor
>>> >> >           SOR: type = local_symmetric, iterations = 1, local
>>> iterations
>>> >> > = 1, omega = 1.
>>> >> >         linear system matrix followed by preconditioner matrix:
>>> >> >         Mat Object:         384 MPI processes
>>> >> >           type: mffd
>>> >> >           rows=3020875, cols=3020875
>>> >> >             Matrix-free approximation:
>>> >> >               err=1.49012e-08 (relative error in function
>>> evaluation)
>>> >> >               Using wp compute h routine
>>> >> >                   Does not compute normU
>>> >> >         Mat Object:        ()         384 MPI processes
>>> >> >           type: mpiaij
>>> >> >           rows=3020875, cols=3020875
>>> >> >           total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> >           total number of mallocs used during MatSetValues calls =0
>>> >> >             not using I-node (on process 0) routines
>>> >> >     Up solver (post-smoother) same as down solver (pre-smoother)
>>> >> >     linear system matrix followed by preconditioner matrix:
>>> >> >     Mat Object:     384 MPI processes
>>> >> >       type: mffd
>>> >> >       rows=3020875, cols=3020875
>>> >> >         Matrix-free approximation:
>>> >> >           err=1.49012e-08 (relative error in function evaluation)
>>> >> >           Using wp compute h routine
>>> >> >               Does not compute normU
>>> >> >     Mat Object:    ()     384 MPI processes
>>> >> >       type: mpiaij
>>> >> >       rows=3020875, cols=3020875
>>> >> >       total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> >       total number of mallocs used during MatSetValues calls =0
>>> >> >         not using I-node (on process 0) routines
>>> >> >
>>> >> >
>>> >> > Fande,
>>> >> >
>>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >> > >
>>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>>> >> > >
>>> >> > >   No, it means that for non symmetric nonzero structure you need
>>> the
>>> >> > > extra flag. So use the extra flag. The reason we don't always use
>>> the flag
>>> >> > > is because it adds extra cost and isn't needed if the matrix
>>> already has a
>>> >> > > symmetric nonzero structure.
>>> >> >
>>> >> > BTW, if you have symmetric non-zero structure you can just set
>>> >> > -pc_gamg_threshold -1.0; note the "or" in the message.
>>> >> >
>>> >> > If you want to mess with the threshold then you need to use the
>>> >> > symmetrized flag.
>>> >> >
>>> >>
>>> >
>>>
>>
>>
>