[petsc-users] GAMG for the unsymmetrical matrix

Wed Apr 12 10:16:02 CDT 2017

The problem comes from setting the number of MG levels (-pc_mg_levels 2).
Not your fault, it looks like the GAMG logic is faulty, in your version at
least.

GAMG will force the coarsest grid to one processor by default, in newer
versions. You can override the default with:

-pc_gamg_use_parallel_coarse_grid_solver

Your coarse grid solver is ASM with these 37 equation per process and 512
processes. That is bad. Note, you could run this on one process to see the
proper convergence rate.  You can fix this with parameters:

>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per
process on coarse grids (PCGAMGSetProcEqLim)
>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
coarse grid (PCGAMGSetCoarseEqLim)

If you really want two levels then set something like
-pc_gamg_coarse_eq_limit 18145 (or higher) -pc_gamg_coarse_eq_limit 18145
(or higher). You can run with -info and grep on GAMG and you will meta-data
for each level. you should see "npe=1" for the coarsest, last, grid. Or use
a parallel direct solver.

Note, you should not see much degradation as you increase the number of
levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
aim for about 3000.

On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <fande.kong at inl.gov> wrote:

>
>
> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>> the coarse grid.
>
>
> 37 is on the sub domain.
>
>  rows=18145, cols=18145 on the entire coarse grid.
>
>
>
>
>
>> I don't understand that.
>>
>> You are also calling the AMG setup a lot, but not spending much time
>> in it. Try running with -info and grep on "GAMG".
>>
>>
>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>> > Thanks, Barry.
>> >
>> > It works.
>> >
>> > GAMG is three times better than ASM in terms of the number of linear
>> > iterations, but it is five times slower than ASM. Any suggestions to
>> improve
>> > the performance of GAMG? Log files are attached.
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >>
>> >>
>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>> >> >
>> >> > Thanks, Mark and Barry,
>> >> >
>> >> > It works pretty wells in terms of the number of linear iterations
>> (using
>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>> I am
>> >> > using the two-level method via "-pc_mg_levels 2". The reason why the
>> compute
>> >> > time is larger than other preconditioning options is that a matrix
>> free
>> >> > method is used in the fine level and in my particular problem the
>> function
>> >> > evaluation is expensive.
>> >> >
>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
>> >> > but I do not think I want to make the preconditioning part
>> matrix-free.  Do
>> >> > you guys know how to turn off the matrix-free method for GAMG?
>> >>
>> >>    -pc_use_amat false
>> >>
>> >> >
>> >> > Here is the detailed solver:
>> >> >
>> >> > SNES Object: 384 MPI processes
>> >> >   type: newtonls
>> >> >   maximum iterations=200, maximum function evaluations=10000
>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> >> >   total number of linear solver iterations=20
>> >> >   total number of function evaluations=166
>> >> >   norm schedule ALWAYS
>> >> >   SNESLineSearch Object:   384 MPI processes
>> >> >     type: bt
>> >> >       interpolation: cubic
>> >> >       alpha=1.000000e-04
>> >> >     maxstep=1.000000e+08, minlambda=1.000000e-12
>> >> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>> >> > lambda=1.000000e-08
>> >> >     maximum iterations=40
>> >> >   KSP Object:   384 MPI processes
>> >> >     type: gmres
>> >> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>> >> > Orthogonalization with no iterative refinement
>> >> >       GMRES: happy breakdown tolerance 1e-30
>> >> >     maximum iterations=100, initial guess is zero
>> >> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.
>> >> >     right preconditioning
>> >> >     using UNPRECONDITIONED norm type for convergence test
>> >> >   PC Object:   384 MPI processes
>> >> >     type: gamg
>> >> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> >> >         Cycles per PCApply=1
>> >> >         Using Galerkin computed coarse grid matrices
>> >> >         GAMG specific options
>> >> >           Threshold for dropping small values from graph 0.
>> >> >           AGG specific options
>> >> >             Symmetric graph true
>> >> >     Coarse grid solver -- level -------------------------------
>> >> >       KSP Object:      (mg_coarse_)       384 MPI processes
>> >> >         type: preonly
>> >> >         maximum iterations=10000, initial guess is zero
>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> >> >         left preconditioning
>> >> >         using NONE norm type for convergence test
>> >> >       PC Object:      (mg_coarse_)       384 MPI processes
>> >> >         type: bjacobi
>> >> >           block Jacobi: number of blocks = 384
>> >> >           Local solve is same for all blocks, in the following KSP
>> and
>> >> > PC objects:
>> >> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes
>> >> >           type: preonly
>> >> >           maximum iterations=1, initial guess is zero
>> >> >           tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> >> >           left preconditioning
>> >> >           using NONE norm type for convergence test
>> >> >         PC Object:        (mg_coarse_sub_)         1 MPI processes
>> >> >           type: lu
>> >> >             LU: out-of-place factorization
>> >> >             tolerance for zero pivot 2.22045e-14
>> >> >             using diagonal shift on blocks to prevent zero pivot
>> >> > [INBLOCKS]
>> >> >             matrix ordering: nd
>> >> >             factor fill ratio given 5., needed 1.31367
>> >> >               Factored matrix follows:
>> >> >                 Mat Object:                 1 MPI processes
>> >> >                   type: seqaij
>> >> >                   rows=37, cols=37
>> >> >                   package used to perform factorization: petsc
>> >> >                   total: nonzeros=913, allocated nonzeros=913
>> >> >                   total number of mallocs used during MatSetValues
>> calls
>> >> > =0
>> >> >                     not using I-node routines
>> >> >           linear system matrix = precond matrix:
>> >> >           Mat Object:           1 MPI processes
>> >> >             type: seqaij
>> >> >             rows=37, cols=37
>> >> >             total: nonzeros=695, allocated nonzeros=695
>> >> >             total number of mallocs used during MatSetValues calls =0
>> >> >               not using I-node routines
>> >> >         linear system matrix = precond matrix:
>> >> >         Mat Object:         384 MPI processes
>> >> >           type: mpiaij
>> >> >           rows=18145, cols=18145
>> >> >           total: nonzeros=1709115, allocated nonzeros=1709115
>> >> >           total number of mallocs used during MatSetValues calls =0
>> >> >             not using I-node (on process 0) routines
>> >> >     Down solver (pre-smoother) on level 1
>> >> > -------------------------------
>> >> >       KSP Object:      (mg_levels_1_)       384 MPI processes
>> >> >         type: chebyshev
>> >> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =
>> >> > 1.46673
>> >> >           Chebyshev: eigenvalues estimated using gmres with
>> translations
>> >> > [0. 0.1; 0. 1.1]
>> >> >           KSP Object:          (mg_levels_1_esteig_)           384
>> MPI
>> >> > processes
>> >> >             type: gmres
>> >> >               GMRES: restart=30, using Classical (unmodified)
>> >> > Gram-Schmidt Orthogonalization with no iterative refinement
>> >> >               GMRES: happy breakdown tolerance 1e-30
>> >> >             maximum iterations=10, initial guess is zero
>> >> >             tolerances:  relative=1e-12, absolute=1e-50,
>> >> > divergence=10000.
>> >> >             left preconditioning
>> >> >             using PRECONDITIONED norm type for convergence test
>> >> >         maximum iterations=2
>> >> >         tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=10000.
>> >> >         left preconditioning
>> >> >         using nonzero initial guess
>> >> >         using NONE norm type for convergence test
>> >> >       PC Object:      (mg_levels_1_)       384 MPI processes
>> >> >         type: sor
>> >> >           SOR: type = local_symmetric, iterations = 1, local
>> iterations
>> >> > = 1, omega = 1.
>> >> >         linear system matrix followed by preconditioner matrix:
>> >> >         Mat Object:         384 MPI processes
>> >> >           type: mffd
>> >> >           rows=3020875, cols=3020875
>> >> >             Matrix-free approximation:
>> >> >               err=1.49012e-08 (relative error in function evaluation)
>> >> >               Using wp compute h routine
>> >> >                   Does not compute normU
>> >> >         Mat Object:        ()         384 MPI processes
>> >> >           type: mpiaij
>> >> >           rows=3020875, cols=3020875
>> >> >           total: nonzeros=215671710, allocated nonzeros=241731750
>> >> >           total number of mallocs used during MatSetValues calls =0
>> >> >             not using I-node (on process 0) routines
>> >> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >> >     linear system matrix followed by preconditioner matrix:
>> >> >     Mat Object:     384 MPI processes
>> >> >       type: mffd
>> >> >       rows=3020875, cols=3020875
>> >> >         Matrix-free approximation:
>> >> >           err=1.49012e-08 (relative error in function evaluation)
>> >> >           Using wp compute h routine
>> >> >               Does not compute normU
>> >> >     Mat Object:    ()     384 MPI processes
>> >> >       type: mpiaij
>> >> >       rows=3020875, cols=3020875
>> >> >       total: nonzeros=215671710, allocated nonzeros=241731750
>> >> >       total number of mallocs used during MatSetValues calls =0
>> >> >         not using I-node (on process 0) routines
>> >> >
>> >> >
>> >> > Fande,
>> >> >
>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >> > >
>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>> >> > >
>> >> > >   No, it means that for non symmetric nonzero structure you need
>> the
>> >> > > extra flag. So use the extra flag. The reason we don't always use
>> the flag
>> >> > > is because it adds extra cost and isn't needed if the matrix
>> already has a
>> >> > > symmetric nonzero structure.
>> >> >
>> >> > BTW, if you have symmetric non-zero structure you can just set
>> >> > -pc_gamg_threshold -1.0', note the "or" in the message.
>> >> >
>> >> > If you want to mess with the threshold then you need to use the
>> >> > symmetrized flag.
>> >> >
>> >>
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170412/fd0a782b/attachment-0001.html>