[petsc-users] GAMG for the unsymmetrical matrix

Mark Adams mfadams at lbl.gov
Thu Apr 13 10:14:30 CDT 2017


On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande <fande.kong at inl.gov> wrote:

>
>
> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>> the coarse grid. I don't understand that.
>>
>> You are also calling the AMG setup a lot, but not spending much time
>> in it. Try running with -info and grep on "GAMG".
>>
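(For reference, a minimal way to do that grep, assuming a typical mpiexec
launch and a hypothetical executable name ./your_app:

    mpiexec -n 384 ./your_app <your usual options> -info | grep GAMG

-info makes PETSc print the PCSetUp_GAMG / PCGAMG* messages like the ones
quoted below.)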
>
> I got the following output:
>
> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
> nnz/row (ave)=71, np=384
> [0] PCGAMGFilterGraph():      100.% nnz after filtering, with threshold
> 0., 73.6364 nnz ave. (N=3020875)
> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> [0] PCGAMGProlongator_AGG(): New grid 18162 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00
> min=2.559747e-02 PC=jacobi
> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
> neq(loc)=40
> [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384
> active pes
> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795
> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
> nnz/row (ave)=71, np=384
> [0] PCGAMGFilterGraph():      100.% nnz after filtering, with threshold
> 0., 73.6364 nnz ave. (N=3020875)
> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> [0] PCGAMGProlongator_AGG(): New grid 18145 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00
> min=2.557887e-02 PC=jacobi
> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
> neq(loc)=37
> [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384
> active pes
>

You are still doing two levels. Just use the parameters that I gave you and
you should see that 1) this coarsest (last) grid has "1 active pes" and 2)
the overall solve time and convergence rate are much better.
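
As a rough sketch (illustrative values, not necessarily the exact parameters
from earlier in the thread): drop the "-pc_mg_levels 2" cap so GAMG can build
a full hierarchy, and let it gather equations onto fewer processes as it
coarsens, e.g.

    -pc_type gamg -pc_gamg_sym_graph true \
      -pc_gamg_process_eq_limit 50 \
      -pc_gamg_coarse_eq_limit 1000

-pc_gamg_process_eq_limit sets roughly how many equations per process a level
should keep before equations are gathered onto fewer processes, and
-pc_gamg_coarse_eq_limit caps the size of the coarsest grid. With a deep
enough hierarchy, -info should report "1 active pes" on the coarsest level.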


> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792
>         GAMG specific options
> PCGAMGGraph_AGG       40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04
> 7.6e+02  2  0  2  4  2   2  0  2  4  2  1170
> PCGAMGCoarse_AGG      40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04
> 1.2e+03 18 37  5 27  3  18 37  5 27  3 14632
> PCGAMGProl_AGG        40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03
> 9.6e+02  0  0  1  0  2   0  0  1  0  2     0
> PCGAMGPOpt_AGG        40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03
> 1.9e+03  1  4  4  1  4   1  4  4  1  4 51328
> GAMG: createProl      40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04
> 4.8e+03 21 42 12 32 10  21 42 12 32 10 14134
> GAMG: partLevel       40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03
> 1.5e+03  2  2  4  1  3   2  2  4  1  3  9431
>
>
>
>
>
>
>
>
>>
>>
>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>> > Thanks, Barry.
>> >
>> > It works.
>> >
>> > GAMG is three times better than ASM in terms of the number of linear
>> > iterations, but it is five times slower than ASM. Any suggestions to
>> > improve the performance of GAMG? Log files are attached.
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >>
>> >>
>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>> >> >
>> >> > Thanks, Mark and Barry,
>> >> >
>> >> > It works pretty well in terms of the number of linear iterations
>> >> > (using "-pc_gamg_sym_graph true"), but the compute time is horrible. I
>> >> > am using the two-level method via "-pc_mg_levels 2". The reason the
>> >> > compute time is larger than with other preconditioning options is that
>> >> > a matrix-free method is used on the fine level, and in my particular
>> >> > problem the function evaluation is expensive.
>> >> >
>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton
>> >> > method, but I do not think I want to make the preconditioning part
>> >> > matrix-free.  Do you guys know how to turn off the matrix-free method
>> >> > for GAMG?
>> >>
>> >>    -pc_use_amat false
>> >>
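(As a sketch of how those pieces fit together on the command line, assuming
the preconditioning matrix is assembled in your Jacobian routine:

    -snes_mf_operator -pc_type gamg -pc_gamg_sym_graph true -pc_use_amat false

-snes_mf_operator keeps the Jacobian-vector products matrix-free, while
-pc_use_amat false tells the multigrid residual computations to use the
assembled preconditioning matrix rather than the matrix-free operator.)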
>> >> >
>> >> > Here is the detailed solver:
>> >> >
>> >> > SNES Object: 384 MPI processes
>> >> >   type: newtonls
>> >> >   maximum iterations=200, maximum function evaluations=10000
>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> >> >   total number of linear solver iterations=20
>> >> >   total number of function evaluations=166
>> >> >   norm schedule ALWAYS
>> >> >   SNESLineSearch Object:   384 MPI processes
>> >> >     type: bt
>> >> >       interpolation: cubic
>> >> >       alpha=1.000000e-04
>> >> >     maxstep=1.000000e+08, minlambda=1.000000e-12
>> >> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>> >> > lambda=1.000000e-08
>> >> >     maximum iterations=40
>> >> >   KSP Object:   384 MPI processes
>> >> >     type: gmres
>> >> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>> >> > Orthogonalization with no iterative refinement
>> >> >       GMRES: happy breakdown tolerance 1e-30
>> >> >     maximum iterations=100, initial guess is zero
>> >> >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.
>> >> >     right preconditioning
>> >> >     using UNPRECONDITIONED norm type for convergence test
>> >> >   PC Object:   384 MPI processes
>> >> >     type: gamg
>> >> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> >> >         Cycles per PCApply=1
>> >> >         Using Galerkin computed coarse grid matrices
>> >> >         GAMG specific options
>> >> >           Threshold for dropping small values from graph 0.
>> >> >           AGG specific options
>> >> >             Symmetric graph true
>> >> >     Coarse grid solver -- level -------------------------------
>> >> >       KSP Object:      (mg_coarse_)       384 MPI processes
>> >> >         type: preonly
>> >> >         maximum iterations=10000, initial guess is zero
>> >> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >> >         left preconditioning
>> >> >         using NONE norm type for convergence test
>> >> >       PC Object:      (mg_coarse_)       384 MPI processes
>> >> >         type: bjacobi
>> >> >           block Jacobi: number of blocks = 384
>> >> >           Local solve is same for all blocks, in the following KSP and PC objects:
>> >> >         KSP Object:        (mg_coarse_sub_)         1 MPI processes
>> >> >           type: preonly
>> >> >           maximum iterations=1, initial guess is zero
>> >> >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >> >           left preconditioning
>> >> >           using NONE norm type for convergence test
>> >> >         PC Object:        (mg_coarse_sub_)         1 MPI processes
>> >> >           type: lu
>> >> >             LU: out-of-place factorization
>> >> >             tolerance for zero pivot 2.22045e-14
>> >> >             using diagonal shift on blocks to prevent zero pivot
>> >> > [INBLOCKS]
>> >> >             matrix ordering: nd
>> >> >             factor fill ratio given 5., needed 1.31367
>> >> >               Factored matrix follows:
>> >> >                 Mat Object:                 1 MPI processes
>> >> >                   type: seqaij
>> >> >                   rows=37, cols=37
>> >> >                   package used to perform factorization: petsc
>> >> >                   total: nonzeros=913, allocated nonzeros=913
>> >> >                   total number of mallocs used during MatSetValues calls =0
>> >> >                     not using I-node routines
>> >> >           linear system matrix = precond matrix:
>> >> >           Mat Object:           1 MPI processes
>> >> >             type: seqaij
>> >> >             rows=37, cols=37
>> >> >             total: nonzeros=695, allocated nonzeros=695
>> >> >             total number of mallocs used during MatSetValues calls =0
>> >> >               not using I-node routines
>> >> >         linear system matrix = precond matrix:
>> >> >         Mat Object:         384 MPI processes
>> >> >           type: mpiaij
>> >> >           rows=18145, cols=18145
>> >> >           total: nonzeros=1709115, allocated nonzeros=1709115
>> >> >           total number of mallocs used during MatSetValues calls =0
>> >> >             not using I-node (on process 0) routines
>> >> >     Down solver (pre-smoother) on level 1
>> >> > -------------------------------
>> >> >       KSP Object:      (mg_levels_1_)       384 MPI processes
>> >> >         type: chebyshev
>> >> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max =
>> >> > 1.46673
>> >> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> >> >           KSP Object:          (mg_levels_1_esteig_)           384 MPI processes
>> >> >             type: gmres
>> >> >               GMRES: restart=30, using Classical (unmodified)
>> >> > Gram-Schmidt Orthogonalization with no iterative refinement
>> >> >               GMRES: happy breakdown tolerance 1e-30
>> >> >             maximum iterations=10, initial guess is zero
>> >> >             tolerances:  relative=1e-12, absolute=1e-50,
>> >> > divergence=10000.
>> >> >             left preconditioning
>> >> >             using PRECONDITIONED norm type for convergence test
>> >> >         maximum iterations=2
>> >> >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >> >         left preconditioning
>> >> >         using nonzero initial guess
>> >> >         using NONE norm type for convergence test
>> >> >       PC Object:      (mg_levels_1_)       384 MPI processes
>> >> >         type: sor
>> >> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >> >         linear system matrix followed by preconditioner matrix:
>> >> >         Mat Object:         384 MPI processes
>> >> >           type: mffd
>> >> >           rows=3020875, cols=3020875
>> >> >             Matrix-free approximation:
>> >> >               err=1.49012e-08 (relative error in function evaluation)
>> >> >               Using wp compute h routine
>> >> >                   Does not compute normU
>> >> >         Mat Object:        ()         384 MPI processes
>> >> >           type: mpiaij
>> >> >           rows=3020875, cols=3020875
>> >> >           total: nonzeros=215671710, allocated nonzeros=241731750
>> >> >           total number of mallocs used during MatSetValues calls =0
>> >> >             not using I-node (on process 0) routines
>> >> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >> >     linear system matrix followed by preconditioner matrix:
>> >> >     Mat Object:     384 MPI processes
>> >> >       type: mffd
>> >> >       rows=3020875, cols=3020875
>> >> >         Matrix-free approximation:
>> >> >           err=1.49012e-08 (relative error in function evaluation)
>> >> >           Using wp compute h routine
>> >> >               Does not compute normU
>> >> >     Mat Object:    ()     384 MPI processes
>> >> >       type: mpiaij
>> >> >       rows=3020875, cols=3020875
>> >> >       total: nonzeros=215671710, allocated nonzeros=241731750
>> >> >       total number of mallocs used during MatSetValues calls =0
>> >> >         not using I-node (on process 0) routines
>> >> >
>> >> >
>> >> > Fande,
>> >> >
>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >> > >
>> >> > >> Does this mean that GAMG works for symmetric matrices only?
>> >> > >
>> >> > >   No, it means that for a nonsymmetric nonzero structure you need the
>> >> > > extra flag. So use the extra flag. The reason we don't always use the
>> >> > > flag is that it adds extra cost and isn't needed if the matrix already
>> >> > > has a symmetric nonzero structure.
>> >> >
>> >> > BTW, if you have a symmetric nonzero structure you can just set
>> >> > "-pc_gamg_threshold -1.0"; note the "or" in the message.
>> >> >
>> >> > If you want to mess with the threshold then you need to use the
>> >> > symmetrized flag.
>> >> >
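(In other words, the two alternatives look roughly like this; the threshold
value in the first one is only illustrative:

    to threshold on a nonsymmetric nonzero structure, symmetrize the graph:
        -pc_gamg_sym_graph true -pc_gamg_threshold 0.05

    with a symmetric nonzero structure, skip the flag and drop nothing:
        -pc_gamg_threshold -1.0
)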
>> >>
>> >
>>
>
>