[petsc-users] GAMG for the unsymmetrical matrix
Mark Adams
mfadams at lbl.gov
Sun Apr 9 07:04:37 CDT 2017
You seem to have two levels here, with 3M equations on the fine grid and 37 on
the coarse grid. I don't understand that.
You are also calling the AMG setup a lot, but not spending much time
in it. Try running with -info and grep for "GAMG".
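For example, something like this (substitute your own executable and the options
you already use):

  mpiexec -n 384 ./your_app <your options> -info 2>&1 | grep GAMG

will pull out just the level and setup information that GAMG logs.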
On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> Thanks, Barry.
>
> It works.
>
> GAMG is three times better than ASM in terms of the number of linear
> iterations, but it is five times slower than ASM. Any suggestions to improve
> the performance of GAMG? Log files are attached.
>
> Fande,
>
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>
>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>> >
>> > Thanks, Mark and Barry,
>> >
>> > It works pretty well in terms of the number of linear iterations (using
>> > "-pc_gamg_sym_graph true"), but the compute time is horrible. I am
>> > using the two-level method via "-pc_mg_levels 2". The reason the compute
>> > time is larger than with other preconditioning options is that a matrix-free
>> > method is used on the fine level, and in my particular problem the function
>> > evaluation is expensive.
>> >
>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton method,
>> > but I do not think I want to make the preconditioning part matrix-free. Do
>> > you guys know how to turn off the matrix-free method for GAMG?
>>
>> -pc_use_amat false
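>>
>> (That is the runtime option. If you would rather set it in code, a rough sketch,
>> assuming you have your SNES in hand before the solve, is:
>>
>>   KSP            ksp;
>>   PC             pc;
>>   PetscErrorCode ierr;
>>   ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
>>   ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>>   ierr = PCSetUseAmat(pc,PETSC_FALSE);CHKERRQ(ierr); /* apply the PC with Pmat, not the MFFD Amat */
>>
>> so the smoother applies the assembled preconditioning matrix instead of the
>> matrix-free operator.)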
>>
>> >
>> > Here is the detailed solver:
>> >
>> > SNES Object: 384 MPI processes
>> > type: newtonls
>> > maximum iterations=200, maximum function evaluations=10000
>> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> > total number of linear solver iterations=20
>> > total number of function evaluations=166
>> > norm schedule ALWAYS
>> > SNESLineSearch Object: 384 MPI processes
>> > type: bt
>> > interpolation: cubic
>> > alpha=1.000000e-04
>> > maxstep=1.000000e+08, minlambda=1.000000e-12
>> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>> > maximum iterations=40
>> > KSP Object: 384 MPI processes
>> > type: gmres
>> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> > GMRES: happy breakdown tolerance 1e-30
>> > maximum iterations=100, initial guess is zero
>> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.
>> > right preconditioning
>> > using UNPRECONDITIONED norm type for convergence test
>> > PC Object: 384 MPI processes
>> > type: gamg
>> > MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> > Cycles per PCApply=1
>> > Using Galerkin computed coarse grid matrices
>> > GAMG specific options
>> > Threshold for dropping small values from graph 0.
>> > AGG specific options
>> > Symmetric graph true
>> > Coarse grid solver -- level -------------------------------
>> > KSP Object: (mg_coarse_) 384 MPI processes
>> > type: preonly
>> > maximum iterations=10000, initial guess is zero
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using NONE norm type for convergence test
>> > PC Object: (mg_coarse_) 384 MPI processes
>> > type: bjacobi
>> > block Jacobi: number of blocks = 384
>> > Local solve is same for all blocks, in the following KSP and PC objects:
>> > KSP Object: (mg_coarse_sub_) 1 MPI processes
>> > type: preonly
>> > maximum iterations=1, initial guess is zero
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using NONE norm type for convergence test
>> > PC Object: (mg_coarse_sub_) 1 MPI processes
>> > type: lu
>> > LU: out-of-place factorization
>> > tolerance for zero pivot 2.22045e-14
>> > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> > matrix ordering: nd
>> > factor fill ratio given 5., needed 1.31367
>> > Factored matrix follows:
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=37, cols=37
>> > package used to perform factorization: petsc
>> > total: nonzeros=913, allocated nonzeros=913
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node routines
>> > linear system matrix = precond matrix:
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=37, cols=37
>> > total: nonzeros=695, allocated nonzeros=695
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node routines
>> > linear system matrix = precond matrix:
>> > Mat Object: 384 MPI processes
>> > type: mpiaij
>> > rows=18145, cols=18145
>> > total: nonzeros=1709115, allocated nonzeros=1709115
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> > Down solver (pre-smoother) on level 1 -------------------------------
>> > KSP Object: (mg_levels_1_) 384 MPI processes
>> > type: chebyshev
>> > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
>> > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> > KSP Object: (mg_levels_1_esteig_) 384 MPI processes
>> > type: gmres
>> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> > GMRES: happy breakdown tolerance 1e-30
>> > maximum iterations=10, initial guess is zero
>> > tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using PRECONDITIONED norm type for convergence test
>> > maximum iterations=2
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using nonzero initial guess
>> > using NONE norm type for convergence test
>> > PC Object: (mg_levels_1_) 384 MPI processes
>> > type: sor
>> > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> > linear system matrix followed by preconditioner matrix:
>> > Mat Object: 384 MPI processes
>> > type: mffd
>> > rows=3020875, cols=3020875
>> > Matrix-free approximation:
>> > err=1.49012e-08 (relative error in function evaluation)
>> > Using wp compute h routine
>> > Does not compute normU
>> > Mat Object: () 384 MPI processes
>> > type: mpiaij
>> > rows=3020875, cols=3020875
>> > total: nonzeros=215671710, allocated nonzeros=241731750
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> > Up solver (post-smoother) same as down solver (pre-smoother)
>> > linear system matrix followed by preconditioner matrix:
>> > Mat Object: 384 MPI processes
>> > type: mffd
>> > rows=3020875, cols=3020875
>> > Matrix-free approximation:
>> > err=1.49012e-08 (relative error in function evaluation)
>> > Using wp compute h routine
>> > Does not compute normU
>> > Mat Object: () 384 MPI processes
>> > type: mpiaij
>> > rows=3020875, cols=3020875
>> > total: nonzeros=215671710, allocated nonzeros=241731750
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> >
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> > >
>> > >> Does this mean that GAMG works for the symmetrical matrix only?
>> > >
>> > > No, it means that for a non-symmetric nonzero structure you need the
>> > > extra flag. So use the extra flag. The reason we don't always use the flag
>> > > is that it adds extra cost and isn't needed if the matrix already has a
>> > > symmetric nonzero structure.
>> >
>> > BTW, if you have a symmetric non-zero structure you can just set
>> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
>> >
>> > If you want to mess with the threshold then you need to use the
>> > symmetrized flag.
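>> >
>> > A sketch of the two cases on the command line (substitute your own executable
>> > and existing options):
>> >
>> >   # non-symmetric nonzero structure: symmetrize the graph
>> >   ./your_app <your options> -pc_type gamg -pc_gamg_sym_graph true
>> >
>> >   # symmetric nonzero structure: no symmetrization needed if you are not
>> >   # dropping entries from the graph
>> >   ./your_app <your options> -pc_type gamg -pc_gamg_threshold -1.0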
>> >
>>
>