[petsc-users] GAMG for the unsymmetrical matrix

Fri Apr 7 16:52:06 CDT 2017

> On Apr 7, 2017, at 4:46 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> 
> 
> 
> On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>   Using Petsc Release Version 3.7.5, unknown
> 
>    So are you using the release or are you using master branch?
> 
> I am working on the maint branch. 
> 
> I did something two months ago:
> 
>  git clone -b maint https://bitbucket.org/petsc/petsc petsc.
> 
> 
> I am interested to improve the GAMG performance.

  Why, why not use the best solver for your problem?

> Is it possible? It can not beat ASM at all? The multilevel method should be better than the one-level if the number of processor cores is large.

   The ASM is taking 30 iterations, this is fantastic, it is really going to be tough to get GAMG to be faster (set up time for GAMG is high).

   What happens to both with 10 times as many processes? 100 times as many?

   Barry

> 
> Fande,
>  
> 
>    If you use master the ASM will be even faster.
> 
> What's new in master?
> 
> 
> Fande,
>  
> 
> 
> > On Apr 7, 2017, at 4:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> > Thanks, Barry.
> >
> > It works.
> >
> > GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached.
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
> > >
> > > Thanks, Mark and Barry,
> > >
> > > It works pretty wells in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am using the two-level method via "-pc_mg_levels 2". The reason why the compute time is larger than other preconditioning options is that a matrix free method is used in the fine level and in my particular problem the function evaluation is expensive.
> > >
> > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free.  Do you guys know how to turn off the matrix-free method for GAMG?
> >
> >    -pc_use_amat false
> >
> > >
> > > Here is the detailed solver:
> > >
> > > SNES Object: 384 MPI processes
> > >   type: newtonls
> > >   maximum iterations=200, maximum function evaluations=10000
> > >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> > >   total number of linear solver iterations=20
> > >   total number of function evaluations=166
> > >   norm schedule ALWAYS
> > >   SNESLineSearch Object:   384 MPI processes
> > >     type: bt
> > >       interpolation: cubic
> > >       alpha=1.000000e-04
> > >     maxstep=1.000000e+08, minlambda=1.000000e-12
> > >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> > >     maximum iterations=40
> > >   KSP Object:   384 MPI processes
> > >     type: gmres
> > >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >       GMRES: happy breakdown tolerance 1e-30
> > >     maximum iterations=100, initial guess is zero
> > >     tolerances:  relative=0.001, absolute=1e-50, divergence=10000.
> > >     right preconditioning
> > >     using UNPRECONDITIONED norm type for convergence test
> > >   PC Object:   384 MPI processes
> > >     type: gamg
> > >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > >         Cycles per PCApply=1
> > >         Using Galerkin computed coarse grid matrices
> > >         GAMG specific options
> > >           Threshold for dropping small values from graph 0.
> > >           AGG specific options
> > >             Symmetric graph true
> > >     Coarse grid solver -- level -------------------------------
> > >       KSP Object:      (mg_coarse_)       384 MPI processes
> > >         type: preonly
> > >         maximum iterations=10000, initial guess is zero
> > >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > >         left preconditioning
> > >         using NONE norm type for convergence test
> > >       PC Object:      (mg_coarse_)       384 MPI processes
> > >         type: bjacobi
> > >           block Jacobi: number of blocks = 384
> > >           Local solve is same for all blocks, in the following KSP and PC objects:
> > >         KSP Object:        (mg_coarse_sub_)         1 MPI processes
> > >           type: preonly
> > >           maximum iterations=1, initial guess is zero
> > >           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > >           left preconditioning
> > >           using NONE norm type for convergence test
> > >         PC Object:        (mg_coarse_sub_)         1 MPI processes
> > >           type: lu
> > >             LU: out-of-place factorization
> > >             tolerance for zero pivot 2.22045e-14
> > >             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> > >             matrix ordering: nd
> > >             factor fill ratio given 5., needed 1.31367
> > >               Factored matrix follows:
> > >                 Mat Object:                 1 MPI processes
> > >                   type: seqaij
> > >                   rows=37, cols=37
> > >                   package used to perform factorization: petsc
> > >                   total: nonzeros=913, allocated nonzeros=913
> > >                   total number of mallocs used during MatSetValues calls =0
> > >                     not using I-node routines
> > >           linear system matrix = precond matrix:
> > >           Mat Object:           1 MPI processes
> > >             type: seqaij
> > >             rows=37, cols=37
> > >             total: nonzeros=695, allocated nonzeros=695
> > >             total number of mallocs used during MatSetValues calls =0
> > >               not using I-node routines
> > >         linear system matrix = precond matrix:
> > >         Mat Object:         384 MPI processes
> > >           type: mpiaij
> > >           rows=18145, cols=18145
> > >           total: nonzeros=1709115, allocated nonzeros=1709115
> > >           total number of mallocs used during MatSetValues calls =0
> > >             not using I-node (on process 0) routines
> > >     Down solver (pre-smoother) on level 1 -------------------------------
> > >       KSP Object:      (mg_levels_1_)       384 MPI processes
> > >         type: chebyshev
> > >           Chebyshev: eigenvalue estimates:  min = 0.133339, max = 1.46673
> > >           Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
> > >           KSP Object:          (mg_levels_1_esteig_)           384 MPI processes
> > >             type: gmres
> > >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > >               GMRES: happy breakdown tolerance 1e-30
> > >             maximum iterations=10, initial guess is zero
> > >             tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
> > >             left preconditioning
> > >             using PRECONDITIONED norm type for convergence test
> > >         maximum iterations=2
> > >         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > >         left preconditioning
> > >         using nonzero initial guess
> > >         using NONE norm type for convergence test
> > >       PC Object:      (mg_levels_1_)       384 MPI processes
> > >         type: sor
> > >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
> > >         linear system matrix followed by preconditioner matrix:
> > >         Mat Object:         384 MPI processes
> > >           type: mffd
> > >           rows=3020875, cols=3020875
> > >             Matrix-free approximation:
> > >               err=1.49012e-08 (relative error in function evaluation)
> > >               Using wp compute h routine
> > >                   Does not compute normU
> > >         Mat Object:        ()         384 MPI processes
> > >           type: mpiaij
> > >           rows=3020875, cols=3020875
> > >           total: nonzeros=215671710, allocated nonzeros=241731750
> > >           total number of mallocs used during MatSetValues calls =0
> > >             not using I-node (on process 0) routines
> > >     Up solver (post-smoother) same as down solver (pre-smoother)
> > >     linear system matrix followed by preconditioner matrix:
> > >     Mat Object:     384 MPI processes
> > >       type: mffd
> > >       rows=3020875, cols=3020875
> > >         Matrix-free approximation:
> > >           err=1.49012e-08 (relative error in function evaluation)
> > >           Using wp compute h routine
> > >               Does not compute normU
> > >     Mat Object:    ()     384 MPI processes
> > >       type: mpiaij
> > >       rows=3020875, cols=3020875
> > >       total: nonzeros=215671710, allocated nonzeros=241731750
> > >       total number of mallocs used during MatSetValues calls =0
> > >         not using I-node (on process 0) routines
> > >
> > >
> > > Fande,
> > >
> > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
> > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > >> Does this mean that GAMG works for the symmetrical matrix only?
> > > >
> > > >   No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure.
> > >
> > > BTW, if you have symmetric non-zero structure you can just set
> > > -pc_gamg_threshold -1.0', note the "or" in the message.
> > >
> > > If you want to mess with the threshold then you need to use the
> > > symmetrized flag.
> > >
> >
> >
> > <asm.txt><gamg.txt>
> 
>