[petsc-users] GAMG for the unsymmetrical matrix
Mark Adams
mfadams at lbl.gov
Sun Apr 9 07:04:37 CDT 2017
You seem to have two levels here, with 3M equations on the fine grid and 37 on
the coarse grid. I don't understand that.
You are also calling the AMG setup a lot, but not spending much time
in it. Try running with -info and grep for "GAMG".
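For example, something like this (substitute your own executable and the options
you already use):

  mpiexec -n 384 ./your_app <your options> -info 2>&1 | grep GAMG

will pull out just the level and setup information that GAMG logs.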
On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> Thanks, Barry.
>
> It works.
>
> GAMG is three times better than ASM in terms of the number of linear
> iterations, but it is five times slower than ASM. Any suggestions to improve
> the performance of GAMG? Log files are attached.
>
> Fande,
>
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>
>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>> >
>> > Thanks, Mark and Barry,
>> >
>> > It works pretty well in terms of the number of linear iterations (using
>> > "-pc_gamg_sym_graph true"), but the compute time is horrible. I am
>> > using the two-level method via "-pc_mg_levels 2". The reason the compute
>> > time is larger than with other preconditioning options is that a matrix-free
>> > method is used on the fine level, and in my particular problem the function
>> > evaluation is expensive.
>> >
>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton method,
>> > but I do not think I want to make the preconditioning part matrix-free. Do
>> > you guys know how to turn off the matrix-free method for GAMG?
>>
>> -pc_use_amat false
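>>
>> (That is the runtime option. If you would rather set it in code, a rough sketch,
>> assuming you have your SNES in hand before the solve, is:
>>
>>   KSP            ksp;
>>   PC             pc;
>>   PetscErrorCode ierr;
>>   ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
>>   ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>>   ierr = PCSetUseAmat(pc,PETSC_FALSE);CHKERRQ(ierr); /* apply the PC with Pmat, not the MFFD Amat */
>>
>> so the smoother applies the assembled preconditioning matrix instead of the
>> matrix-free operator.)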
>>
>> >
>> > Here is the detailed solver:
>> >
>> > SNES Object: 384 MPI processes
>> > type: newtonls
>> > maximum iterations=200, maximum function evaluations=10000
>> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> > total number of linear solver iterations=20
>> > total number of function evaluations=166
>> > norm schedule ALWAYS
>> > SNESLineSearch Object: 384 MPI processes
>> > type: bt
>> > interpolation: cubic
>> > alpha=1.000000e-04
>> > maxstep=1.000000e+08, minlambda=1.000000e-12
>> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>> > maximum iterations=40
>> > KSP Object: 384 MPI processes
>> > type: gmres
>> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> > GMRES: happy breakdown tolerance 1e-30
>> > maximum iterations=100, initial guess is zero
>> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.
>> > right preconditioning
>> > using UNPRECONDITIONED norm type for convergence test
>> > PC Object: 384 MPI processes
>> > type: gamg
>> > MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> > Cycles per PCApply=1
>> > Using Galerkin computed coarse grid matrices
>> > GAMG specific options
>> > Threshold for dropping small values from graph 0.
>> > AGG specific options
>> > Symmetric graph true
>> > Coarse grid solver -- level -------------------------------
>> > KSP Object: (mg_coarse_) 384 MPI processes
>> > type: preonly
>> > maximum iterations=10000, initial guess is zero
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using NONE norm type for convergence test
>> > PC Object: (mg_coarse_) 384 MPI processes
>> > type: bjacobi
>> > block Jacobi: number of blocks = 384
>> > Local solve is same for all blocks, in the following KSP and PC objects:
>> > KSP Object: (mg_coarse_sub_) 1 MPI processes
>> > type: preonly
>> > maximum iterations=1, initial guess is zero
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using NONE norm type for convergence test
>> > PC Object: (mg_coarse_sub_) 1 MPI processes
>> > type: lu
>> > LU: out-of-place factorization
>> > tolerance for zero pivot 2.22045e-14
>> > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> > matrix ordering: nd
>> > factor fill ratio given 5., needed 1.31367
>> > Factored matrix follows:
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=37, cols=37
>> > package used to perform factorization: petsc
>> > total: nonzeros=913, allocated nonzeros=913
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node routines
>> > linear system matrix = precond matrix:
>> > Mat Object: 1 MPI processes
>> > type: seqaij
>> > rows=37, cols=37
>> > total: nonzeros=695, allocated nonzeros=695
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node routines
>> > linear system matrix = precond matrix:
>> > Mat Object: 384 MPI processes
>> > type: mpiaij
>> > rows=18145, cols=18145
>> > total: nonzeros=1709115, allocated nonzeros=1709115
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> > Down solver (pre-smoother) on level 1 -------------------------------
>> > KSP Object: (mg_levels_1_) 384 MPI processes
>> > type: chebyshev
>> > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
>> > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> > KSP Object: (mg_levels_1_esteig_) 384 MPI processes
>> > type: gmres
>> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> > GMRES: happy breakdown tolerance 1e-30
>> > maximum iterations=10, initial guess is zero
>> > tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using PRECONDITIONED norm type for convergence test
>> > maximum iterations=2
>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> > left preconditioning
>> > using nonzero initial guess
>> > using NONE norm type for convergence test
>> > PC Object: (mg_levels_1_) 384 MPI processes
>> > type: sor
>> > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> > linear system matrix followed by preconditioner matrix:
>> > Mat Object: 384 MPI processes
>> > type: mffd
>> > rows=3020875, cols=3020875
>> > Matrix-free approximation:
>> > err=1.49012e-08 (relative error in function evaluation)
>> > Using wp compute h routine
>> > Does not compute normU
>> > Mat Object: () 384 MPI processes
>> > type: mpiaij
>> > rows=3020875, cols=3020875
>> > total: nonzeros=215671710, allocated nonzeros=241731750
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> > Up solver (post-smoother) same as down solver (pre-smoother)
>> > linear system matrix followed by preconditioner matrix:
>> > Mat Object: 384 MPI processes
>> > type: mffd
>> > rows=3020875, cols=3020875
>> > Matrix-free approximation:
>> > err=1.49012e-08 (relative error in function evaluation)
>> > Using wp compute h routine
>> > Does not compute normU
>> > Mat Object: () 384 MPI processes
>> > type: mpiaij
>> > rows=3020875, cols=3020875
>> > total: nonzeros=215671710, allocated nonzeros=241731750
>> > total number of mallocs used during MatSetValues calls =0
>> > not using I-node (on process 0) routines
>> >
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> > >
>> > >> Does this mean that GAMG works for the symmetrical matrix only?
>> > >
>> > > No, it means that for a non-symmetric nonzero structure you need the
>> > > extra flag. So use the extra flag. The reason we don't always use the flag
>> > > is that it adds extra cost and isn't needed if the matrix already has a
>> > > symmetric nonzero structure.
>> >
>> > BTW, if you have a symmetric non-zero structure you can just set
>> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
>> >
>> > If you want to mess with the threshold then you need to use the
>> > symmetrized flag.
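>> >
>> > A sketch of the two cases on the command line (substitute your own executable
>> > and existing options):
>> >
>> >   # non-symmetric nonzero structure: symmetrize the graph
>> >   ./your_app <your options> -pc_type gamg -pc_gamg_sym_graph true
>> >
>> >   # symmetric nonzero structure: no symmetrization needed if you are not
>> >   # dropping entries from the graph
>> >   ./your_app <your options> -pc_type gamg -pc_gamg_threshold -1.0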
>> >
>>
>