[petsc-users] GAMG for the unsymmetrical matrix
Kong, Fande
fande.kong at inl.gov
Wed Apr 19 10:31:30 CDT 2017
Thanks, Mark,
Now, the total compute time using GAMG is competitive with ASM. It looks like
I cannot use something like "-mg_levels_1_ksp_type gmres", because that
option makes the compute time much worse.
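
For reference, a minimal sketch of that kind of per-level override, using
only option values that already appear elsewhere in this thread; it is just
an illustration of the spelling, not a recommendation, since the GMRES
smoother made my runs slower:

    -pc_type gamg -pc_mg_levels 2 -pc_gamg_sym_graph true \
    -mg_levels_1_ksp_type gmres -mg_levels_1_pc_type sor

Here -mg_levels_1_pc_type sor simply restates the default level-1 smoother
shown in the solver view below, so only the KSP type actually changes.
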
Fande,
On Thu, Apr 13, 2017 at 9:14 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
> On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>
>>
>>
>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>> the coarse grid. I don't understand that.
>>>
>>> You are also calling the AMG setup a lot, but not spending much time
>>> in it. Try running with -info and grep on "GAMG".
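>>>
>>> (A rough sketch of such a run, assuming mpiexec as the launcher and
>>> ./my_app as the executable name, neither of which comes from this thread;
>>> -info is the actual PETSc option:
>>>
>>>     mpiexec -n 384 ./my_app <your usual options> -info | grep GAMG
>>>
>>> The 384 matches the process count used in the logs below.)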
>>>
>>
>> I got the following output:
>>
>> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
>> nnz/row (ave)=71, np=384
>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>> 0., 73.6364 nnz ave. (N=3020875)
>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>> [0] PCGAMGProlongator_AGG(): New grid 18162 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00
>> min=2.559747e-02 PC=jacobi
>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
>> neq(loc)=40
>> [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384
>> active pes
>> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795
>> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
>> nnz/row (ave)=71, np=384
>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold
>> 0., 73.6364 nnz ave. (N=3020875)
>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>> [0] PCGAMGProlongator_AGG(): New grid 18145 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00
>> min=2.557887e-02 PC=jacobi
>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
>> neq(loc)=37
>> [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384
>> active pes
>>
>
> You are still doing two levels. Just use the parameters that I told you,
> and you should see that 1) this coarsest (last) grid has "1 active pes" and
> 2) the overall solve time and convergence rate are much better.
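>
> (The specific parameters are not quoted in this excerpt. As an assumption,
> the knobs that usually control this in GAMG are the coarsening limits,
> together with not capping the number of levels, e.g.:
>
>     -pc_type gamg -pc_gamg_sym_graph true \
>     -pc_gamg_process_eq_limit 200 -pc_gamg_coarse_eq_limit 1000
>
> i.e. drop the "-pc_mg_levels 2" cap and let GAMG keep coarsening, folding
> the coarse problem onto fewer and fewer ranks until a single process owns
> the coarsest grid. The two limit values here are only illustrative.)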
>
>
>> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792
>> GAMG specific options
>> PCGAMGGraph_AGG 40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04
>> 7.6e+02 2 0 2 4 2 2 0 2 4 2 1170
>> PCGAMGCoarse_AGG 40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04
>> 1.2e+03 18 37 5 27 3 18 37 5 27 3 14632
>> PCGAMGProl_AGG 40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03
>> 9.6e+02 0 0 1 0 2 0 0 1 0 2 0
>> PCGAMGPOpt_AGG 40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03
>> 1.9e+03 1 4 4 1 4 1 4 4 1 4 51328
>> GAMG: createProl 40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04
>> 4.8e+03 21 42 12 32 10 21 42 12 32 10 14134
>> GAMG: partLevel 40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03
>> 1.5e+03 2 2 4 1 3 2 2 4 1 3 9431
>>
>>
>>
>>
>>
>>
>>
>>
>>>
>>>
>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> > Thanks, Barry.
>>> >
>>> > It works.
>>> >
>>> > GAMG is three times better than ASM in terms of the number of linear
>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>> > improve the performance of GAMG? Log files are attached.
>>> >
>>> > Fande,
>>> >
>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> >>
>>> >>
>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> >> >
>>> >> > Thanks, Mark and Barry,
>>> >> >
>>> >> > It works pretty well in terms of the number of linear iterations
>>> >> > (using "-pc_gamg_sym_graph true"), but the compute time is horrible.
>>> >> > I am using the two-level method via "-pc_mg_levels 2". The reason the
>>> >> > compute time is larger than with other preconditioning options is
>>> >> > that a matrix-free method is used on the fine level, and in my
>>> >> > particular problem the function evaluation is expensive.
>>> >> >
>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton
>>> >> > method, but I do not think I want to make the preconditioning part
>>> >> > matrix-free. Do you guys know how to turn off the matrix-free method
>>> >> > for GAMG?
>>> >>
>>> >> -pc_use_amat false
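>>> >>
>>> >> (A minimal sketch of the resulting combination, Jacobian-free Newton
>>> >> with an assembled matrix for preconditioning, built only from options
>>> >> that appear elsewhere in this thread:
>>> >>
>>> >>     -snes_mf_operator -pc_type gamg -pc_gamg_sym_graph true \
>>> >>     -pc_use_amat false
>>> >>
>>> >> With -pc_use_amat false, the multigrid residuals and smoothers are
>>> >> applied with the assembled Pmat instead of the mffd operator, so the
>>> >> expensive function evaluations are confined to the outer Newton/GMRES
>>> >> iterations.)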
>>> >>
>>> >> >
>>> >> > Here is the detailed solver:
>>> >> >
>>> >> > SNES Object: 384 MPI processes
>>> >> > type: newtonls
>>> >> > maximum iterations=200, maximum function evaluations=10000
>>> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>> >> > total number of linear solver iterations=20
>>> >> > total number of function evaluations=166
>>> >> > norm schedule ALWAYS
>>> >> > SNESLineSearch Object: 384 MPI processes
>>> >> > type: bt
>>> >> > interpolation: cubic
>>> >> > alpha=1.000000e-04
>>> >> > maxstep=1.000000e+08, minlambda=1.000000e-12
>>> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>> >> > lambda=1.000000e-08
>>> >> > maximum iterations=40
>>> >> > KSP Object: 384 MPI processes
>>> >> > type: gmres
>>> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>>> >> > Orthogonalization with no iterative refinement
>>> >> > GMRES: happy breakdown tolerance 1e-30
>>> >> > maximum iterations=100, initial guess is zero
>>> >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000.
>>> >> > right preconditioning
>>> >> > using UNPRECONDITIONED norm type for convergence test
>>> >> > PC Object: 384 MPI processes
>>> >> > type: gamg
>>> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>> >> > Cycles per PCApply=1
>>> >> > Using Galerkin computed coarse grid matrices
>>> >> > GAMG specific options
>>> >> > Threshold for dropping small values from graph 0.
>>> >> > AGG specific options
>>> >> > Symmetric graph true
>>> >> > Coarse grid solver -- level -------------------------------
>>> >> > KSP Object: (mg_coarse_) 384 MPI processes
>>> >> > type: preonly
>>> >> > maximum iterations=10000, initial guess is zero
>>> >> > tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> > left preconditioning
>>> >> > using NONE norm type for convergence test
>>> >> > PC Object: (mg_coarse_) 384 MPI processes
>>> >> > type: bjacobi
>>> >> > block Jacobi: number of blocks = 384
>>> >> > Local solve is same for all blocks, in the following KSP
>>> and
>>> >> > PC objects:
>>> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes
>>> >> > type: preonly
>>> >> > maximum iterations=1, initial guess is zero
>>> >> > tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> > left preconditioning
>>> >> > using NONE norm type for convergence test
>>> >> > PC Object: (mg_coarse_sub_) 1 MPI processes
>>> >> > type: lu
>>> >> > LU: out-of-place factorization
>>> >> > tolerance for zero pivot 2.22045e-14
>>> >> > using diagonal shift on blocks to prevent zero pivot
>>> >> > [INBLOCKS]
>>> >> > matrix ordering: nd
>>> >> > factor fill ratio given 5., needed 1.31367
>>> >> > Factored matrix follows:
>>> >> > Mat Object: 1 MPI processes
>>> >> > type: seqaij
>>> >> > rows=37, cols=37
>>> >> > package used to perform factorization: petsc
>>> >> > total: nonzeros=913, allocated nonzeros=913
>>> >> > total number of mallocs used during MatSetValues
>>> calls
>>> >> > =0
>>> >> > not using I-node routines
>>> >> > linear system matrix = precond matrix:
>>> >> > Mat Object: 1 MPI processes
>>> >> > type: seqaij
>>> >> > rows=37, cols=37
>>> >> > total: nonzeros=695, allocated nonzeros=695
>>> >> > total number of mallocs used during MatSetValues calls
>>> =0
>>> >> > not using I-node routines
>>> >> > linear system matrix = precond matrix:
>>> >> > Mat Object: 384 MPI processes
>>> >> > type: mpiaij
>>> >> > rows=18145, cols=18145
>>> >> > total: nonzeros=1709115, allocated nonzeros=1709115
>>> >> > total number of mallocs used during MatSetValues calls =0
>>> >> > not using I-node (on process 0) routines
>>> >> > Down solver (pre-smoother) on level 1
>>> >> > -------------------------------
>>> >> > KSP Object: (mg_levels_1_) 384 MPI processes
>>> >> > type: chebyshev
>>> >> > Chebyshev: eigenvalue estimates: min = 0.133339, max =
>>> >> > 1.46673
>>> >> > Chebyshev: eigenvalues estimated using gmres with
>>> translations
>>> >> > [0. 0.1; 0. 1.1]
>>> >> > KSP Object: (mg_levels_1_esteig_) 384
>>> MPI
>>> >> > processes
>>> >> > type: gmres
>>> >> > GMRES: restart=30, using Classical (unmodified)
>>> >> > Gram-Schmidt Orthogonalization with no iterative refinement
>>> >> > GMRES: happy breakdown tolerance 1e-30
>>> >> > maximum iterations=10, initial guess is zero
>>> >> > tolerances: relative=1e-12, absolute=1e-50,
>>> >> > divergence=10000.
>>> >> > left preconditioning
>>> >> > using PRECONDITIONED norm type for convergence test
>>> >> > maximum iterations=2
>>> >> > tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000.
>>> >> > left preconditioning
>>> >> > using nonzero initial guess
>>> >> > using NONE norm type for convergence test
>>> >> > PC Object: (mg_levels_1_) 384 MPI processes
>>> >> > type: sor
>>> >> > SOR: type = local_symmetric, iterations = 1, local
>>> iterations
>>> >> > = 1, omega = 1.
>>> >> > linear system matrix followed by preconditioner matrix:
>>> >> > Mat Object: 384 MPI processes
>>> >> > type: mffd
>>> >> > rows=3020875, cols=3020875
>>> >> > Matrix-free approximation:
>>> >> > err=1.49012e-08 (relative error in function
>>> evaluation)
>>> >> > Using wp compute h routine
>>> >> > Does not compute normU
>>> >> > Mat Object: () 384 MPI processes
>>> >> > type: mpiaij
>>> >> > rows=3020875, cols=3020875
>>> >> > total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> > total number of mallocs used during MatSetValues calls =0
>>> >> > not using I-node (on process 0) routines
>>> >> > Up solver (post-smoother) same as down solver (pre-smoother)
>>> >> > linear system matrix followed by preconditioner matrix:
>>> >> > Mat Object: 384 MPI processes
>>> >> > type: mffd
>>> >> > rows=3020875, cols=3020875
>>> >> > Matrix-free approximation:
>>> >> > err=1.49012e-08 (relative error in function evaluation)
>>> >> > Using wp compute h routine
>>> >> > Does not compute normU
>>> >> > Mat Object: () 384 MPI processes
>>> >> > type: mpiaij
>>> >> > rows=3020875, cols=3020875
>>> >> > total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> > total number of mallocs used during MatSetValues calls =0
>>> >> > not using I-node (on process 0) routines
>>> >> >
>>> >> >
>>> >> > Fande,
>>> >> >
>>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> >> > >
>>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>>> >> > >
>>> >> > > No, it means that for a non-symmetric nonzero structure you need
>>> >> > > the extra flag. So use the extra flag. The reason we don't always
>>> >> > > use the flag is that it adds extra cost and isn't needed if the
>>> >> > > matrix already has a symmetric nonzero structure.
>>> >> >
>>> >> > BTW, if you have a symmetric non-zero structure you can just set
>>> >> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
>>> >> >
>>> >> > If you want to mess with the threshold then you need to use the
>>> >> > symmetrized flag.
>>> >> >
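>>> >> > (So, roughly, as I read the advice above, the two working
>>> >> > configurations are:
>>> >> >
>>> >> >     # un-symmetric nonzero structure, or whenever a drop threshold
>>> >> >     # is wanted: symmetrize the graph
>>> >> >     -pc_type gamg -pc_gamg_sym_graph true -pc_gamg_threshold 0.
>>> >> >
>>> >> >     # structurally symmetric matrix, no filtering: no symmetrization
>>> >> >     # needed
>>> >> >     -pc_type gamg -pc_gamg_threshold -1.0
>>> >> >
>>> >> > The threshold values are the ones already mentioned in this thread.)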
>>> >>
>>> >
>>>
>>
>>
>