[petsc-users] GAMG speed

Thu Aug 1 13:48:33 CDT 2013

  Do you use KSPSetDM(ksp,da);  ?  See src/ksp/ksp/examples/tutorials/ex19.c

   Barry

On Aug 1, 2013, at 1:35 PM, Michele Rosso <mrosso at uci.edu> wrote:

> Barry,
> 
> I am using a finite difference Cartesian uniform grid and DMDA and so far it has not given me any problem.
> I am using a ksp solver (not snes). In a previous thread, I was told an odd number of grid points was needed for the geometric multigrid, is that correct?
> I tried to run my case with
> 
> 
> -pc_type mg -da_refine 4
> 
> 
> 
> but it does not seem to use the -da_refine option:
> 
> mpiexec   -np 4 ./test  -pc_type mg -da_refine 4  -ksp_view -options_left
> 
> 
> KSP Object: 4 MPI processes
>  type: cg
>  maximum iterations=10000
>  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>  left preconditioning
>  using nonzero initial guess
>  using UNPRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
>  type: mg
>    MG: type is MULTIPLICATIVE, levels=1 cycles=v
>      Cycles per PCApply=1
>      Not using Galerkin computed coarse grid matrices
>  Coarse grid solver -- level -------------------------------
>    KSP Object:    (mg_levels_0_)     4 MPI processes
>      type: chebyshev
>        Chebyshev: eigenvalue estimates:  min = 0.134543, max = 1.47998
>        Chebyshev: estimated using:  [0 0.1; 0 1.1]
>        KSP Object:        (mg_levels_0_est_)         4 MPI processes
>          type: gmres
>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>            GMRES: happy breakdown tolerance 1e-30
>          maximum iterations=10, initial guess is zero
>          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>          left preconditioning
>          using NONE norm type for convergence test
>        PC Object:        (mg_levels_0_)         4 MPI processes
>          type: sor
>            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>          linear system matrix = precond matrix:
>          Matrix Object:           4 MPI processes
>            type: mpiaij
>            rows=262144, cols=262144
>            total: nonzeros=1835008, allocated nonzeros=1835008
>            total number of mallocs used during MatSetValues calls =0
>      maximum iterations=1, initial guess is zero
>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>      left preconditioning
>      using NONE norm type for convergence test
>    PC Object:    (mg_levels_0_)     4 MPI processes
>      type: sor
>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>      linear system matrix = precond matrix:
>      Matrix Object:       4 MPI processes
>        type: mpiaij
>        rows=262144, cols=262144
>        total: nonzeros=1835008, allocated nonzeros=1835008
>        total number of mallocs used during MatSetValues calls =0
>  linear system matrix = precond matrix:
>  Matrix Object:   4 MPI processes
>    type: mpiaij
>    rows=262144, cols=262144
>    total: nonzeros=1835008, allocated nonzeros=1835008
>    total number of mallocs used during MatSetValues calls =0
> Solution       =    1.53600013     sec
> #PETSc Option Table entries:
> -da_refine 4
> -ksp_view
> -options_left
> -pc_type mg
> #End of PETSc Option Table entries
> There is one unused database option. It is:
> Option left: name:-da_refine value: 4
> 
> Michele
> 
> On 08/01/2013 11:21 AM, Barry Smith wrote:
>>    What kind of mesh are you using? Are you using DMDA? If you are using DMDA (and have written your code to use it "correctly") then it should be trivial to run with geometric multigrid and geometric multigrid should be a bit faster.
>> 
>>    For example on src/snes/examples/tutorials/ex19.c   I run with ./ex19 -pc_type mg -da_refine 4 and it refines the original DMDA 4 times and uses geometric multigrid with 5 levels.
>> 
>> 
>>    Barry
>> 
>> 
>> On Aug 1, 2013, at 1:14 PM, Michele Rosso <mrosso at uci.edu> wrote:
>> 
>>> Hi,
>>> 
>>> I am successfully using PETSc (v3.4.2)  to solve a 3D Poisson's equation with CG + GAMG as I was suggested to do in a previous thread.
>>> So far I am using GAMG with the default settings, i.e.
>>> 
>>> -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> 
>>> The speed of the solution is satisfactory, but I would like to know if you have any suggestions to further speed it up, particularly
>>> if there is any parameters worth looking into to achieve an even faster solution, for example number of levels and so on.
>>> So far I am using Dirichlet's BCs for my test case, but I will soon have periodic conditions: in this case, does GAMG require particular settings?
>>> Finally, I did not try geometric multigrid: do you think it is worth a shot?
>>> 
>>> Here are my current settings:
>>> 
>>> I run with
>>> 
>>> -pc_type gamg -pc_gamg_agg_nsmooths 1 -ksp_view -options_left
>>> 
>>> and the output is:
>>> 
>>> KSP Object: 4 MPI processes
>>>   type: cg
>>>   maximum iterations=10000
>>>   tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
>>>   left preconditioning
>>>   using nonzero initial guess
>>>   using UNPRECONDITIONED norm type for convergence test
>>> PC Object: 4 MPI processes
>>>   type: gamg
>>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>       Cycles per PCApply=1
>>>       Using Galerkin computed coarse grid matrices
>>>   Coarse grid solver -- level -------------------------------
>>>     KSP Object:    (mg_coarse_)     4 MPI processes
>>>       type: preonly
>>>       maximum iterations=1, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>       left preconditioning
>>>       using NONE norm type for convergence test
>>>     PC Object:    (mg_coarse_)     4 MPI processes
>>>       type: bjacobi
>>>         block Jacobi: number of blocks = 4
>>>         Local solve info for each block is in the following KSP and PC objects:
>>>       [0] number of local blocks = 1, first local block number = 0
>>>                 [0] local block number 0
>>> KSP Object:          (mg_coarse_sub_)         1 MPI processes
>>>           type: preonly
>>>           maximum iterations=1, initial guess is zero
>>>                 tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>> KSP Object:        (mg_coarse_sub_)            left preconditioning
>>>           using NONE norm type for convergence test
>>>           PC Object:        (mg_coarse_sub_)       1 MPI processes
>>>           type: preonly
>>>          1 MPI processes
>>>           type: lu
>>>           maximum iterations=1, initial guess is zero
>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>           LU: out-of-place factorization
>>>             left preconditioning
>>>           using NONE norm type for convergence test
>>>           PC Object:        (mg_coarse_sub_)         1 MPI processes
>>>           type: lu
>>>           tolerance for zero pivot 2.22045e-14
>>>             using diagonal shift on blocks to prevent zero pivot
>>>             matrix ordering: nd
>>>             LU: out-of-place factorization
>>>             tolerance for zero pivot 2.22045e-14
>>>             using diagonal shift on blocks to prevent zero pivot
>>>             matrix ordering: nd
>>>             factor fill ratio given 5, needed 0
>>>               Factored matrix follows:
>>>             factor fill ratio given 5, needed 4.13207
>>>               Factored matrix follows:
>>>                   Matrix Object:              Matrix Object:                 1 MPI processes
>>>                   type: seqaij
>>>                     rows=395, cols=395
>>>                     package used to perform factorization: petsc
>>>                   total: nonzeros=132379, allocated nonzeros=132379
>>>                   total number of mallocs used during MatSetValues calls =0
>>>                         not using I-node routines
>>>            1 MPI processes
>>>                   type: seqaij
>>>           linear system matrix = precond matrix:
>>>                     rows=0, cols=0
>>>                     package used to perform factorization: petsc
>>>                   total: nonzeros=1, allocated nonzeros=1
>>>                     total number of mallocs used during MatSetValues calls =0
>>>                       not using I-node routines
>>>               linear system matrix = precond matrix:
>>>   Matrix Object:             1 MPI processes
>>>             type: seqaij
>>>           Matrix Object:KSP Object:           1 MPI processes
>>>             type: seqaij
>>>             rows=0, cols=0
>>>             total: nonzeros=0, allocated nonzeros=0
>>>             total number of mallocs used during MatSetValues calls =0
>>>                 not using I-node routines
>>>           rows=395, cols=395
>>>             total: nonzeros=32037, allocated nonzeros=32037
>>>             total number of mallocs used during MatSetValues calls =0
>>>               not using I-node routines
>>>           - - - - - - - - - - - - - - - - - -
>>>           KSP Object:        (mg_coarse_sub_)         1 MPI processes
>>>           type: preonly
>>>           maximum iterations=1, initial guess is zero
>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>           left preconditioning
>>>           using NONE norm type for convergence test
>>>         PC Object:        (mg_coarse_sub_)         1 MPI processes
>>>           type: lu
>>>             LU: out-of-place factorization
>>>             tolerance for zero pivot 2.22045e-14
>>>             using diagonal shift on blocks to prevent zero pivot
>>>             matrix ordering: nd
>>>             factor fill ratio given 5, needed 0
>>>               Factored matrix follows:
>>>                 Matrix Object:                 1 MPI processes
>>>                   type: seqaij
>>>                   rows=0, cols=0
>>>                   package used to perform factorization: petsc
>>>                   total: nonzeros=1, allocated nonzeros=1
>>>                   total number of mallocs used during MatSetValues calls =0
>>>                     not using I-node routines
>>>           linear system matrix = precond matrix:
>>>           Matrix Object:           1 MPI processes
>>>             type: seqaij
>>>             rows=0, cols=0
>>>             total: nonzeros=0, allocated nonzeros=0
>>>             total number of mallocs used during MatSetValues calls =0
>>>               not using I-node routines
>>>   (mg_coarse_sub_)         1 MPI processes
>>>           type: preonly
>>>           maximum iterations=1, initial guess is zero
>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>           left preconditioning
>>>           using NONE norm type for convergence test
>>>         PC Object:        (mg_coarse_sub_)         1 MPI processes
>>>           type: lu
>>>             LU: out-of-place factorization
>>>             tolerance for zero pivot 2.22045e-14
>>>             using diagonal shift on blocks to prevent zero pivot
>>>             matrix ordering: nd
>>>             factor fill ratio given 5, needed 0
>>>               Factored matrix follows:
>>>                 Matrix Object:                 1 MPI processes
>>>                   type: seqaij
>>>                   rows=0, cols=0
>>>                   package used to perform factorization: petsc
>>>                   total: nonzeros=1, allocated nonzeros=1
>>>                   total number of mallocs used during MatSetValues calls =0
>>>                     not using I-node routines
>>>           linear system matrix = precond matrix:
>>>           Matrix Object:           1 MPI processes
>>>             type: seqaij
>>>             rows=0, cols=0
>>>             total: nonzeros=0, allocated nonzeros=0
>>>             total number of mallocs used during MatSetValues calls =0
>>>               not using I-node routines
>>>       [1] number of local blocks = 1, first local block number = 1
>>>         [1] local block number 0
>>>         - - - - - - - - - - - - - - - - - -
>>>       [2] number of local blocks = 1, first local block number = 2
>>>         [2] local block number 0
>>>         - - - - - - - - - - - - - - - - - -
>>>       [3] number of local blocks = 1, first local block number = 3
>>>         [3] local block number 0
>>>         - - - - - - - - - - - - - - - - - -
>>>       linear system matrix = precond matrix:
>>>       Matrix Object:       4 MPI processes
>>>         type: mpiaij
>>>         rows=395, cols=395
>>>         total: nonzeros=32037, allocated nonzeros=32037
>>>         total number of mallocs used during MatSetValues calls =0
>>>           not using I-node (on process 0) routines
>>>   Down solver (pre-smoother) on level 1 -------------------------------
>>>     KSP Object:    (mg_levels_1_)     4 MPI processes
>>>       type: chebyshev
>>>         Chebyshev: eigenvalue estimates:  min = 0.0636225, max = 1.33607
>>>       maximum iterations=2
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>       left preconditioning
>>>       using nonzero initial guess
>>>       using NONE norm type for convergence test
>>>     PC Object:    (mg_levels_1_)     4 MPI processes
>>>       type: jacobi
>>>       linear system matrix = precond matrix:
>>>       Matrix Object:       4 MPI processes
>>>         type: mpiaij
>>>         rows=23918, cols=23918
>>>         total: nonzeros=818732, allocated nonzeros=818732
>>>         total number of mallocs used during MatSetValues calls =0
>>>           not using I-node (on process 0) routines
>>>   Up solver (post-smoother) same as down solver (pre-smoother)
>>>   Down solver (pre-smoother) on level 2 -------------------------------
>>>     KSP Object:    (mg_levels_2_)     4 MPI processes
>>>       type: chebyshev
>>>         Chebyshev: eigenvalue estimates:  min = 0.0971369, max = 2.03987
>>>       maximum iterations=2
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>       left preconditioning
>>>       using nonzero initial guess
>>>       using NONE norm type for convergence test
>>>     PC Object:    (mg_levels_2_)     4 MPI processes
>>>       type: jacobi
>>>       linear system matrix = precond matrix:
>>>       Matrix Object:       4 MPI processes
>>>         type: mpiaij
>>>         rows=262144, cols=262144
>>>         total: nonzeros=1835008, allocated nonzeros=1835008
>>>         total number of mallocs used during MatSetValues calls =0
>>>   Up solver (post-smoother) same as down solver (pre-smoother)
>>>   linear system matrix = precond matrix:
>>>   Matrix Object:   4 MPI processes
>>>     type: mpiaij
>>>     rows=262144, cols=262144
>>>     total: nonzeros=1835008, allocated nonzeros=1835008
>>>     total number of mallocs used during MatSetValues calls =0
>>> #PETSc Option Table entries:
>>> -ksp_view
>>> -options_left
>>> -pc_gamg_agg_nsmooths 1
>>> -pc_type gamg
>>> #End of PETSc Option Table entries
>>> There are no unused options.
>>> 
>>> 
>>> Thank you,
>>> Michele
>> 
>