[petsc-users] mg pre-conditioner default setup from PETSc-3.4 to PETSc-3.7

Federico Golfrè Andreasi federico.golfre at gmail.com
Thu Sep 21 07:07:25 CDT 2017


Barry,

    I solved the issue: it was related to a wrong change I had made in the
creation of the IS for the VecScatter (used in the shell matrix). After
fixing that, I reached the same performance. Thank you for your support.
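
For anyone hitting a similar problem, below is a minimal sketch of the kind
of IS/VecScatter setup used inside a MATSHELL MatMult (the context
structure, names, and index list are hypothetical, not the actual code from
this thread):

    #include <petscmat.h>

    typedef struct {
      VecScatter scatter;    /* gathers the off-process values we need    */
      Vec        localwork;  /* sequential work vector for the local part */
    } ShellCtx;

    /* MatMult for the shell operator: y = A x */
    PetscErrorCode MatMult_MyShell(Mat A, Vec x, Vec y)
    {
      ShellCtx       *ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatShellGetContext(A, (void **)&ctx);CHKERRQ(ierr);
      /* pull the needed global entries of x into the local work vector */
      ierr = VecScatterBegin(ctx->scatter, x, ctx->localwork,
                             INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(ctx->scatter, x, ctx->localwork,
                           INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      /* ... apply the operator using ctx->localwork and write into y ... */
      PetscFunctionReturn(0);
    }

    /* Setup: 'needed' holds the GLOBAL indices of x that this process
       requires; this list is what had to be fixed in our case.          */
    PetscErrorCode CreateShellScatter(Vec x, PetscInt n,
                                      const PetscInt needed[], ShellCtx *ctx)
    {
      IS             is;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecCreateSeq(PETSC_COMM_SELF, n, &ctx->localwork);CHKERRQ(ierr);
      ierr = ISCreateGeneral(PETSC_COMM_SELF, n, needed,
                             PETSC_COPY_VALUES, &is);CHKERRQ(ierr);
      /* NULL target IS means: fill localwork in natural order */
      ierr = VecScatterCreate(x, is, ctx->localwork, NULL,
                              &ctx->scatter);CHKERRQ(ierr);
      ierr = ISDestroy(&is);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }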

Mark, Hong,

   Thank you for your replies; I am also evaluating your suggestion on the
optimal parametrization for the MG pre-conditioner.

Best regards,
Federico



On 15 September 2017 at 17:45, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>    $ python
> Python 2.7.13 (default, Dec 18 2016, 07:03:39)
> [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> MatMult >>> 3.6272e+03 - 2.0894e+03
> 1537.7999999999997
> KSP Solve >>> 3.6329e+03 - 2.0949e+03
> 1538.0
> >>>
>
> You are right, all the extra time is within the MatMult(), so for some
> reason your shell MatMult is much slower. I cannot guess why unless I can
> see what your shell MatMult is doing.
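>
> One way to see where that time goes in -log_summary is to wrap the body
> of the shell MatMult in a custom log event, roughly like this (the event
> and routine names below are just placeholders):
>
>     #include <petscmat.h>
>
>     static PetscLogEvent USER_ShellMult;
>
>     /* once, after PetscInitialize():
>        PetscLogEventRegister("MyShellMult", MAT_CLASSID, &USER_ShellMult); */
>
>     PetscErrorCode MatMult_MyShell(Mat A, Vec x, Vec y)
>     {
>       PetscErrorCode ierr;
>
>       PetscFunctionBeginUser;
>       ierr = PetscLogEventBegin(USER_ShellMult, 0, 0, 0, 0);CHKERRQ(ierr);
>       /* ... the actual work of the shell multiply ... */
>       ierr = PetscLogEventEnd(USER_ShellMult, 0, 0, 0, 0);CHKERRQ(ierr);
>       PetscFunctionReturn(0);
>     }
>
> The event then shows up as its own line in the -log_summary table.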
>
>
> Make sure your configure options are identical and that you are using the
> same compiler.
>
>   Barry
>
>
>
>
>
> > On Sep 15, 2017, at 5:08 AM, Federico Golfrè Andreasi <
> federico.golfre at gmail.com> wrote:
> >
> > Hi Barry,
> >
> > I have attached an extract of our program output for both versions:
> > PETSc-3.4.4 and PETSc-3.7.3.
> >
> > In this program the KSP has a shell matrix as operator and an MPIAIJ
> > matrix as pre-conditioner.
> > I was wondering whether the slowdown is related to the operations done in
> > the MatMult of the shell matrix, because in a test program where I solve
> > a similar system without the shell matrix I do not see the performance
> > degradation.
> >
> > Perhaps you could give me some hints,
> > Thank you and best regards,
> > Federico
> >
> >
> >
> >
> > On 13 September 2017 at 17:58, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Sep 13, 2017, at 10:56 AM, Federico Golfrè Andreasi <
> federico.golfre at gmail.com> wrote:
> > >
> > > Hi Barry,
> > >
> > > I understand and perfectly agree with you that the behavior changes
> > > between releases due to better tuning of the defaults.
> > >
> > > In my case, the difference in the solution is negligible, but the
> runtime increases up to +70% (with the same number of ksp_iterations).
> >
> >   Ok this is an important (and bad) difference.
> >
> > > So I was wondering if maybe there are just some flags, related to memory
> > > preallocation or re-use of intermediate solutions, that were previously
> > > set by default.
> >
> >    Not likely it is this.
> >
> >    Are both compiled with the same level of compiler optimization?
> >
> >    Please run both with -log_summary and send the output; this will tell
> >    us WHAT parts are now slower.
> >
> >   Barry
> >
> > >
> > > Thank you,
> > > Federico
> > >
> > >
> > >
> > > On 13 September 2017 at 17:29, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >    There will likely always be slight differences in convergence over
> > > that many releases. Lots of little defaults, etc., get changed over time
> > > as we learn from users and increase the robustness of the defaults.
> > >
> > >     So in your case do the differences matter?
> > >
> > > 1) What is the time to solution in both cases: is it a few percent
> > > different, or is it now much slower?
> > >
> > > 2) What about the number of iterations? Almost identical (say 1 or 2
> > > different), or does it now take 30 iterations when it used to take 5?
> > >
> > >   Barry
> > >
> > > > On Sep 13, 2017, at 10:25 AM, Federico Golfrè Andreasi <
> federico.golfre at gmail.com> wrote:
> > > >
> > > > Dear PETSc users/developers,
> > > >
> > > > I recently switched from PETSc-3.4 to PETSc-3.7 and found that some
> > > > default settings for the "mg" (multigrid) preconditioner have changed.
> > > >
> > > > We were solving a linear system passing, through the command line, the
> > > > following options (a rough programmatic equivalent is sketched after
> > > > the list):
> > > > -ksp_type      fgmres
> > > > -ksp_max_it    100000
> > > > -ksp_rtol      0.000001
> > > > -pc_type       mg
> > > > -ksp_view
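> > > >
> > > > For reference, a rough programmatic equivalent of these options, using
> > > > the PETSc-3.7 calling sequences and hypothetical matrix names
> > > > Ashell/Apre, would be:
> > > >
> > > >     #include <petscksp.h>
> > > >
> > > >     /* Ashell: the MATSHELL operator; Apre: the MPIAIJ matrix used to
> > > >        build the preconditioner (both hypothetical names)            */
> > > >     PetscErrorCode SetupSolver(Mat Ashell, Mat Apre, KSP *ksp)
> > > >     {
> > > >       PC             pc;
> > > >       PetscErrorCode ierr;
> > > >
> > > >       PetscFunctionBeginUser;
> > > >       ierr = KSPCreate(PETSC_COMM_WORLD, ksp);CHKERRQ(ierr);
> > > >       ierr = KSPSetOperators(*ksp, Ashell, Apre);CHKERRQ(ierr);
> > > >       ierr = KSPSetType(*ksp, KSPFGMRES);CHKERRQ(ierr);
> > > >       ierr = KSPSetTolerances(*ksp, 1.0e-6, PETSC_DEFAULT,
> > > >                               PETSC_DEFAULT, 100000);CHKERRQ(ierr);
> > > >       ierr = KSPGetPC(*ksp, &pc);CHKERRQ(ierr);
> > > >       ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
> > > >       /* still picks up -ksp_view and other runtime options */
> > > >       ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);
> > > >       PetscFunctionReturn(0);
> > > >     }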
> > > >
> > > > The output of the KSP view is as follows:
> > > >
> > > > KSP Object: 128 MPI processes
> > > >   type: fgmres
> > > >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> > > >     GMRES: happy breakdown tolerance 1e-30
> > > >   maximum iterations=100000, initial guess is zero
> > > >   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000
> > > >   right preconditioning
> > > >   using UNPRECONDITIONED norm type for convergence test
> > > > PC Object: 128 MPI processes
> > > >   type: mg
> > > >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> > > >       Cycles per PCApply=1
> > > >       Not using Galerkin computed coarse grid matrices
> > > >   Coarse grid solver -- level -------------------------------
> > > >     KSP Object:    (mg_levels_0_)     128 MPI processes
> > > >       type: chebyshev
> > > >         Chebyshev: eigenvalue estimates:  min = 0.223549, max =
> 2.45903
> > > >         Chebyshev: estimated using:  [0 0.1; 0 1.1]
> > > >         KSP Object:        (mg_levels_0_est_)         128 MPI
> processes
> > > >           type: gmres
> > > >             GMRES: restart=30, using Classical (unmodified)
> Gram-Schmidt Orthogonalization with no iterative refinement
> > > >             GMRES: happy breakdown tolerance 1e-30
> > > >           maximum iterations=10, initial guess is zero
> > > >           tolerances:  relative=1e-05, absolute=1e-50,
> divergence=10000
> > > >           left preconditioning
> > > >           using NONE norm type for convergence test
> > > >         PC Object:        (mg_levels_0_)         128 MPI processes
> > > >           type: sor
> > > >             SOR: type = local_symmetric, iterations = 1, local
> iterations = 1, omega = 1
> > > >           linear system matrix followed by preconditioner matrix:
> > > >           Matrix Object:           128 MPI processes
> > > >             type: mpiaij
> > > >             rows=279669, cols=279669
> > > >             total: nonzeros=6427943, allocated nonzeros=6427943
> > > >             total number of mallocs used during MatSetValues calls =0
> > > >               not using I-node (on process 0) routines
> > > >           Matrix Object:           128 MPI processes
> > > >             type: mpiaij
> > > >             rows=279669, cols=279669
> > > >             total: nonzeros=6427943, allocated nonzeros=6427943
> > > >             total number of mallocs used during MatSetValues calls =0
> > > >               not using I-node (on process 0) routines
> > > >       maximum iterations=1, initial guess is zero
> > > >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> > > >       left preconditioning
> > > >       using NONE norm type for convergence test
> > > >     PC Object:    (mg_levels_0_)     128 MPI processes
> > > >       type: sor
> > > >         SOR: type = local_symmetric, iterations = 1, local
> iterations = 1, omega = 1
> > > >       linear system matrix followed by preconditioner matrix:
> > > >       Matrix Object:       128 MPI processes
> > > >         type: mpiaij
> > > >         rows=279669, cols=279669
> > > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > > >         total number of mallocs used during MatSetValues calls =0
> > > >           not using I-node (on process 0) routines
> > > >       Matrix Object:       128 MPI processes
> > > >         type: mpiaij
> > > >         rows=279669, cols=279669
> > > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > > >         total number of mallocs used during MatSetValues calls =0
> > > >           not using I-node (on process 0) routines
> > > >   linear system matrix followed by preconditioner matrix:
> > > >   Matrix Object:   128 MPI processes
> > > >     type: mpiaij
> > > >     rows=279669, cols=279669
> > > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > > >     total number of mallocs used during MatSetValues calls =0
> > > >       not using I-node (on process 0) routines
> > > >   Matrix Object:   128 MPI processes
> > > >     type: mpiaij
> > > >     rows=279669, cols=279669
> > > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > > >     total number of mallocs used during MatSetValues calls =0
> > > >       not using I-node (on process 0) routines
> > > >
> > > > When I build the same program using PETSc-3.7 and run it with the
> > > > same options, we observe that the runtime increases and the convergence
> > > > is slightly different. The output of the KSP view is:
> > > >
> > > > KSP Object: 128 MPI processes
> > > >   type: fgmres
> > > >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> > > >     GMRES: happy breakdown tolerance 1e-30
> > > >   maximum iterations=100000, initial guess is zero
> > > >   tolerances:  relative=1e-06, absolute=1e-50, divergence=10000.
> > > >   right preconditioning
> > > >   using UNPRECONDITIONED norm type for convergence test
> > > > PC Object: 128 MPI processes
> > > >   type: mg
> > > >     MG: type is MULTIPLICATIVE, levels=1 cycles=v
> > > >       Cycles per PCApply=1
> > > >       Not using Galerkin computed coarse grid matrices
> > > >   Coarse grid solver -- level -------------------------------
> > > >     KSP Object:    (mg_levels_0_)     128 MPI processes
> > > >       type: chebyshev
> > > >         Chebyshev: eigenvalue estimates:  min = 0.223549, max =
> 2.45903
> > > >         Chebyshev: eigenvalues estimated using gmres with
> translations  [0. 0.1; 0. 1.1]
> > > >         KSP Object:        (mg_levels_0_esteig_)         128 MPI
> processes
> > > >           type: gmres
> > > >             GMRES: restart=30, using Classical (unmodified)
> Gram-Schmidt Orthogonalization with no iterative refinement
> > > >             GMRES: happy breakdown tolerance 1e-30
> > > >           maximum iterations=10, initial guess is zero
> > > >           tolerances:  relative=1e-12, absolute=1e-50,
> divergence=10000.
> > > >           left preconditioning
> > > >           using PRECONDITIONED norm type for convergence test
> > > >       maximum iterations=2, initial guess is zero
> > > >       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> > > >       left preconditioning
> > > >       using NONE norm type for convergence test
> > > >     PC Object:    (mg_levels_0_)     128 MPI processes
> > > >       type: sor
> > > >         SOR: type = local_symmetric, iterations = 1, local
> iterations = 1, omega = 1.
> > > >       linear system matrix followed by preconditioner matrix:
> > > >       Mat Object:       128 MPI processes
> > > >         type: mpiaij
> > > >         rows=279669, cols=279669
> > > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > > >         total number of mallocs used during MatSetValues calls =0
> > > >           not using I-node (on process 0) routines
> > > >       Mat Object:       128 MPI processes
> > > >         type: mpiaij
> > > >         rows=279669, cols=279669
> > > >         total: nonzeros=6427943, allocated nonzeros=6427943
> > > >         total number of mallocs used during MatSetValues calls =0
> > > >           not using I-node (on process 0) routines
> > > >   linear system matrix followed by preconditioner matrix:
> > > >   Mat Object:   128 MPI processes
> > > >     type: mpiaij
> > > >     rows=279669, cols=279669
> > > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > > >     total number of mallocs used during MatSetValues calls =0
> > > >       not using I-node (on process 0) routines
> > > >   Mat Object:   128 MPI processes
> > > >     type: mpiaij
> > > >     rows=279669, cols=279669
> > > >     total: nonzeros=6427943, allocated nonzeros=6427943
> > > >     total number of mallocs used during MatSetValues calls =0
> > > >       not using I-node (on process 0) routines
> > > >
> > > > I was able to get a closer solution by adding the following options
> > > > (see also the sketch after the list):
> > > > -mg_levels_0_esteig_ksp_norm_type   none
> > > > -mg_levels_0_esteig_ksp_rtol        1.0e-5
> > > > -mg_levels_ksp_max_it               1
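> > > >
> > > > If it is easier, the same settings can be hardwired before
> > > > KSPSetFromOptions(); a minimal sketch, assuming the PETSc-3.7
> > > > PetscOptionsSetValue() signature where NULL selects the global options
> > > > database:
> > > >
> > > >     /* call these before KSPSetFromOptions() */
> > > >     ierr = PetscOptionsSetValue(NULL,
> > > >              "-mg_levels_0_esteig_ksp_norm_type", "none");CHKERRQ(ierr);
> > > >     ierr = PetscOptionsSetValue(NULL,
> > > >              "-mg_levels_0_esteig_ksp_rtol", "1.0e-5");CHKERRQ(ierr);
> > > >     ierr = PetscOptionsSetValue(NULL,
> > > >              "-mg_levels_ksp_max_it", "1");CHKERRQ(ierr);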
> > > >
> > > > But I still cannot reach the same runtime we were observing with
> > > > PETSc-3.4; could you please advise me whether I should specify any
> > > > other options?
> > > >
> > > > Thank you very much for your support,
> > > > Federico Golfre' Andreasi
> > > >
> > >
> > >
> >
> >
> > <run_petsc34.txt><run_petsc37.txt>
>
>

