[petsc-users] Poor multigrid convergence in parallel

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Mon Jul 21 07:11:48 CDT 2014


On 21 Jul 2014, at 12:52, Dave May <dave.mayhem23 at gmail.com> wrote:

> 
> -pc_type mg -mg_levels_ksp_type richardson -mg_levels_pc_type jacobi -mg_levels_ksp_max_it 2
> 
> then I get identical convergence in serial and parallel
> 
> 
> Good. That's the correct result.
>  
> if, however, I run with
> 
> -pc_type mg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -mg_levels_ksp_max_it 2
> (the default according to -ksp_view)
> 
> then I get very differing convergence in serial and parallel as described.
> 
>  
> It's normal that the behaviour is different. The PETSc SOR implementation is not parallel. It only performs SOR on your local subdomain.

Sure, however, with only two subdomains, I was not expecting to see such poor behaviour.
Below I show output (along with -ksp_view) from runs on 1 process and then 2, for the following options:

 -pc_type mg -ksp_rtol 1e-8 -ksp_max_it 6 -pc_mg_levels 2 -mg_levels_pc_type sor -ksp_monitor
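
As an aside, for anyone wanting to set this up programmatically rather than via options: my reading of the PETSc API is that the smoother configuration above corresponds roughly to the C below. The function name is mine, ksp is assumed to already have -pc_type mg -pc_mg_levels 2 applied, and I have not run this. The SOR sweep it selects is the "local_symmetric" one reported by -ksp_view, i.e. each rank smooths only its own rows, which is Dave's point about SOR not being parallel.

  #include <petscksp.h>

  /* Configure the fine-level smoother of an existing 2-level PCMG the same
     way the options above do: Chebyshev with 2 iterations (as shown in the
     -ksp_view output below), preconditioned by rank-local symmetric SOR. */
  static PetscErrorCode ConfigureFineSmoother(KSP ksp)
  {
    PC             pc, smooth_pc;
    KSP            smooth_ksp;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCMGGetSmoother(pc, 1, &smooth_ksp);CHKERRQ(ierr);  /* level 1 = fine level */
    ierr = KSPSetType(smooth_ksp, KSPCHEBYSHEV);CHKERRQ(ierr);
    ierr = KSPSetTolerances(smooth_ksp, PETSC_DEFAULT, PETSC_DEFAULT,
                            PETSC_DEFAULT, 2);CHKERRQ(ierr);   /* 2 smoothing iterations */
    ierr = KSPGetPC(smooth_ksp, &smooth_pc);CHKERRQ(ierr);
    ierr = PCSetType(smooth_pc, PCSOR);CHKERRQ(ierr);
    /* "local_symmetric" in the -ksp_view output: SSOR sweeps restricted to
       each process's local rows, so block Jacobi between ranks. */
    ierr = PCSORSetSymmetric(smooth_pc, SOR_LOCAL_SYMMETRIC_SWEEP);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }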

On 1 process:
  0 KSP Residual norm 5.865090856053e+02 
  1 KSP Residual norm 1.293159126247e+01 
  2 KSP Residual norm 5.181199296299e-01 
  3 KSP Residual norm 1.268870802643e-02 
  4 KSP Residual norm 5.116058930806e-04 
  5 KSP Residual norm 3.735036960550e-05 
  6 KSP Residual norm 1.755288530515e-06 
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=6, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     1 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     1 MPI processes
      type: lu
        LU: out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
        matrix ordering: nd
        factor fill ratio given 5, needed 3.17724
          Factored matrix follows:
            Mat Object:             1 MPI processes
              type: seqaij
              rows=144, cols=144
              package used to perform factorization: petsc
              total: nonzeros=2904, allocated nonzeros=2904
              total number of mallocs used during MatSetValues calls =0
                not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       1 MPI processes
        type: seqaij
        rows=144, cols=144
        total: nonzeros=914, allocated nonzeros=0
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     1 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0.0999972, max = 1.09997
        Chebyshev: estimated using:  [0 0.1; 0 1.1]
        KSP Object:        (mg_levels_1_est_)         1 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object:        (mg_levels_1_)         1 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object:           1 MPI processes
            type: seqaij
            rows=529, cols=529
            total: nonzeros=3521, allocated nonzeros=0
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     1 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       1 MPI processes
        type: seqaij
        rows=529, cols=529
        total: nonzeros=3521, allocated nonzeros=0
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=529, cols=529
    total: nonzeros=3521, allocated nonzeros=0
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines

On 2 processes:

  0 KSP Residual norm 5.867749653193e+02 
  1 KSP Residual norm 1.353369658350e+01 
  2 KSP Residual norm 1.350163644248e+01 
  3 KSP Residual norm 1.007552895680e+01 
  4 KSP Residual norm 1.294191582208e+00 
  5 KSP Residual norm 9.409953768968e-01 
  6 KSP Residual norm 9.409360529590e-01 
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=6, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Not using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     2 MPI processes
      type: preonly
      maximum iterations=1, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     2 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 2 PCs follows
      KSP Object:      (mg_coarse_redundant_)       1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
        left preconditioning
        using NONE norm type for convergence test
      PC Object:      (mg_coarse_redundant_)       1 MPI processes
        type: lu
          LU: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5, needed 2.72494
            Factored matrix follows:
              Mat Object:               1 MPI processes
                type: seqaij
                rows=144, cols=144
                package used to perform factorization: petsc
                total: nonzeros=2120, allocated nonzeros=2120
                total number of mallocs used during MatSetValues calls =0
                  not using I-node routines
        linear system matrix = precond matrix:
        Mat Object:         1 MPI processes
          type: seqaij
          rows=144, cols=144
          total: nonzeros=778, allocated nonzeros=778
          total number of mallocs used during MatSetValues calls =0
            not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       2 MPI processes
        type: mpiaij
        rows=144, cols=144
        total: nonzeros=778, allocated nonzeros=914
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     2 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0.099992, max = 1.09991
        Chebyshev: estimated using:  [0 0.1; 0 1.1]
        KSP Object:        (mg_levels_1_est_)         2 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object:        (mg_levels_1_)         2 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object:           2 MPI processes
            type: mpiaij
            rows=529, cols=529
            total: nonzeros=3253, allocated nonzeros=3521
            total number of mallocs used during MatSetValues calls =0
              not using I-node (on process 0) routines
      maximum iterations=2
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     2 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object:       2 MPI processes
        type: mpiaij
        rows=529, cols=529
        total: nonzeros=3253, allocated nonzeros=3521
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   2 MPI processes
    type: mpiaij
    rows=529, cols=529
    total: nonzeros=3253, allocated nonzeros=3521
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines


Notice that in the parallel case the residual was reduced by only ~10^3, rather than the ~10^8 reduction seen in serial.
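(Explicitly, from the monitors above: 5.87e+02 / 9.41e-01 is roughly 6e+02 in parallel, versus 5.87e+02 / 1.76e-06, roughly 3e+08, in serial.)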

> I see that this is a nested Krylov solve. Using fgmres for the outer solve is sometimes not enough. I've had problems where I needed to use the more stable orthogonalization routine in gmres.
> 
> Do you also observe different convergence behaviour (serial versus parallel) with these choices
> 1) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 1

Full options are (in addition to the above):

-ksp_type fgmres -pc_mg_levels 2 -ksp_monitor -ksp_max_it 6 -ksp_rtol 1e-8 -pc_type mg

1 process:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.294103921871e+01 
  2 KSP Residual norm 4.325949294172e+00 
  3 KSP Residual norm 1.373260455913e+00 
  4 KSP Residual norm 1.612639229769e-01 
  5 KSP Residual norm 1.896600662807e-02 
  6 KSP Residual norm 5.900847991084e-03 

2 processes:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.242896923248e+01 
  2 KSP Residual norm 1.092088559774e+01 
  3 KSP Residual norm 7.383276000966e+00 
  4 KSP Residual norm 5.634790202135e+00 
  5 KSP Residual norm 4.329897745238e+00 
  6 KSP Residual norm 3.754170628391e+00 


> 2) -mg_coarse_ksp_type gmres -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_coarse_ksp_gmres_modifiedgramschmidt

1 process:
  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.030455192067e+01 
  2 KSP Residual norm 4.628068378242e-01 
  3 KSP Residual norm 1.965313019262e-02 
  4 KSP Residual norm 1.204109484597e-03 
  5 KSP Residual norm 5.812650812813e-05 
  6 KSP Residual norm 3.161780444565e-06 

2 processes:
  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.324768309183e+01 
  2 KSP Residual norm 1.225921121405e+01 
  3 KSP Residual norm 1.173286143250e+01 
  4 KSP Residual norm 7.033886488294e+00 
  5 KSP Residual norm 4.825036058054e+00 
  6 KSP Residual norm 4.265434976636e+00 
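
(If the outer orthogonalization does turn out to be the suspect, my understanding is that the modified Gram-Schmidt variant can also be selected in code rather than via -ksp_gmres_modifiedgramschmidt. An untested sketch, with the helper name mine and outer_ksp standing for the outer (f)gmres solver:

  #include <petscksp.h>

  static PetscErrorCode UseModifiedGramSchmidt(KSP outer_ksp)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* Replace classical Gram-Schmidt with the more stable modified variant
       in the (f)gmres orthogonalization step. */
    ierr = KSPGMRESSetOrthogonalization(outer_ksp,
             KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }
)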


> 3) -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100

1 process:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.030455192067e+01 
  2 KSP Residual norm 4.628068378242e-01 
  3 KSP Residual norm 1.965313019262e-02 
  4 KSP Residual norm 1.204109484597e-03 
  5 KSP Residual norm 5.812650812814e-05 
  6 KSP Residual norm 3.161780444567e-06 

2 processes:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.324768309183e+01 
  2 KSP Residual norm 1.225921121405e+01 
  3 KSP Residual norm 1.173286143250e+01 
  4 KSP Residual norm 7.033886488294e+00 
  5 KSP Residual norm 4.825036058053e+00 
  6 KSP Residual norm 4.265434976635e+00 


> Sure - this wasn't a convergence test. I just wanted to see that the methods which should be identical in serial and parallel are in fact behaving as expected. Seems they are. So I'm inclined to think the problem is associated with having nested Krylov solves.


My observation is that if I use unpreconditioned Chebyshev as the smoother, then convergence in serial and parallel is identical and good.  As soon as I turn on SOR preconditioning for the smoother, the parallel convergence falls to pieces (and the preconditioner becomes indefinite; see the sketch at the very end for how I'd check that):

e.g. with
-pc_type mg  -ksp_rtol 1e-8 -ksp_max_it 6      -pc_mg_levels 2   -ksp_monitor  -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type none

1 process:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.530397174638e+01 
  2 KSP Residual norm 1.027554200472e+00 
  3 KSP Residual norm 3.809236982955e-02 
  4 KSP Residual norm 2.445633720099e-03 
  5 KSP Residual norm 1.192136916270e-04 
  6 KSP Residual norm 7.067629143105e-06 

2 processes:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.530397174638e+01 
  2 KSP Residual norm 1.027554200472e+00 
  3 KSP Residual norm 3.809236982955e-02 
  4 KSP Residual norm 2.445633720099e-03 
  5 KSP Residual norm 1.192136916270e-04 
  6 KSP Residual norm 7.067629143079e-06 

With sor preconditioning the smoother:

-pc_type mg  -ksp_rtol 1e-8 -ksp_max_it 6      -pc_mg_levels 2   -ksp_monitor  -ksp_type fgmres -mg_coarse_ksp_type cg -mg_coarse_pc_type jacobi -mg_coarse_ksp_max_it 100 -mg_levels_pc_type sor

1 process:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.030455192067e+01 
  2 KSP Residual norm 4.628068378242e-01 
  3 KSP Residual norm 1.965313019262e-02 
  4 KSP Residual norm 1.204109484597e-03 
  5 KSP Residual norm 5.812650812814e-05 
  6 KSP Residual norm 3.161780444567e-06

2 processes:

  0 KSP Residual norm 2.802543487620e+02 
  1 KSP Residual norm 1.324768309183e+01 
  2 KSP Residual norm 1.225921121405e+01 
  3 KSP Residual norm 1.173286143250e+01 
  4 KSP Residual norm 7.033886488294e+00 
  5 KSP Residual norm 4.825036058053e+00 
  6 KSP Residual norm 4.265434976635e+00 

Maybe it's just that I shouldn't be expecting this to work, but it seems odd to me.
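
For what it's worth, the way I'd try to back up the "preconditioner becomes indefinite" claim is to ask the Krylov method for eigenvalue estimates of the preconditioned operator and look for estimates with negative real part. A rough, untested sketch (the function name is mine; ksp, b and x are assumed to be configured but not yet solved, and the estimates are of course only as good as the Krylov space that was built):

  #include <petscksp.h>

  static PetscErrorCode CheckPreconditionedSpectrum(KSP ksp, Vec b, Vec x)
  {
    PetscReal      r[64], c[64];
    PetscInt       i, neig;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* Must be requested before KSPSetUp/KSPSolve so the extra work arrays
       are allocated. */
    ierr = KSPSetComputeEigenvalues(ksp, PETSC_TRUE);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    /* Eigenvalue estimates of the preconditioned operator from the Krylov
       process; negative real parts would indicate indefiniteness. */
    ierr = KSPComputeEigenvalues(ksp, 64, r, c, &neig);CHKERRQ(ierr);
    for (i = 0; i < neig; i++) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "eig %D: %g + %g i\n",
                         i, (double)r[i], (double)c[i]);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }

The command-line route, -ksp_compute_eigenvalues, should do much the same thing.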

Cheers,

Lawrence