[petsc-users] Issue using multi-grid as a pre-conditioner with KSP for a Poisson problem

Jason Lefley jason.lefley at aclectic.com
Fri Jun 23 00:52:12 CDT 2017


> On Jun 22, 2017, at 3:52 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> 
>   Try running the -pc_type mg   case with the additional argument -pc_mg_galerkin does that decrease the number of iterations?

Yes, adding that option yields a considerable improvement:

$ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5 -pc_mg_galerkin
right hand side 2 norm: 512.
right hand side infinity norm: 0.999097
building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128
  0 KSP preconditioned resid norm 9.822067063977e-01 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.212533971477e-01 true resid norm 3.294699127279e+02 ||r(i)||/||b|| 6.434959232968e-01
  2 KSP preconditioned resid norm 9.110625065462e-02 true resid norm 1.674425012051e+02 ||r(i)||/||b|| 3.270361351662e-01
  3 KSP preconditioned resid norm 3.418653611088e-02 true resid norm 7.052946665869e+01 ||r(i)||/||b|| 1.377528645678e-01
  4 KSP preconditioned resid norm 1.120326622247e-02 true resid norm 2.513164585178e+01 ||r(i)||/||b|| 4.908524580425e-02
  5 KSP preconditioned resid norm 3.171704458717e-03 true resid norm 7.887015942877e+00 ||r(i)||/||b|| 1.540432801343e-02
  6 KSP preconditioned resid norm 7.376165339467e-04 true resid norm 2.126203120360e+00 ||r(i)||/||b|| 4.152740469453e-03
  7 KSP preconditioned resid norm 1.599952443901e-04 true resid norm 4.455888786560e-01 ||r(i)||/||b|| 8.702907786250e-04
  8 KSP preconditioned resid norm 5.280374315084e-05 true resid norm 9.902193817384e-02 ||r(i)||/||b|| 1.934022229958e-04
  9 KSP preconditioned resid norm 3.284982207258e-05 true resid norm 3.046564806731e-02 ||r(i)||/||b|| 5.950321888146e-05
 10 KSP preconditioned resid norm 1.630367968481e-05 true resid norm 3.157470827473e-02 ||r(i)||/||b|| 6.166935209908e-05
 11 KSP preconditioned resid norm 4.546559816411e-06 true resid norm 9.115958407015e-03 ||r(i)||/||b|| 1.780460626370e-05
KSP Object: 16 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 16 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object:    (mg_coarse_)     16 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (mg_coarse_)     16 MPI processes
      type: redundant
        Redundant preconditioner: First (color=0) of 16 PCs follows
        KSP Object:        (mg_coarse_redundant_)         1 MPI processes
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object:        (mg_coarse_redundant_)         1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 7.56438
              Factored matrix follows:
                Mat Object:                 1 MPI processes
                  type: seqaij
                  rows=512, cols=512
                  package used to perform factorization: petsc
                  total: nonzeros=24206, allocated nonzeros=24206
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object:           1 MPI processes
            type: seqaij
            rows=512, cols=512
            total: nonzeros=3200, allocated nonzeros=3200
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=512, cols=512
        total: nonzeros=3200, allocated nonzeros=3200
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object:    (mg_levels_1_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_1_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=4096, cols=4096
        total: nonzeros=27136, allocated nonzeros=27136
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object:    (mg_levels_2_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_2_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=32768, cols=32768
        total: nonzeros=223232, allocated nonzeros=223232
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object:    (mg_levels_3_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_3_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=262144, cols=262144
        total: nonzeros=1810432, allocated nonzeros=1810432
        total number of mallocs used during MatSetValues calls =0
          not using I-node (on process 0) routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object:    (mg_levels_4_)     16 MPI processes
      type: richardson
        Richardson: using self-scale best computed damping factor
      maximum iterations=5
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object:    (mg_levels_4_)     16 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object:       16 MPI processes
        type: mpiaij
        rows=2097152, cols=2097152
        total: nonzeros=14581760, allocated nonzeros=14581760
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object:   16 MPI processes
    type: mpiaij
    rows=2097152, cols=2097152
    total: nonzeros=14581760, allocated nonzeros=14581760
    total number of mallocs used during MatSetValues calls =0
Residual 2 norm 0.00911596
Residual infinity norm 4.90052e-05

> 
>> On Jun 22, 2017, at 3:20 PM, Jason Lefley <jason.lefley at aclectic.com> wrote:
>> 
>> Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement.
> 
>   Yes, the 5 iterations you get below is pretty much the best you can expect. No reasonably tuning of smoother options is likely to have much affect (it is very difficult to improve from 5).
> 

We are quite happy with 5 iterations. The suggested mg options did not improve the runs involving -pc_type mg. I’d like to know how to get the same convergence we observe with gamg when using mg, if possible.

>   Barry
> 
>> The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow:
>> 
>> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg
>> right hand side 2 norm: 512.
>> right hand side infinity norm: 0.999097
>> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128
>>  0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
>>  1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00
>>  2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02
>>  3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03
>>  4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04
>>  5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05
>> KSP Object: 16 MPI processes
> 
>    Y
>>  type: cg
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 16 MPI processes
>>  type: gamg
>>    MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>      Cycles per PCApply=1
>>      Using Galerkin computed coarse grid matrices
>>      GAMG specific options
>>        Threshold for dropping small values from graph 0.
>>        AGG specific options
>>          Symmetric graph false
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object:    (mg_coarse_)     16 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_coarse_)     16 MPI processes
>>      type: bjacobi
>>        block Jacobi: number of blocks = 16
>>        Local solve is same for all blocks, in the following KSP and PC objects:
>>      KSP Object:      (mg_coarse_sub_)       1 MPI processes
>>        type: preonly
>>        maximum iterations=1, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object:      (mg_coarse_sub_)       1 MPI processes
>>        type: lu
>>          LU: out-of-place factorization
>>          tolerance for zero pivot 2.22045e-14
>>          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>          matrix ordering: nd
>>          factor fill ratio given 5., needed 1.
>>            Factored matrix follows:
>>              Mat Object:               1 MPI processes
>>                type: seqaij
>>                rows=13, cols=13
>>                package used to perform factorization: petsc
>>                total: nonzeros=169, allocated nonzeros=169
>>                total number of mallocs used during MatSetValues calls =0
>>                  using I-node routines: found 3 nodes, limit used is 5
>>        linear system matrix = precond matrix:
>>        Mat Object:         1 MPI processes
>>          type: seqaij
>>          rows=13, cols=13
>>          total: nonzeros=169, allocated nonzeros=169
>>          total number of mallocs used during MatSetValues calls =0
>>            using I-node routines: found 3 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=13, cols=13
>>        total: nonzeros=169, allocated nonzeros=169
>>        total number of mallocs used during MatSetValues calls =0
>>          using I-node (on process 0) routines: found 3 nodes, limit used is 5
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object:    (mg_levels_1_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.136516, max = 1.50168
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_1_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_1_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=467, cols=467
>>        total: nonzeros=68689, allocated nonzeros=68689
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 2 -------------------------------
>>    KSP Object:    (mg_levels_2_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.148872, max = 1.63759
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_2_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_2_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=14893, cols=14893
>>        total: nonzeros=1856839, allocated nonzeros=1856839
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 3 -------------------------------
>>    KSP Object:    (mg_levels_3_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.135736, max = 1.49309
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_3_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_3_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=190701, cols=190701
>>        total: nonzeros=6209261, allocated nonzeros=6209261
>>        total number of mallocs used during MatSetValues calls =0
>>          not using I-node (on process 0) routines
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 4 -------------------------------
>>    KSP Object:    (mg_levels_4_)     16 MPI processes
>>      type: chebyshev
>>        Chebyshev: eigenvalue estimates:  min = 0.140039, max = 1.54043
>>        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>        KSP Object:        (mg_levels_4_esteig_)         16 MPI processes
>>          type: gmres
>>            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>            GMRES: happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>      maximum iterations=2
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_4_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=2097152, cols=2097152
>>        total: nonzeros=14581760, allocated nonzeros=14581760
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix = precond matrix:
>>  Mat Object:   16 MPI processes
>>    type: mpiaij
>>    rows=2097152, cols=2097152
>>    total: nonzeros=14581760, allocated nonzeros=14581760
>>    total number of mallocs used during MatSetValues calls =0
>> Residual 2 norm 0.0230953
>> Residual infinity norm 0.000240174
>> 
>> 
>> 
>> $ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5
>> right hand side 2 norm: 512.
>> right hand side infinity norm: 0.999097
>> building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128
>> building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16
>> building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32
>> building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64
>> building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8
>>  0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00
>>  1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01
>>  2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01
>>  3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01
>>  4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01
>>  5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01
>>  6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01
>>  7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01
>>  8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01
>>  9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01
>> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01
>> 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01
>> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01
>> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01
>> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01
>> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01
>> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01
>> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01
>> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01
>> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01
>> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01
>> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01
>> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01
>> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01
>> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02
>> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02
>> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02
>> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02
>> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02
>> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02
>> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02
>> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02
>> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02
>> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02
>> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02
>> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02
>> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02
>> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02
>> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02
>> 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02
>> 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02
>> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02
>> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02
>> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02
>> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02
>> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02
>> 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02
>> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02
>> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02
>> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02
>> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02
>> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02
>> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03
>> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03
>> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03
>> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03
>> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03
>> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03
>> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03
>> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03
>> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03
>> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03
>> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03
>> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03
>> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03
>> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03
>> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03
>> 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03
>> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03
>> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03
>> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03
>> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03
>> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03
>> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03
>> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03
>> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03
>> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03
>> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04
>> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04
>> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04
>> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04
>> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04
>> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04
>> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04
>> KSP Object: 16 MPI processes
>>  type: cg
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: 16 MPI processes
>>  type: mg
>>    MG: type is MULTIPLICATIVE, levels=5 cycles=v
>>      Cycles per PCApply=1
>>      Not using Galerkin computed coarse grid matrices
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object:    (mg_coarse_)     16 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_coarse_)     16 MPI processes
>>      type: redundant
>>        Redundant preconditioner: First (color=0) of 16 PCs follows
>>        KSP Object:        (mg_coarse_redundant_)         1 MPI processes
>>          type: preonly
>>          maximum iterations=10000, initial guess is zero
>>          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using NONE norm type for convergence test
>>        PC Object:        (mg_coarse_redundant_)         1 MPI processes
>>          type: lu
>>            LU: out-of-place factorization
>>            tolerance for zero pivot 2.22045e-14
>>            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>            matrix ordering: nd
>>            factor fill ratio given 5., needed 7.56438
>>              Factored matrix follows:
>>                Mat Object:                 1 MPI processes
>>                  type: seqaij
>>                  rows=512, cols=512
>>                  package used to perform factorization: petsc
>>                  total: nonzeros=24206, allocated nonzeros=24206
>>                  total number of mallocs used during MatSetValues calls =0
>>                    not using I-node routines
>>          linear system matrix = precond matrix:
>>          Mat Object:           1 MPI processes
>>            type: seqaij
>>            rows=512, cols=512
>>            total: nonzeros=3200, allocated nonzeros=3200
>>            total number of mallocs used during MatSetValues calls =0
>>              not using I-node routines
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=512, cols=512
>>        total: nonzeros=3200, allocated nonzeros=3200
>>        total number of mallocs used during MatSetValues calls =0
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object:    (mg_levels_1_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_1_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=4096, cols=4096
>>        total: nonzeros=27136, allocated nonzeros=27136
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 2 -------------------------------
>>    KSP Object:    (mg_levels_2_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_2_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=32768, cols=32768
>>        total: nonzeros=223232, allocated nonzeros=223232
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 3 -------------------------------
>>    KSP Object:    (mg_levels_3_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_3_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=262144, cols=262144
>>        total: nonzeros=1810432, allocated nonzeros=1810432
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  Down solver (pre-smoother) on level 4 -------------------------------
>>    KSP Object:    (mg_levels_4_)     16 MPI processes
>>      type: richardson
>>        Richardson: using self-scale best computed damping factor
>>      maximum iterations=5
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using nonzero initial guess
>>      using NONE norm type for convergence test
>>    PC Object:    (mg_levels_4_)     16 MPI processes
>>      type: sor
>>        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>      linear system matrix = precond matrix:
>>      Mat Object:       16 MPI processes
>>        type: mpiaij
>>        rows=2097152, cols=2097152
>>        total: nonzeros=14581760, allocated nonzeros=14581760
>>        total number of mallocs used during MatSetValues calls =0
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix = precond matrix:
>>  Mat Object:   16 MPI processes
>>    type: mpiaij
>>    rows=2097152, cols=2097152
>>    total: nonzeros=14581760, allocated nonzeros=14581760
>>    total number of mallocs used during MatSetValues calls =0
>> Residual 2 norm 0.290082
>> Residual infinity norm 0.00192869
>> 
>> 
>> 
>> 
>> 
>> solver_test.c:
>> 
>> // modified version of ksp/ksp/examples/tutorials/ex34.c
>> // related: ksp/ksp/examples/tutorials/ex29.c
>> //          ksp/ksp/examples/tutorials/ex32.c
>> //          ksp/ksp/examples/tutorials/ex50.c
>> 
>> #include <petscdm.h>
>> #include <petscdmda.h>
>> #include <petscksp.h>
>> 
>> extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*);
>> extern PetscErrorCode ComputeRHS(KSP,Vec,void*);
>> 
>> typedef enum
>> {
>>    DIRICHLET,
>>    NEUMANN
>> } BCType;
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "main"
>> int main(int argc,char **argv)
>> {
>>    KSP            ksp;
>>    DM             da;
>>    PetscReal      norm;
>>    PetscErrorCode ierr;
>> 
>>    PetscInt    i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;
>>    PetscScalar Hx,Hy,Hz;
>>    PetscScalar ***array;
>>    Vec         x,b,r;
>>    Mat         J;
>>    const char* bcTypes[2] = { "dirichlet", "neumann" };
>>    PetscInt    bcType = (PetscInt)DIRICHLET;
>> 
>>    PetscInitialize(&argc,&argv,(char*)0,0);
>> 
>>    ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr);
>>    ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr);
>>    ierr = PetscOptionsEnd();CHKERRQ(ierr);
>> 
>>    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>>    ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);
>>    ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr);
>> 
>>    ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);
>> 
>>    ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr);
>>    ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr);
>>    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>    ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);
>>    ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr);
>>    ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr);
>>    ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr);
>>    ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
>> 
>>    ierr = MatMult(J,x,r);CHKERRQ(ierr);
>>    ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);
>>    ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr);
>>    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm %g\n",(double)norm);CHKERRQ(ierr);
>>    ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr);
>>    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr);
>> 
>>    ierr = VecDestroy(&r);CHKERRQ(ierr);
>>    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>>    ierr = DMDestroy(&da);CHKERRQ(ierr);
>>    ierr = PetscFinalize();
>>    return 0;
>> }
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "ComputeRHS"
>> PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)
>> {
>>    PetscErrorCode ierr;
>>    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;
>>    PetscScalar    Hx,Hy,Hz;
>>    PetscScalar    ***array;
>>    DM             da;
>>    BCType bcType = *(BCType*)ctx;
>> 
>>    PetscFunctionBeginUser;
>>    ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr);
>>    ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
>>    Hx   = 1.0 / (PetscReal)(mx);
>>    Hy   = 1.0 / (PetscReal)(my);
>>    Hz   = 1.0 / (PetscReal)(mz);
>>    ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
>>    ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr);
>>    for (k = zs; k < zs + zm; k++)
>>    {
>>        for (j = ys; j < ys + ym; j++)
>>        {
>>            for (i = xs; i < xs + xm; i++)
>>            {
>>                PetscReal x = ((PetscReal)i + 0.5) * Hx;
>>                PetscReal y = ((PetscReal)j + 0.5) * Hy;
>>                PetscReal z = ((PetscReal)k + 0.5) * Hz;
>>                array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI);
>>            }
>>        }
>>    }
>>    ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr);
>>    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
>>    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
>> 
>>    PetscReal norm;
>>    VecNorm(b, NORM_2, &norm);
>>    PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm);
>>    VecNorm(b, NORM_INFINITY, &norm);
>>    PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm);
>> 
>>    /* force right hand side to be consistent for singular matrix */
>>    /* note this is really a hack, normally the model would provide you with a consistent right handside */
>> 
>>    if (bcType == NEUMANN)
>>    {
>>        MatNullSpace nullspace;
>>        ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);
>>        ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr);
>>        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
>>    }
>>    PetscFunctionReturn(0);
>> }
>> 
>> 
>> #undef __FUNCT__
>> #define __FUNCT__ "ComputeMatrix"
>> PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx)
>> {
>>    PetscErrorCode ierr;
>>    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk;
>>    PetscScalar    v[7],Hx,Hy,Hz;
>>    MatStencil     row, col[7];
>>    DM             da;
>>    BCType bcType = *(BCType*)ctx;
>> 
>>    PetscFunctionBeginUser;
>> 
>>    if (bcType == DIRICHLET)
>>        PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, ");
>>    else if (bcType == NEUMANN)
>>        PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, ");
>>    else
>>        SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n");
>> 
>>    ierr    = KSPGetDM(ksp,&da);CHKERRQ(ierr);
>>    ierr    = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);
>> 
>>    PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz);
>> 
>>    Hx      = 1.0 / (PetscReal)(mx);
>>    Hy      = 1.0 / (PetscReal)(my);
>>    Hz      = 1.0 / (PetscReal)(mz);
>> 
>>    PetscReal Hx2 = Hx * Hx;
>>    PetscReal Hy2 = Hy * Hy;
>>    PetscReal Hz2 = Hz * Hz;
>> 
>>    PetscReal scaleX = 1.0 / Hx2;
>>    PetscReal scaleY = 1.0 / Hy2;
>>    PetscReal scaleZ = 1.0 / Hz2;
>> 
>>    ierr    = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
>>    for (k = zs; k < zs + zm; k++)
>>    {
>>        for (j = ys; j < ys + ym; j++)
>>        {
>>            for (i = xs; i < xs + xm; i++)
>>            {
>>                row.i = i;
>>                row.j = j;
>>                row.k = k;
>>                if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1)
>>                {
>>                    num = 0;
>>                    numi = 0;
>>                    numj = 0;
>>                    numk = 0;
>>                    if (k != 0)
>>                    {
>>                        v[num] = -scaleZ;
>>                        col[num].i = i;
>>                        col[num].j = j;
>>                        col[num].k = k - 1;
>>                        num++;
>>                        numk++;
>>                    }
>>                    if (j != 0)
>>                    {
>>                        v[num] = -scaleY;
>>                        col[num].i = i;
>>                        col[num].j = j - 1;
>>                        col[num].k = k;
>>                        num++;
>>                        numj++;
>>                    }
>>                    if (i != 0)
>>                    {
>>                        v[num] = -scaleX;
>>                        col[num].i = i - 1;
>>                        col[num].j = j;
>>                        col[num].k = k;
>>                        num++;
>>                        numi++;
>>                    }
>>                    if (i != mx - 1)
>>                    {
>>                        v[num] = -scaleX;
>>                        col[num].i = i + 1;
>>                        col[num].j = j;
>>                        col[num].k = k;
>>                        num++;
>>                        numi++;
>>                    }
>>                    if (j != my - 1)
>>                    {
>>                        v[num] = -scaleY;
>>                        col[num].i = i;
>>                        col[num].j = j + 1;
>>                        col[num].k = k;
>>                        num++;
>>                        numj++;
>>                    }
>>                    if (k != mz - 1)
>>                    {
>>                        v[num] = -scaleZ;
>>                        col[num].i = i;
>>                        col[num].j = j;
>>                        col[num].k = k + 1;
>>                        num++;
>>                        numk++;
>>                    }
>> 
>>                    if (bcType == NEUMANN)
>>                    {
>>                        v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX;
>>                    }
>>                    else if (bcType == DIRICHLET)
>>                    {
>>                        v[num] = 2.0 * (scaleX + scaleY + scaleZ);
>>                    }
>> 
>>                    col[num].i = i;
>>                    col[num].j = j;
>>                    col[num].k = k;
>>                    num++;
>>                    ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES);
>>                    CHKERRQ(ierr);
>>                }
>>                else
>>                {
>>                    v[0] = -scaleZ;
>>                    col[0].i = i;
>>                    col[0].j = j;
>>                    col[0].k = k - 1;
>>                    v[1] = -scaleY;
>>                    col[1].i = i;
>>                    col[1].j = j - 1;
>>                    col[1].k = k;
>>                    v[2] = -scaleX;
>>                    col[2].i = i - 1;
>>                    col[2].j = j;
>>                    col[2].k = k;
>>                    v[3] = 2.0 * (scaleX + scaleY + scaleZ);
>>                    col[3].i = i;
>>                    col[3].j = j;
>>                    col[3].k = k;
>>                    v[4] = -scaleX;
>>                    col[4].i = i + 1;
>>                    col[4].j = j;
>>                    col[4].k = k;
>>                    v[5] = -scaleY;
>>                    col[5].i = i;
>>                    col[5].j = j + 1;
>>                    col[5].k = k;
>>                    v[6] = -scaleZ;
>>                    col[6].i = i;
>>                    col[6].j = j;
>>                    col[6].k = k + 1;
>>                    ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES);
>>                    CHKERRQ(ierr);
>>                }
>>            }
>>        }
>>    }
>>    ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>    ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>    if (bcType == NEUMANN)
>>    {
>>        MatNullSpace   nullspace;
>>        ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);
>>        ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr);
>>        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);
>>    }
>>    PetscFunctionReturn(0);
>> }
>> 
>> 
>>> On Jun 22, 2017, at 9:23 AM, Matthew Knepley <knepley at gmail.com> wrote:
>>> 
>>> On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley <jason.lefley at aclectic.com> wrote:
>>> Hello,
>>> 
>>> We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we’d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c
>>> 
>>> With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we’d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. Here’s the tests we ran with ex45:
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129
>>>        time in KSPSolve(): 7.0178e+00
>>>        solver iterations: 157
>>>        KSP final norm of residual: 3.16874e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none
>>>        time in KSPSolve(): 4.1072e+00
>>>        solver iterations: 213
>>>        KSP final norm of residual: 0.000138866
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg
>>>        time in KSPSolve(): 3.3962e+00
>>>        solver iterations: 88
>>>        KSP final norm of residual: 6.46242e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
>>>        time in KSPSolve(): 1.3201e+00
>>>        solver iterations: 4
>>>        KSP final norm of residual: 8.13339e-05
>>> 
>>> mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
>>>        time in KSPSolve(): 1.3008e+00
>>>        solver iterations: 4
>>>        KSP final norm of residual: 2.21474e-05
>>> 
>>> We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations.
>>> 
>>> We ran our code (modified ex34.c as described above) with the same parameters:
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128
>>>        time in KSPSolve(): 5.3729e+00
>>>        solver iterations: 123
>>>        KSP final norm of residual: 0.00595066
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none
>>>        time in KSPSolve(): 3.6154e+00
>>>        solver iterations: 188
>>>        KSP final norm of residual: 0.00505943
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg
>>>        time in KSPSolve(): 3.5661e+00
>>>        solver iterations: 98
>>>        KSP final norm of residual: 0.00967462
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi
>>>        time in KSPSolve(): 4.5606e+00
>>>        solver iterations: 44
>>>        KSP final norm of residual: 949.553
>>> 
>>> 1) Dave is right
>>> 
>>> 2) In order to see how many iterates to expect, first try using algebraic multigrid
>>> 
>>>  -pc_type gamg
>>> 
>>> This should work out of the box for Poisson
>>> 
>>> 3) For questions like this, we really need to see
>>> 
>>>  -ksp_view -ksp_monitor_true_residual
>>> 
>>> 4) It sounds like you smoother is not strong enough. You could try
>>> 
>>>  -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5
>>> 
>>> or maybe GMRES until it works.
>>> 
>>> Thanks,
>>> 
>>>    Matt
>>> 
>>> mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg
>>>        time in KSPSolve(): 1.5481e+01
>>>        solver iterations: 198
>>>        KSP final norm of residual: 0.916558
>>> 
>>> We performed all tests with petsc-3.7.6.
>>> 
>>> The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn’t seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok.
>>> 
>>> We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn’t observe any significant improvements over the results above.
>>> 
>>> Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist.
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>> 
>>> http://www.caam.rice.edu/~mk51/
>> 
> 



More information about the petsc-users mailing list