<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Thanks for the prompt replies. I ran with gamg and the results look more promising. I tried the suggested -mg_* options and did not see improvement. The -ksp_view and -ksp_monitor_true_residual output from those tests and the solver_test source (modified ex34.c) follow:</div><div class=""><br class=""></div>$ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type gamg -ksp_type cg<div class=""><div class="">right hand side 2 norm: 512.</div><div class="">right hand side infinity norm: 0.999097</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128</div><div class="">  0 KSP preconditioned resid norm 2.600515167901e+00 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00</div><div class="">  1 KSP preconditioned resid norm 6.715532962879e-02 true resid norm 7.578946422553e+02 ||r(i)||/||b|| 1.480262973155e+00</div><div class="">  2 KSP preconditioned resid norm 1.127682308441e-02 true resid norm 3.247852182315e+01 ||r(i)||/||b|| 6.343461293584e-02</div><div class="">  3 KSP preconditioned resid norm 7.760468503025e-04 true resid norm 3.304142895659e+00 ||r(i)||/||b|| 6.453404093085e-03</div><div class="">  4 KSP preconditioned resid norm 6.419777870067e-05 true resid norm 2.662993775521e-01 ||r(i)||/||b|| 5.201159717815e-04</div><div class="">  5 KSP preconditioned resid norm 5.107540549482e-06 true resid norm 2.309528369351e-02 ||r(i)||/||b|| 4.510797596388e-05</div><div class="">KSP Object: 16 MPI processes</div><div class="">  type: cg</div><div class="">  maximum iterations=10000, initial guess is zero</div><div class="">  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">  left preconditioning</div><div class="">  using PRECONDITIONED norm type for convergence test</div><div class="">PC Object: 16 MPI processes</div><div class="">  type: gamg</div><div class="">    MG: type is MULTIPLICATIVE, levels=5 cycles=v</div><div class="">      Cycles per PCApply=1</div><div class="">      Using Galerkin computed coarse grid matrices</div><div class="">      GAMG specific options</div><div class="">        Threshold for dropping small values from graph 0.</div><div class="">        AGG specific options</div><div class="">          Symmetric graph false</div><div class="">  Coarse grid solver -- level -------------------------------</div><div class="">    KSP Object:    (mg_coarse_)     16 MPI processes</div><div class="">      type: preonly</div><div class="">      maximum iterations=10000, initial guess is zero</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_coarse_)     16 MPI processes</div><div class="">      type: bjacobi</div><div class="">        block Jacobi: number of blocks = 16</div><div class="">        Local solve is same for all blocks, in the following KSP and PC objects:</div><div class="">      KSP Object:      (mg_coarse_sub_)       1 MPI processes</div><div class="">        type: preonly</div><div class="">        maximum iterations=1, initial guess is zero</div><div class="">        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">        left preconditioning</div><div class="">        using NONE norm type for convergence test</div><div class="">      PC Object:      (mg_coarse_sub_)       1 MPI processes</div><div class="">        type: lu</div><div class="">          LU: out-of-place factorization</div><div class="">          tolerance for zero pivot 2.22045e-14</div><div class="">          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]</div><div class="">          matrix ordering: nd</div><div class="">          factor fill ratio given 5., needed 1.</div><div class="">            Factored matrix follows:</div><div class="">              Mat Object:               1 MPI processes</div><div class="">                type: seqaij</div><div class="">                rows=13, cols=13</div><div class="">                package used to perform factorization: petsc</div><div class="">                total: nonzeros=169, allocated nonzeros=169</div><div class="">                total number of mallocs used during MatSetValues calls =0</div><div class="">                  using I-node routines: found 3 nodes, limit used is 5</div><div class="">        linear system matrix = precond matrix:</div><div class="">        Mat Object:         1 MPI processes</div><div class="">          type: seqaij</div><div class="">          rows=13, cols=13</div><div class="">          total: nonzeros=169, allocated nonzeros=169</div><div class="">          total number of mallocs used during MatSetValues calls =0</div><div class="">            using I-node routines: found 3 nodes, limit used is 5</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=13, cols=13</div><div class="">        total: nonzeros=169, allocated nonzeros=169</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">          using I-node (on process 0) routines: found 3 nodes, limit used is 5</div><div class="">  Down solver (pre-smoother) on level 1 -------------------------------</div><div class="">    KSP Object:    (mg_levels_1_)     16 MPI processes</div><div class="">      type: chebyshev</div><div class="">        Chebyshev: eigenvalue estimates:  min = 0.136516, max = 1.50168</div><div class="">        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]</div><div class="">        KSP Object:        (mg_levels_1_esteig_)         16 MPI processes</div><div class="">          type: gmres</div><div class="">            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</div><div class="">            GMRES: happy breakdown tolerance 1e-30</div><div class="">          maximum iterations=10, initial guess is zero</div><div class="">          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.</div><div class="">          left preconditioning</div><div class="">          using PRECONDITIONED norm type for convergence test</div><div class="">      maximum iterations=2</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_1_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=467, cols=467</div><div class="">        total: nonzeros=68689, allocated nonzeros=68689</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">          not using I-node (on process 0) routines</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 2 -------------------------------</div><div class="">    KSP Object:    (mg_levels_2_)     16 MPI processes</div><div class="">      type: chebyshev</div><div class="">        Chebyshev: eigenvalue estimates:  min = 0.148872, max = 1.63759</div><div class="">        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]</div><div class="">        KSP Object:        (mg_levels_2_esteig_)         16 MPI processes</div><div class="">          type: gmres</div><div class="">            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</div><div class="">            GMRES: happy breakdown tolerance 1e-30</div><div class="">          maximum iterations=10, initial guess is zero</div><div class="">          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.</div><div class="">          left preconditioning</div><div class="">          using PRECONDITIONED norm type for convergence test</div><div class="">      maximum iterations=2</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_2_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=14893, cols=14893</div><div class="">        total: nonzeros=1856839, allocated nonzeros=1856839</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">          not using I-node (on process 0) routines</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 3 -------------------------------</div><div class="">    KSP Object:    (mg_levels_3_)     16 MPI processes</div><div class="">      type: chebyshev</div><div class="">        Chebyshev: eigenvalue estimates:  min = 0.135736, max = 1.49309</div><div class="">        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]</div><div class="">        KSP Object:        (mg_levels_3_esteig_)         16 MPI processes</div><div class="">          type: gmres</div><div class="">            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</div><div class="">            GMRES: happy breakdown tolerance 1e-30</div><div class="">          maximum iterations=10, initial guess is zero</div><div class="">          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.</div><div class="">          left preconditioning</div><div class="">          using PRECONDITIONED norm type for convergence test</div><div class="">      maximum iterations=2</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_3_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=190701, cols=190701</div><div class="">        total: nonzeros=6209261, allocated nonzeros=6209261</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">          not using I-node (on process 0) routines</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 4 -------------------------------</div><div class="">    KSP Object:    (mg_levels_4_)     16 MPI processes</div><div class="">      type: chebyshev</div><div class="">        Chebyshev: eigenvalue estimates:  min = 0.140039, max = 1.54043</div><div class="">        Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]</div><div class="">        KSP Object:        (mg_levels_4_esteig_)         16 MPI processes</div><div class="">          type: gmres</div><div class="">            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</div><div class="">            GMRES: happy breakdown tolerance 1e-30</div><div class="">          maximum iterations=10, initial guess is zero</div><div class="">          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.</div><div class="">          left preconditioning</div><div class="">          using PRECONDITIONED norm type for convergence test</div><div class="">      maximum iterations=2</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_4_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=2097152, cols=2097152</div><div class="">        total: nonzeros=14581760, allocated nonzeros=14581760</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  linear system matrix = precond matrix:</div><div class="">  Mat Object:   16 MPI processes</div><div class="">    type: mpiaij</div><div class="">    rows=2097152, cols=2097152</div><div class="">    total: nonzeros=14581760, allocated nonzeros=14581760</div><div class="">    total number of mallocs used during MatSetValues calls =0</div><div class="">Residual 2 norm 0.0230953</div><div class="">Residual infinity norm 0.000240174</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">$ mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_view -ksp_monitor_true_residual -pc_type mg -ksp_type cg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5</div><div class=""><div class="">right hand side 2 norm: 512.</div><div class="">right hand side infinity norm: 0.999097</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 128 x 128 x 128</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 16 x 16 x 16</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 32 x 32 x 32</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 64 x 64 x 64</div><div class="">building operator with Dirichlet boundary conditions, global grid size: 8 x 8 x 8</div><div class="">  0 KSP preconditioned resid norm 1.957390963372e+03 true resid norm 5.120000000000e+02 ||r(i)||/||b|| 1.000000000000e+00</div><div class="">  1 KSP preconditioned resid norm 7.501162328351e+02 true resid norm 3.373318498950e+02 ||r(i)||/||b|| 6.588512693262e-01</div><div class="">  2 KSP preconditioned resid norm 7.658993705113e+01 true resid norm 1.827365322620e+02 ||r(i)||/||b|| 3.569072895742e-01</div><div class="">  3 KSP preconditioned resid norm 9.059824824329e+02 true resid norm 1.426474831278e+02 ||r(i)||/||b|| 2.786083654840e-01</div><div class="">  4 KSP preconditioned resid norm 4.091168582134e+02 true resid norm 1.292495057977e+02 ||r(i)||/||b|| 2.524404410112e-01</div><div class="">  5 KSP preconditioned resid norm 7.422110759274e+01 true resid norm 1.258028404461e+02 ||r(i)||/||b|| 2.457086727463e-01</div><div class="">  6 KSP preconditioned resid norm 4.619015396949e+01 true resid norm 1.213792421102e+02 ||r(i)||/||b|| 2.370688322464e-01</div><div class="">  7 KSP preconditioned resid norm 6.391009527793e+01 true resid norm 1.124510270422e+02 ||r(i)||/||b|| 2.196309121917e-01</div><div class="">  8 KSP preconditioned resid norm 7.446926604265e+01 true resid norm 1.077567310933e+02 ||r(i)||/||b|| 2.104623654166e-01</div><div class="">  9 KSP preconditioned resid norm 4.220904319642e+01 true resid norm 9.988181971539e+01 ||r(i)||/||b|| 1.950816791316e-01</div><div class=""> 10 KSP preconditioned resid norm 2.394387980018e+01 true resid norm 9.127579669592e+01 ||r(i)||/||b|| 1.782730404217e-01</div><div class=""> 11 KSP preconditioned resid norm 1.360843954226e+01 true resid norm 8.771762326371e+01 ||r(i)||/||b|| 1.713234829369e-01</div><div class=""> 12 KSP preconditioned resid norm 4.128223286694e+01 true resid norm 8.529182941649e+01 ||r(i)||/||b|| 1.665856043291e-01</div><div class=""> 13 KSP preconditioned resid norm 2.183532094447e+01 true resid norm 8.263211340769e+01 ||r(i)||/||b|| 1.613908464994e-01</div><div class=""> 14 KSP preconditioned resid norm 1.304178992338e+01 true resid norm 7.971822602122e+01 ||r(i)||/||b|| 1.556996601977e-01</div><div class=""> 15 KSP preconditioned resid norm 7.573349141411e+00 true resid norm 7.520975377445e+01 ||r(i)||/||b|| 1.468940503407e-01</div><div class=""> 16 KSP preconditioned resid norm 9.314890793459e+00 true resid norm 7.304954328407e+01 ||r(i)||/||b|| 1.426748892267e-01</div><div class=""> 17 KSP preconditioned resid norm 4.445933446231e+00 true resid norm 6.978356031428e+01 ||r(i)||/||b|| 1.362960162388e-01</div><div class=""> 18 KSP preconditioned resid norm 5.349719054065e+00 true resid norm 6.667516877214e+01 ||r(i)||/||b|| 1.302249390081e-01</div><div class=""> 19 KSP preconditioned resid norm 3.295861671942e+00 true resid norm 6.182140339659e+01 ||r(i)||/||b|| 1.207449285090e-01</div><div class=""> 20 KSP preconditioned resid norm 1.035616277789e+01 true resid norm 5.734720030036e+01 ||r(i)||/||b|| 1.120062505866e-01</div><div class=""> 21 KSP preconditioned resid norm 3.211186072853e+01 true resid norm 5.552393909940e+01 ||r(i)||/||b|| 1.084451935535e-01</div><div class=""> 22 KSP preconditioned resid norm 1.305589450595e+01 true resid norm 5.499062776214e+01 ||r(i)||/||b|| 1.074035698479e-01</div><div class=""> 23 KSP preconditioned resid norm 2.686432456763e+00 true resid norm 5.207613218582e+01 ||r(i)||/||b|| 1.017111956754e-01</div><div class=""> 24 KSP preconditioned resid norm 2.824784197849e+00 true resid norm 4.838619801451e+01 ||r(i)||/||b|| 9.450429299708e-02</div><div class=""> 25 KSP preconditioned resid norm 1.071690618667e+00 true resid norm 4.607851421273e+01 ||r(i)||/||b|| 8.999709807174e-02</div><div class=""> 26 KSP preconditioned resid norm 1.881879145107e+00 true resid norm 4.001593265961e+01 ||r(i)||/||b|| 7.815611847581e-02</div><div class=""> 27 KSP preconditioned resid norm 1.572862295402e+00 true resid norm 3.838282973517e+01 ||r(i)||/||b|| 7.496646432650e-02</div><div class=""> 28 KSP preconditioned resid norm 1.470751639074e+00 true resid norm 3.480847634691e+01 ||r(i)||/||b|| 6.798530536506e-02</div><div class=""> 29 KSP preconditioned resid norm 1.024975253805e+01 true resid norm 3.242161363347e+01 ||r(i)||/||b|| 6.332346412788e-02</div><div class=""> 30 KSP preconditioned resid norm 2.548780607710e+00 true resid norm 3.146609403253e+01 ||r(i)||/||b|| 6.145721490728e-02</div><div class=""> 31 KSP preconditioned resid norm 1.560691471465e+00 true resid norm 2.970265802267e+01 ||r(i)||/||b|| 5.801300395052e-02</div><div class=""> 32 KSP preconditioned resid norm 2.596714997356e+00 true resid norm 2.766969046763e+01 ||r(i)||/||b|| 5.404236419458e-02</div><div class=""> 33 KSP preconditioned resid norm 7.034818331385e+00 true resid norm 2.684572557056e+01 ||r(i)||/||b|| 5.243305775501e-02</div><div class=""> 34 KSP preconditioned resid norm 1.494072683898e+00 true resid norm 2.475430030960e+01 ||r(i)||/||b|| 4.834824279219e-02</div><div class=""> 35 KSP preconditioned resid norm 2.080781323538e+01 true resid norm 2.334859550417e+01 ||r(i)||/||b|| 4.560272559409e-02</div><div class=""> 36 KSP preconditioned resid norm 2.046655096031e+00 true resid norm 2.240354154839e+01 ||r(i)||/||b|| 4.375691708669e-02</div><div class=""> 37 KSP preconditioned resid norm 7.606846976760e-01 true resid norm 2.109556419574e+01 ||r(i)||/||b|| 4.120227381981e-02</div><div class=""> 38 KSP preconditioned resid norm 2.521301363193e+00 true resid norm 1.843497075964e+01 ||r(i)||/||b|| 3.600580226493e-02</div><div class=""> 39 KSP preconditioned resid norm 3.726976470079e+00 true resid norm 1.794209917279e+01 ||r(i)||/||b|| 3.504316244686e-02</div><div class=""> 40 KSP preconditioned resid norm 8.959884762705e-01 true resid norm 1.573137783532e+01 ||r(i)||/||b|| 3.072534733461e-02</div><div class=""> 41 KSP preconditioned resid norm 1.227682448888e+00 true resid norm 1.501346415860e+01 ||r(i)||/||b|| 2.932317218476e-02</div><div class=""> 42 KSP preconditioned resid norm 1.452770736534e+00 true resid norm 1.433942919922e+01 ||r(i)||/||b|| 2.800669765473e-02</div><div class=""> 43 KSP preconditioned resid norm 5.675352390898e-01 true resid norm 1.216437815936e+01 ||r(i)||/||b|| 2.375855109250e-02</div><div class=""> 44 KSP preconditioned resid norm 4.949409351772e-01 true resid norm 1.042812110399e+01 ||r(i)||/||b|| 2.036742403123e-02</div><div class=""> 45 KSP preconditioned resid norm 2.002853875915e+00 true resid norm 9.309183650084e+00 ||r(i)||/||b|| 1.818199931657e-02</div><div class=""> 46 KSP preconditioned resid norm 3.745525627399e-01 true resid norm 8.522457639380e+00 ||r(i)||/||b|| 1.664542507691e-02</div><div class=""> 47 KSP preconditioned resid norm 1.811694613170e-01 true resid norm 7.531206553361e+00 ||r(i)||/||b|| 1.470938779953e-02</div><div class=""> 48 KSP preconditioned resid norm 1.782171623447e+00 true resid norm 6.764441307706e+00 ||r(i)||/||b|| 1.321179942911e-02</div><div class=""> 49 KSP preconditioned resid norm 2.299828236176e+00 true resid norm 6.702407994976e+00 ||r(i)||/||b|| 1.309064061519e-02</div><div class=""> 50 KSP preconditioned resid norm 1.273834849543e+00 true resid norm 6.053797247633e+00 ||r(i)||/||b|| 1.182382274928e-02</div><div class=""> 51 KSP preconditioned resid norm 2.719578737249e-01 true resid norm 5.470925517497e+00 ||r(i)||/||b|| 1.068540140136e-02</div><div class=""> 52 KSP preconditioned resid norm 4.663757145206e-01 true resid norm 5.005785517882e+00 ||r(i)||/||b|| 9.776924839614e-03</div><div class=""> 53 KSP preconditioned resid norm 1.292565284376e+00 true resid norm 4.881780753946e+00 ||r(i)||/||b|| 9.534728035050e-03</div><div class=""> 54 KSP preconditioned resid norm 1.867369610632e-01 true resid norm 4.496564950399e+00 ||r(i)||/||b|| 8.782353418749e-03</div><div class=""> 55 KSP preconditioned resid norm 5.249392115789e-01 true resid norm 4.092757959067e+00 ||r(i)||/||b|| 7.993667888803e-03</div><div class=""> 56 KSP preconditioned resid norm 1.924525961621e-01 true resid norm 3.780501481010e+00 ||r(i)||/||b|| 7.383791955098e-03</div><div class=""> 57 KSP preconditioned resid norm 5.779420386829e-01 true resid norm 3.213189014725e+00 ||r(i)||/||b|| 6.275759794385e-03</div><div class=""> 58 KSP preconditioned resid norm 5.955339076981e-01 true resid norm 3.112032435949e+00 ||r(i)||/||b|| 6.078188351463e-03</div><div class=""> 59 KSP preconditioned resid norm 3.750139060970e-01 true resid norm 2.999193364090e+00 ||r(i)||/||b|| 5.857799539239e-03</div><div class=""> 60 KSP preconditioned resid norm 1.384679712935e-01 true resid norm 2.745891157615e+00 ||r(i)||/||b|| 5.363068667216e-03</div><div class=""> 61 KSP preconditioned resid norm 7.632834890339e-02 true resid norm 2.176299405671e+00 ||r(i)||/||b|| 4.250584776702e-03</div><div class=""> 62 KSP preconditioned resid norm 3.147491994853e-01 true resid norm 1.832893972188e+00 ||r(i)||/||b|| 3.579871039430e-03</div><div class=""> 63 KSP preconditioned resid norm 5.052243308649e-01 true resid norm 1.775115122392e+00 ||r(i)||/||b|| 3.467021723421e-03</div><div class=""> 64 KSP preconditioned resid norm 8.956523831283e-01 true resid norm 1.731441975933e+00 ||r(i)||/||b|| 3.381722609244e-03</div><div class=""> 65 KSP preconditioned resid norm 7.897527588669e-01 true resid norm 1.682654829619e+00 ||r(i)||/||b|| 3.286435214100e-03</div><div class=""> 66 KSP preconditioned resid norm 5.770941160165e-02 true resid norm 1.560734518349e+00 ||r(i)||/||b|| 3.048309606150e-03</div><div class=""> 67 KSP preconditioned resid norm 3.553024960194e-02 true resid norm 1.389699750667e+00 ||r(i)||/||b|| 2.714257325521e-03</div><div class=""> 68 KSP preconditioned resid norm 4.316233667769e-02 true resid norm 1.147051776028e+00 ||r(i)||/||b|| 2.240335500054e-03</div><div class=""> 69 KSP preconditioned resid norm 3.793691994632e-02 true resid norm 1.012385825627e+00 ||r(i)||/||b|| 1.977316065678e-03</div><div class=""> 70 KSP preconditioned resid norm 2.383460701011e-02 true resid norm 8.696480161436e-01 ||r(i)||/||b|| 1.698531281530e-03</div><div class=""> 71 KSP preconditioned resid norm 6.376655007996e-02 true resid norm 7.779779636534e-01 ||r(i)||/||b|| 1.519488210261e-03</div><div class=""> 72 KSP preconditioned resid norm 5.714768085413e-02 true resid norm 7.153671793501e-01 ||r(i)||/||b|| 1.397201522168e-03</div><div class=""> 73 KSP preconditioned resid norm 1.708395350387e-01 true resid norm 6.312992319936e-01 ||r(i)||/||b|| 1.233006312487e-03</div><div class=""> 74 KSP preconditioned resid norm 1.498516783452e-01 true resid norm 6.006527781743e-01 ||r(i)||/||b|| 1.173149957372e-03</div><div class=""> 75 KSP preconditioned resid norm 1.218071938641e-01 true resid norm 5.769463903876e-01 ||r(i)||/||b|| 1.126848418726e-03</div><div class=""> 76 KSP preconditioned resid norm 2.682030144251e-02 true resid norm 5.214035118381e-01 ||r(i)||/||b|| 1.018366234059e-03</div><div class=""> 77 KSP preconditioned resid norm 9.794744927328e-02 true resid norm 4.660318995939e-01 ||r(i)||/||b|| 9.102185538943e-04</div><div class=""> 78 KSP preconditioned resid norm 3.311394355245e-01 true resid norm 4.581129176231e-01 ||r(i)||/||b|| 8.947517922325e-04</div><div class=""> 79 KSP preconditioned resid norm 7.771705063438e-02 true resid norm 4.103510898511e-01 ||r(i)||/||b|| 8.014669723654e-04</div><div class=""> 80 KSP preconditioned resid norm 3.078123608908e-02 true resid norm 3.918493012988e-01 ||r(i)||/||b|| 7.653306665991e-04</div><div class=""> 81 KSP preconditioned resid norm 2.759088686744e-02 true resid norm 3.289360804743e-01 ||r(i)||/||b|| 6.424532821763e-04</div><div class=""> 82 KSP preconditioned resid norm 1.147671489846e-01 true resid norm 3.190902200515e-01 ||r(i)||/||b|| 6.232230860381e-04</div><div class=""> 83 KSP preconditioned resid norm 1.101306468440e-02 true resid norm 2.900815313985e-01 ||r(i)||/||b|| 5.665654910126e-04</div><div class="">KSP Object: 16 MPI processes</div><div class="">  type: cg</div><div class="">  maximum iterations=10000, initial guess is zero</div><div class="">  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">  left preconditioning</div><div class="">  using PRECONDITIONED norm type for convergence test</div><div class="">PC Object: 16 MPI processes</div><div class="">  type: mg</div><div class="">    MG: type is MULTIPLICATIVE, levels=5 cycles=v</div><div class="">      Cycles per PCApply=1</div><div class="">      Not using Galerkin computed coarse grid matrices</div><div class="">  Coarse grid solver -- level -------------------------------</div><div class="">    KSP Object:    (mg_coarse_)     16 MPI processes</div><div class="">      type: preonly</div><div class="">      maximum iterations=10000, initial guess is zero</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_coarse_)     16 MPI processes</div><div class="">      type: redundant</div><div class="">        Redundant preconditioner: First (color=0) of 16 PCs follows</div><div class="">        KSP Object:        (mg_coarse_redundant_)         1 MPI processes</div><div class="">          type: preonly</div><div class="">          maximum iterations=10000, initial guess is zero</div><div class="">          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">          left preconditioning</div><div class="">          using NONE norm type for convergence test</div><div class="">        PC Object:        (mg_coarse_redundant_)         1 MPI processes</div><div class="">          type: lu</div><div class="">            LU: out-of-place factorization</div><div class="">            tolerance for zero pivot 2.22045e-14</div><div class="">            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]</div><div class="">            matrix ordering: nd</div><div class="">            factor fill ratio given 5., needed 7.56438</div><div class="">              Factored matrix follows:</div><div class="">                Mat Object:                 1 MPI processes</div><div class="">                  type: seqaij</div><div class="">                  rows=512, cols=512</div><div class="">                  package used to perform factorization: petsc</div><div class="">                  total: nonzeros=24206, allocated nonzeros=24206</div><div class="">                  total number of mallocs used during MatSetValues calls =0</div><div class="">                    not using I-node routines</div><div class="">          linear system matrix = precond matrix:</div><div class="">          Mat Object:           1 MPI processes</div><div class="">            type: seqaij</div><div class="">            rows=512, cols=512</div><div class="">            total: nonzeros=3200, allocated nonzeros=3200</div><div class="">            total number of mallocs used during MatSetValues calls =0</div><div class="">              not using I-node routines</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=512, cols=512</div><div class="">        total: nonzeros=3200, allocated nonzeros=3200</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Down solver (pre-smoother) on level 1 -------------------------------</div><div class="">    KSP Object:    (mg_levels_1_)     16 MPI processes</div><div class="">      type: richardson</div><div class="">        Richardson: using self-scale best computed damping factor</div><div class="">      maximum iterations=5</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_1_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=4096, cols=4096</div><div class="">        total: nonzeros=27136, allocated nonzeros=27136</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 2 -------------------------------</div><div class="">    KSP Object:    (mg_levels_2_)     16 MPI processes</div><div class="">      type: richardson</div><div class="">        Richardson: using self-scale best computed damping factor</div><div class="">      maximum iterations=5</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_2_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=32768, cols=32768</div><div class="">        total: nonzeros=223232, allocated nonzeros=223232</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 3 -------------------------------</div><div class="">    KSP Object:    (mg_levels_3_)     16 MPI processes</div><div class="">      type: richardson</div><div class="">        Richardson: using self-scale best computed damping factor</div><div class="">      maximum iterations=5</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_3_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=262144, cols=262144</div><div class="">        total: nonzeros=1810432, allocated nonzeros=1810432</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  Down solver (pre-smoother) on level 4 -------------------------------</div><div class="">    KSP Object:    (mg_levels_4_)     16 MPI processes</div><div class="">      type: richardson</div><div class="">        Richardson: using self-scale best computed damping factor</div><div class="">      maximum iterations=5</div><div class="">      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.</div><div class="">      left preconditioning</div><div class="">      using nonzero initial guess</div><div class="">      using NONE norm type for convergence test</div><div class="">    PC Object:    (mg_levels_4_)     16 MPI processes</div><div class="">      type: sor</div><div class="">        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.</div><div class="">      linear system matrix = precond matrix:</div><div class="">      Mat Object:       16 MPI processes</div><div class="">        type: mpiaij</div><div class="">        rows=2097152, cols=2097152</div><div class="">        total: nonzeros=14581760, allocated nonzeros=14581760</div><div class="">        total number of mallocs used during MatSetValues calls =0</div><div class="">  Up solver (post-smoother) same as down solver (pre-smoother)</div><div class="">  linear system matrix = precond matrix:</div><div class="">  Mat Object:   16 MPI processes</div><div class="">    type: mpiaij</div><div class="">    rows=2097152, cols=2097152</div><div class="">    total: nonzeros=14581760, allocated nonzeros=14581760</div><div class="">    total number of mallocs used during MatSetValues calls =0</div><div class="">Residual 2 norm 0.290082</div><div class="">Residual infinity norm 0.00192869</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">solver_test.c:</div><div class=""><br class=""></div><div class=""><div class="">// modified version of ksp/ksp/examples/tutorials/ex34.c</div><div class="">// related: ksp/ksp/examples/tutorials/ex29.c</div><div class="">//          ksp/ksp/examples/tutorials/ex32.c</div><div class="">//          ksp/ksp/examples/tutorials/ex50.c</div><div class=""><br class=""></div><div class="">#include <petscdm.h></div><div class="">#include <petscdmda.h></div><div class="">#include <petscksp.h></div><div class=""><br class=""></div><div class="">extern PetscErrorCode ComputeMatrix(KSP,Mat,Mat,void*);</div><div class="">extern PetscErrorCode ComputeRHS(KSP,Vec,void*);</div><div class=""><br class=""></div><div class="">typedef enum</div><div class="">{</div><div class="">    DIRICHLET,</div><div class="">    NEUMANN</div><div class="">} BCType;</div><div class=""><br class=""></div><div class="">#undef __FUNCT__</div><div class="">#define __FUNCT__ "main"</div><div class="">int main(int argc,char **argv)</div><div class="">{</div><div class="">    KSP            ksp;</div><div class="">    DM             da;</div><div class="">    PetscReal      norm;</div><div class="">    PetscErrorCode ierr;</div><div class=""><br class=""></div><div class="">    PetscInt    i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;</div><div class="">    PetscScalar Hx,Hy,Hz;</div><div class="">    PetscScalar ***array;</div><div class="">    Vec         x,b,r;</div><div class="">    Mat         J;</div><div class="">    const char* bcTypes[2] = { "dirichlet", "neumann" };</div><div class="">    PetscInt    bcType = (PetscInt)DIRICHLET;</div><div class=""><br class=""></div><div class="">    PetscInitialize(&argc,&argv,(char*)0,0);</div><div class=""><br class=""></div><div class="">    ierr = PetscOptionsBegin(PETSC_COMM_WORLD, "", "", "");CHKERRQ(ierr);</div><div class="">    ierr = PetscOptionsEList("-bc_type", "Type of boundary condition", "", bcTypes, 2, bcTypes[0], &bcType, NULL);CHKERRQ(ierr);</div><div class="">    ierr = PetscOptionsEnd();CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);</div><div class="">    ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,-12,-12,-12,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);</div><div class="">    ierr = DMDASetInterpolationType(da, DMDA_Q0);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    ierr = KSPSetDM(ksp,da);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    ierr = KSPSetComputeRHS(ksp,ComputeRHS,&bcType);CHKERRQ(ierr);</div><div class="">    ierr = KSPSetComputeOperators(ksp,ComputeMatrix,&bcType);CHKERRQ(ierr);</div><div class="">    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);</div><div class="">    ierr = KSPSolve(ksp,NULL,NULL);CHKERRQ(ierr);</div><div class="">    ierr = KSPGetSolution(ksp,&x);CHKERRQ(ierr);</div><div class="">    ierr = KSPGetRhs(ksp,&b);CHKERRQ(ierr);</div><div class="">    ierr = KSPGetOperators(ksp,NULL,&J);CHKERRQ(ierr);</div><div class="">    ierr = VecDuplicate(b,&r);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    ierr = MatMult(J,x,r);CHKERRQ(ierr);</div><div class="">    ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);</div><div class="">    ierr = VecNorm(r,NORM_2,&norm);CHKERRQ(ierr);</div><div class="">    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual 2 norm %g\n",(double)norm);CHKERRQ(ierr);</div><div class="">    ierr = VecNorm(r,NORM_INFINITY,&norm);CHKERRQ(ierr);</div><div class="">    ierr = PetscPrintf(PETSC_COMM_WORLD,"Residual infinity norm %g\n",(double)norm);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    ierr = VecDestroy(&r);CHKERRQ(ierr);</div><div class="">    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);</div><div class="">    ierr = DMDestroy(&da);CHKERRQ(ierr);</div><div class="">    ierr = PetscFinalize();</div><div class="">    return 0;</div><div class="">}</div><div class=""><br class=""></div><div class="">#undef __FUNCT__</div><div class="">#define __FUNCT__ "ComputeRHS"</div><div class="">PetscErrorCode ComputeRHS(KSP ksp,Vec b,void *ctx)</div><div class="">{</div><div class="">    PetscErrorCode ierr;</div><div class="">    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs;</div><div class="">    PetscScalar    Hx,Hy,Hz;</div><div class="">    PetscScalar    ***array;</div><div class="">    DM             da;</div><div class="">    BCType bcType = *(BCType*)ctx;</div><div class=""><br class=""></div><div class="">    PetscFunctionBeginUser;</div><div class="">    ierr = KSPGetDM(ksp,&da);CHKERRQ(ierr);</div><div class="">    ierr = DMDAGetInfo(da, 0, &mx, &my, &mz, 0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);</div><div class="">    Hx   = 1.0 / (PetscReal)(mx);</div><div class="">    Hy   = 1.0 / (PetscReal)(my);</div><div class="">    Hz   = 1.0 / (PetscReal)(mz);</div><div class="">    ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);</div><div class="">    ierr = DMDAVecGetArray(da, b, &array);CHKERRQ(ierr);</div><div class="">    for (k = zs; k < zs + zm; k++)</div><div class="">    {</div><div class="">        for (j = ys; j < ys + ym; j++)</div><div class="">        {</div><div class="">            for (i = xs; i < xs + xm; i++)</div><div class="">            {</div><div class="">                PetscReal x = ((PetscReal)i + 0.5) * Hx;</div><div class="">                PetscReal y = ((PetscReal)j + 0.5) * Hy;</div><div class="">                PetscReal z = ((PetscReal)k + 0.5) * Hz;</div><div class="">                array[k][j][i] = PetscSinReal(x * 2.0 * PETSC_PI) * PetscCosReal(y * 2.0 * PETSC_PI) * PetscSinReal(z * 2.0 * PETSC_PI);</div><div class="">            }</div><div class="">        }</div><div class="">    }</div><div class="">    ierr = DMDAVecRestoreArray(da, b, &array);CHKERRQ(ierr);</div><div class="">    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);</div><div class="">    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    PetscReal norm;</div><div class="">    VecNorm(b, NORM_2, &norm);</div><div class="">    PetscPrintf(PETSC_COMM_WORLD, "right hand side 2 norm: %g\n", (double)norm);</div><div class="">    VecNorm(b, NORM_INFINITY, &norm);</div><div class="">    PetscPrintf(PETSC_COMM_WORLD, "right hand side infinity norm: %g\n", (double)norm);</div><div class=""><br class=""></div><div class="">    /* force right hand side to be consistent for singular matrix */</div><div class="">    /* note this is really a hack, normally the model would provide you with a consistent right handside */</div><div class=""><br class=""></div><div class="">    if (bcType == NEUMANN)</div><div class="">    {</div><div class="">        MatNullSpace nullspace;</div><div class="">        ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);</div><div class="">        ierr = MatNullSpaceRemove(nullspace,b);CHKERRQ(ierr);</div><div class="">        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);</div><div class="">    }</div><div class="">    PetscFunctionReturn(0);</div><div class="">}</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">#undef __FUNCT__</div><div class="">#define __FUNCT__ "ComputeMatrix"</div><div class="">PetscErrorCode ComputeMatrix(KSP ksp, Mat J,Mat jac, void *ctx)</div><div class="">{</div><div class="">    PetscErrorCode ierr;</div><div class="">    PetscInt       i,j,k,mx,my,mz,xm,ym,zm,xs,ys,zs,num, numi, numj, numk;</div><div class="">    PetscScalar    v[7],Hx,Hy,Hz;</div><div class="">    MatStencil     row, col[7];</div><div class="">    DM             da;</div><div class="">    BCType bcType = *(BCType*)ctx;</div><div class=""><br class=""></div><div class="">    PetscFunctionBeginUser;</div><div class=""><br class=""></div><div class="">    if (bcType == DIRICHLET)</div><div class="">        PetscPrintf(PETSC_COMM_WORLD, "building operator with Dirichlet boundary conditions, ");</div><div class="">    else if (bcType == NEUMANN)</div><div class="">        PetscPrintf(PETSC_COMM_WORLD, "building operator with Neumann boundary conditions, ");</div><div class="">    else</div><div class="">        SETERRQ(PETSC_COMM_WORLD, PETSC_ERR_SUP, "unrecognized boundary condition type\n");</div><div class=""><br class=""></div><div class="">    ierr    = KSPGetDM(ksp,&da);CHKERRQ(ierr);</div><div class="">    ierr    = DMDAGetInfo(da,0,&mx,&my,&mz,0,0,0,0,0,0,0,0,0);CHKERRQ(ierr);</div><div class=""><br class=""></div><div class="">    PetscPrintf(PETSC_COMM_WORLD, "global grid size: %d x %d x %d\n", mx, my, mz);</div><div class=""><br class=""></div><div class="">    Hx      = 1.0 / (PetscReal)(mx);</div><div class="">    Hy      = 1.0 / (PetscReal)(my);</div><div class="">    Hz      = 1.0 / (PetscReal)(mz);</div><div class=""><br class=""></div><div class="">    PetscReal Hx2 = Hx * Hx;</div><div class="">    PetscReal Hy2 = Hy * Hy;</div><div class="">    PetscReal Hz2 = Hz * Hz;</div><div class=""><br class=""></div><div class="">    PetscReal scaleX = 1.0 / Hx2;</div><div class="">    PetscReal scaleY = 1.0 / Hy2;</div><div class="">    PetscReal scaleZ = 1.0 / Hz2;</div><div class=""><br class=""></div><div class="">    ierr    = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);</div><div class="">    for (k = zs; k < zs + zm; k++)</div><div class="">    {</div><div class="">        for (j = ys; j < ys + ym; j++)</div><div class="">        {</div><div class="">            for (i = xs; i < xs + xm; i++)</div><div class="">            {</div><div class="">                row.i = i;</div><div class="">                row.j = j;</div><div class="">                row.k = k;</div><div class="">                if (i == 0 || j == 0 || k == 0 || i == mx - 1 || j == my - 1 || k == mz - 1)</div><div class="">                {</div><div class="">                    num = 0;</div><div class="">                    numi = 0;</div><div class="">                    numj = 0;</div><div class="">                    numk = 0;</div><div class="">                    if (k != 0)</div><div class="">                    {</div><div class="">                        v[num] = -scaleZ;</div><div class="">                        col[num].i = i;</div><div class="">                        col[num].j = j;</div><div class="">                        col[num].k = k - 1;</div><div class="">                        num++;</div><div class="">                        numk++;</div><div class="">                    }</div><div class="">                    if (j != 0)</div><div class="">                    {</div><div class="">                        v[num] = -scaleY;</div><div class="">                        col[num].i = i;</div><div class="">                        col[num].j = j - 1;</div><div class="">                        col[num].k = k;</div><div class="">                        num++;</div><div class="">                        numj++;</div><div class="">                    }</div><div class="">                    if (i != 0)</div><div class="">                    {</div><div class="">                        v[num] = -scaleX;</div><div class="">                        col[num].i = i - 1;</div><div class="">                        col[num].j = j;</div><div class="">                        col[num].k = k;</div><div class="">                        num++;</div><div class="">                        numi++;</div><div class="">                    }</div><div class="">                    if (i != mx - 1)</div><div class="">                    {</div><div class="">                        v[num] = -scaleX;</div><div class="">                        col[num].i = i + 1;</div><div class="">                        col[num].j = j;</div><div class="">                        col[num].k = k;</div><div class="">                        num++;</div><div class="">                        numi++;</div><div class="">                    }</div><div class="">                    if (j != my - 1)</div><div class="">                    {</div><div class="">                        v[num] = -scaleY;</div><div class="">                        col[num].i = i;</div><div class="">                        col[num].j = j + 1;</div><div class="">                        col[num].k = k;</div><div class="">                        num++;</div><div class="">                        numj++;</div><div class="">                    }</div><div class="">                    if (k != mz - 1)</div><div class="">                    {</div><div class="">                        v[num] = -scaleZ;</div><div class="">                        col[num].i = i;</div><div class="">                        col[num].j = j;</div><div class="">                        col[num].k = k + 1;</div><div class="">                        num++;</div><div class="">                        numk++;</div><div class="">                    }</div><div class=""><br class=""></div><div class="">                    if (bcType == NEUMANN)</div><div class="">                    {</div><div class="">                        v[num] = (PetscReal) (numk) * scaleZ + (PetscReal) (numj) * scaleY + (PetscReal) (numi) * scaleX;</div><div class="">                    }</div><div class="">                    else if (bcType == DIRICHLET)</div><div class="">                    {</div><div class="">                        v[num] = 2.0 * (scaleX + scaleY + scaleZ);</div><div class="">                    }</div><div class=""><br class=""></div><div class="">                    col[num].i = i;</div><div class="">                    col[num].j = j;</div><div class="">                    col[num].k = k;</div><div class="">                    num++;</div><div class="">                    ierr = MatSetValuesStencil(jac, 1, &row, num, col, v, INSERT_VALUES);</div><div class="">                    CHKERRQ(ierr);</div><div class="">                }</div><div class="">                else</div><div class="">                {</div><div class="">                    v[0] = -scaleZ;</div><div class="">                    col[0].i = i;</div><div class="">                    col[0].j = j;</div><div class="">                    col[0].k = k - 1;</div><div class="">                    v[1] = -scaleY;</div><div class="">                    col[1].i = i;</div><div class="">                    col[1].j = j - 1;</div><div class="">                    col[1].k = k;</div><div class="">                    v[2] = -scaleX;</div><div class="">                    col[2].i = i - 1;</div><div class="">                    col[2].j = j;</div><div class="">                    col[2].k = k;</div><div class="">                    v[3] = 2.0 * (scaleX + scaleY + scaleZ);</div><div class="">                    col[3].i = i;</div><div class="">                    col[3].j = j;</div><div class="">                    col[3].k = k;</div><div class="">                    v[4] = -scaleX;</div><div class="">                    col[4].i = i + 1;</div><div class="">                    col[4].j = j;</div><div class="">                    col[4].k = k;</div><div class="">                    v[5] = -scaleY;</div><div class="">                    col[5].i = i;</div><div class="">                    col[5].j = j + 1;</div><div class="">                    col[5].k = k;</div><div class="">                    v[6] = -scaleZ;</div><div class="">                    col[6].i = i;</div><div class="">                    col[6].j = j;</div><div class="">                    col[6].k = k + 1;</div><div class="">                    ierr = MatSetValuesStencil(jac, 1, &row, 7, col, v, INSERT_VALUES);</div><div class="">                    CHKERRQ(ierr);</div><div class="">                }</div><div class="">            }</div><div class="">        }</div><div class="">    }</div><div class="">    ierr = MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);</div><div class="">    ierr = MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);</div><div class="">    if (bcType == NEUMANN)</div><div class="">    {</div><div class="">        MatNullSpace   nullspace;</div><div class="">        ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,0,&nullspace);CHKERRQ(ierr);</div><div class="">        ierr = MatSetNullSpace(J,nullspace);CHKERRQ(ierr);</div><div class="">        ierr = MatNullSpaceDestroy(&nullspace);CHKERRQ(ierr);</div><div class="">    }</div><div class="">    PetscFunctionReturn(0);</div><div class="">}</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Jun 22, 2017, at 9:23 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com" class="">knepley@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_extra"><div class="gmail_quote">On Wed, Jun 21, 2017 at 8:12 PM, Jason Lefley<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:jason.lefley@aclectic.com" target="_blank" class="">jason.lefley@aclectic.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">Hello,<br class=""><br class="">We are attempting to use the PETSc KSP solver framework in a fluid dynamics simulation we developed. The solution is part of a pressure projection and solves a Poisson problem. We use a cell-centered layout with a regular grid in 3d. We started with ex34.c from the KSP tutorials since it has the correct calls for the 3d DMDA, uses a cell-centered layout, and states that it works with multi-grid. We modified the operator construction function to match the coefficients and Dirichlet boundary conditions used in our problem (we’d also like to support Neumann but left those out for now to keep things simple). As a result of the modified boundary conditions, our version does not perform a null space removal on the right hand side or operator as the original did. We also modified the right hand side to contain a sinusoidal pattern for testing. Other than these changes, our code is the same as the original ex34.c<br class=""><br class="">With the default KSP options and using CG with the default pre-conditioner and without a pre-conditioner, we see good convergence. However, we’d like to accelerate the time to solution further and target larger problem sizes (>= 1024^3) if possible. Given these objectives, multi-grid as a pre-conditioner interests us. To understand the improvement that multi-grid provides, we ran ex45 from the KSP tutorials. ex34 with CG and no pre-conditioner appears to converge in a single iteration and we wanted to compare against a problem that has similar convergence patterns to our problem. Here’s the tests we ran with ex45:<br class=""><br class="">mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 7.0178e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 157<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 3.16874e-05<br class=""><br class="">mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg -pc_type none<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 4.1072e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 213<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 0.000138866<br class=""><br class="">mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -ksp_type cg<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 3.3962e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 88<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 6.46242e-05<br class=""><br class="">mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 1.3201e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 4<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 8.13339e-05<br class=""><br class="">mpirun -n 16 ./ex45 -da_grid_x 129 -da_grid_y 129 -da_grid_z 129 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 1.3008e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 4<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 2.21474e-05<br class=""><br class="">We found the multi-grid pre-conditioner options in the KSP tutorials makefile. These results make sense; both the default GMRES and CG solvers converge and CG without a pre-conditioner takes more iterations. The multi-grid pre-conditioned runs are pretty dramatically accelerated and require only a handful of iterations.<br class=""><br class="">We ran our code (modified ex34.c as described above) with the same parameters:<br class=""><br class="">mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 5.3729e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 123<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 0.00595066<br class=""><br class="">mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg -pc_type none<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 3.6154e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 188<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 0.00505943<br class=""><br class="">mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -ksp_type cg<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 3.5661e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 98<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 0.00967462<br class=""><br class="">mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 4.5606e+00<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 44<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 949.553<br class=""></blockquote><div class=""><br class=""></div><div class="">1) Dave is right</div><div class=""><br class=""></div><div class="">2) In order to see how many iterates to expect, first try using algebraic multigrid</div><div class=""><br class=""></div><div class="">  -pc_type gamg</div><div class=""><br class=""></div><div class="">This should work out of the box for Poisson</div><div class=""><br class=""></div><div class="">3) For questions like this, we really need to see</div><div class=""><br class=""></div><div class="">  -ksp_view -ksp_monitor_true_residual</div><div class=""><br class=""></div><div class="">4) It sounds like you smoother is not strong enough. You could try</div><div class=""><br class=""></div><div class="">  -mg_levels_ksp_type richardson -mg_levels_ksp_richardson_self_scale -mg_levels_ksp_max_it 5</div><div class=""><br class=""></div><div class="">or maybe GMRES until it works.</div><div class=""><br class=""></div><div class=""> Thanks,</div><div class=""><br class=""></div><div class="">    Matt</div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;">mpirun -n 16 ./solver_test -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -pc_type mg -pc_mg_levels 5 -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 -mg_levels_pc_type bjacobi -ksp_type cg<br class="">       <span class="Apple-converted-space"> </span>time in KSPSolve(): 1.5481e+01<br class="">       <span class="Apple-converted-space"> </span>solver iterations: 198<br class="">       <span class="Apple-converted-space"> </span>KSP final norm of residual: 0.916558<br class=""><br class="">We performed all tests with petsc-3.7.6.<br class=""><br class="">The trends with CG and GMRES seem consistent with the results from ex45. However, with multi-grid, something doesn’t seem right. Convergence seems poor and the solves run for many more iterations than ex45 with multi-grid as a pre-conditioner. I extensively validated the code that builds the matrix and also confirmed that the solution produced by CG, when evaluated with the system of equations elsewhere in our simulation, produces the same residual as indicated by PETSc. Given that we only made minimal modifications to the original example code, it seems likely that the operators constructed for the multi-grid levels are ok.<br class=""><br class="">We also tried a variety of other suggested parameters for the multi-grid pre-conditioner as suggested in related mailing list posts but we didn’t observe any significant improvements over the results above.<br class=""><br class="">Is there anything we can do to check the validity of the coefficient matrices built for the different multi-grid levels? Does it look like there could be problems there? Or any other suggestions to achieve better results with multi-grid? I have the -log_view, -ksp_view, and convergence monitor output from the above tests and can post any of it if it would assist.<br class=""><br class="">Thanks</blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div>--<span class="Apple-converted-space"> </span><br class=""><div class="gmail-m_-5669117813700017836gmail_signature"><div dir="ltr" class=""><div class="">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="">-- Norbert Wiener</div><div class=""><br class=""></div><div class=""><a href="http://www.caam.rice.edu/~mk51/" target="_blank" class="">http://www.caam.rice.edu/~<wbr class="">mk51/</a></div></div></div></div></div></div></blockquote></div><br class=""></div></body></html>