[petsc-users] strange convergence
Barry Smith
bsmith at mcs.anl.gov
Mon Apr 24 17:17:57 CDT 2017
This can happen if the matrix is singular or nearly singular, or if the factorization generates small pivots, which can occur even for nonsingular problems if the matrix is poorly scaled or just plain nasty.
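A minimal sketch of the small-pivot effect, in plain Python rather than PETSc. The 2x2 matrix below is nonsingular, yet eliminating in natural order divides by a pivot of 1e-20 and loses the first solution component. The matrix, right-hand side, and helper name are illustrative assumptions, not taken from the problem in this thread.

```python
def solve_2x2_no_pivoting(a, b):
    """Gaussian elimination on a 2x2 system in natural order (no row swaps)."""
    l21 = a[1][0] / a[0][0]          # multiplier; huge when the pivot a[0][0] is tiny
    u22 = a[1][1] - l21 * a[0][1]    # second pivot after elimination
    y2 = b[1] - l21 * b[0]           # forward substitution
    x2 = y2 / u22                    # back substitution, second unknown
    x1 = (b[0] - a[0][1] * x2) / a[0][0]
    return [x1, x2]

# Made-up, nonsingular system whose exact solution is close to (1, 1).
A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]

# Natural ordering: the first pivot is 1e-20.
x_natural = solve_2x2_no_pivoting(A, b)

# Swapping the two equations puts the large entry on the diagonal.
x_swapped = solve_2x2_no_pivoting([A[1], A[0]], [b[1], b[0]])

print("natural order:", x_natural)
print("rows swapped: ", x_swapped)
```

In this toy case the natural-order elimination returns 0 for the first unknown, while the row-swapped elimination recovers a value close to 1.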
> On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
>
> It took a while; here is the output:
>
> 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05
> 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06
> 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06
> Linear solve converged due to CONVERGED_ATOL iterations 3
> KSP Object: 4 MPI processes
> type: gmres
> GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=1000, initial guess is zero
> tolerances: relative=1e-20, absolute=1e-09, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
> type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 0, needed 0
> Factored matrix follows:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=973051, cols=973051
> package used to perform factorization: pastix
> Error : 3.24786e-14
> total: nonzeros=0, allocated nonzeros=0
> total number of mallocs used during MatSetValues calls =0
> PaStiX run parameters:
> Matrix type : Unsymmetric
> Level of printing (0,1,2): 0
> Number of refinements iterations : 3
> Error : 3.24786e-14
> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
> type: mpiaij
> rows=973051, cols=973051
> Error : 3.24786e-14
> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07
> total number of mallocs used during MatSetValues calls =0
> using I-node (on process 0) routines: found 78749 nodes, limit used is 5
> Error : 3.24786e-14
>
> It doesn't behave as you said. Something is not right here. I will look into it in depth.
>
> Giang
>
> On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
> >
> > Good catch. I get this for the very first step; maybe rhs_w is zero at that time.
>
> With the multiplicative composition, the right-hand side of the second solve is its initial right-hand side minus A_10*x, where x is the solution of the first sub-solve and A_10 is the lower-left block of the outer matrix. So unless the initial right-hand side is zero in the second block and A_10 is also identically zero, the right-hand side for the second sub-solve should not be zero. Is A_10 == 0?
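The formula above can be sketched with scalar stand-ins for the blocks. This is only an illustration of the stated relation rhs_1 = b_1 - A_10*x_0; it is not PETSc's PCFIELDSPLIT code, and the function name and all numbers are made up.

```python
def multiplicative_split(A00, A10, A11, b0, b1):
    """One multiplicative (lower block Gauss-Seidel) sweep with scalar blocks.

    Returns the first sub-solve solution, the right-hand side seen by the
    second sub-solve, and the second sub-solve solution.
    """
    x0 = b0 / A00           # first sub-solve
    rhs1 = b1 - A10 * x0    # right-hand side of the second sub-solve
    x1 = rhs1 / A11         # second sub-solve
    return x0, rhs1, x1

# Coupled case: b1 is zero but A10 is not, so the second solve sees a nonzero rhs.
x0, rhs1_coupled, x1 = multiplicative_split(4.0, 2.0, 3.0, 8.0, 0.0)

# Decoupled case: b1 == 0 and A10 == 0, so the second solve starts from a zero rhs.
_, rhs1_decoupled, _ = multiplicative_split(4.0, 0.0, 3.0, 8.0, 0.0)

print("rhs of second solve, A10 != 0:", rhs1_coupled)
print("rhs of second solve, A10 == 0:", rhs1_decoupled)
```

A zero initial residual in the second sub-solve therefore points at both a zero second block of the right-hand side and a vanishing A_10 contribution.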
>
>
> > In a later step, it shows 2-step convergence:
> >
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 3.165886479830e+04
> > 1 KSP Residual norm 2.905922877684e-01
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 2.397669419027e-01
> > 1 KSP Residual norm 0.000000000000e+00
> > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 9.999891813771e-01
> > 1 KSP Residual norm 1.512000395579e-05
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 8.192702188243e-06
> > 1 KSP Residual norm 0.000000000000e+00
> > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02
>
> The outer residual norms are still wonky: the preconditioned residual norm goes from 3.165886479920e+04 to 5.252183822848e-02, which is a huge drop, but the true residual norm drops far less, from 7.963616922323e+05 to 7.135927677844e+04. This is not normal.
>
> What if you just use -pc_type lu for the entire system (no fieldsplit)? Does the true residual drop to almost zero in the first iteration (as it should)? Send the output.
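A rough sketch of why an exact factorization should give a near-zero true residual after one step. One preconditioned step from a zero initial guess is x = M^{-1} b. With M = A the true residual is at roundoff level, while an inaccurate M (mimicked here by a perturbed matrix) leaves a visible residual. The 2x2 matrices and helper names are assumptions for illustration only.

```python
import math

def solve_2x2(m, b):
    """Solve a 2x2 system with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def true_residual(a, b, x):
    """Return b - A*x for a 2x2 matrix."""
    return [b[i] - sum(a[i][j] * x[j] for j in range(2)) for i in range(2)]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]

# Exact "factorization": the preconditioner is A itself.
x_exact = solve_2x2(A, b)
rel_exact = norm2(true_residual(A, b, x_exact)) / norm2(b)

# Inaccurate "factorization": hypothetical perturbed preconditioner.
M_bad = [[4.0, 1.0], [2.0, 3.3]]
x_bad = solve_2x2(M_bad, b)
rel_bad = norm2(true_residual(A, b, x_bad)) / norm2(b)

print("relative true residual, exact M:  ", rel_exact)
print("relative true residual, inexact M:", rel_bad)
```

A relative true residual that stalls well above roundoff after the first step suggests that the factorization is not solving the system accurately.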
>
>
>
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 6.946213936597e-01
> > 1 KSP Residual norm 1.195514007343e-05
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 1.025694497535e+00
> > 1 KSP Residual norm 0.000000000000e+00
> > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 7.255149996405e-01
> > 1 KSP Residual norm 6.583512434218e-06
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 1.015229700337e+00
> > 1 KSP Residual norm 0.000000000000e+00
> > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 3.512243341400e-01
> > 1 KSP Residual norm 2.032490351200e-06
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 1.282327290982e+00
> > 1 KSP Residual norm 0.000000000000e+00
> > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 3.423609338053e-01
> > 1 KSP Residual norm 4.213703301972e-07
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 1.157384757538e+00
> > 1 KSP Residual norm 0.000000000000e+00
> > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 3.838596289995e-01
> > 1 KSP Residual norm 9.927864176103e-08
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 1.066298905618e+00
> > 1 KSP Residual norm 0.000000000000e+00
> > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 4.624964188094e-01
> > 1 KSP Residual norm 6.418229775372e-08
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 9.800784311614e-01
> > 1 KSP Residual norm 0.000000000000e+00
> > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06
> > Linear solve converged due to CONVERGED_ATOL iterations 7
> >
> > The outer operator is an explicit matrix.
> >
> > Giang
> >
> > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
> > >
> > > Thanks Barry, trying with -fieldsplit_u_pc_type lu gives better convergence. I still used 4 procs, though; with 1 proc it should probably be the same.
> > >
> > > The u block uses a Nitsche-type operator to connect two non-matching domains. I don't think it leaves any rigid body motion caused by insufficient constraints. Maybe you have another idea?
> > >
> > > Residual norms for fieldsplit_u_ solve.
> > > 0 KSP Residual norm 3.129067184300e+05
> > > 1 KSP Residual norm 5.906261468196e-01
> > > Residual norms for fieldsplit_wp_ solve.
> > > 0 KSP Residual norm 0.000000000000e+00
> >
> > ^^^^ something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero which it should not be).
> >
> > > FieldSplit with MULTIPLICATIVE composition: total splits = 2
> >
> >
> > How are you providing the outer operator? As an explicit matrix or with some shell matrix?
> >
> >
> >
> > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00
> > > Residual norms for fieldsplit_u_ solve.
> > > 0 KSP Residual norm 9.999955993437e-01
> > > 1 KSP Residual norm 4.019774691831e-06
> > > Residual norms for fieldsplit_wp_ solve.
> > > 0 KSP Residual norm 0.000000000000e+00
> > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06
> > > Residual norms for fieldsplit_u_ solve.
> > > 0 KSP Residual norm 1.000012180204e+00
> > > 1 KSP Residual norm 1.017367950422e-05
> > > Residual norms for fieldsplit_wp_ solve.
> > > 0 KSP Residual norm 0.000000000000e+00
> > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06
> > > Residual norms for fieldsplit_u_ solve.
> > > 0 KSP Residual norm 1.000004200085e+00
> > > 1 KSP Residual norm 6.231613102458e-06
> > > Residual norms for fieldsplit_wp_ solve.
> > > 0 KSP Residual norm 0.000000000000e+00
> > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06
> > > Linear solve converged due to CONVERGED_ATOL iterations 3
> > > KSP Object: 4 MPI processes
> > > type: gmres
> > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization
> > > GMRES: happy breakdown tolerance 1e-30
> > > maximum iterations=1000, initial guess is zero
> > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000
> > > left preconditioning
> > > using PRECONDITIONED norm type for convergence test
> > > PC Object: 4 MPI processes
> > > type: fieldsplit
> > > FieldSplit with MULTIPLICATIVE composition: total splits = 2
> > > Solver info for each split is in the following KSP objects:
> > > Split number 0 Defined by IS
> > > KSP Object: (fieldsplit_u_) 4 MPI processes
> > > type: richardson
> > > Richardson: damping factor=1
> > > maximum iterations=1, initial guess is zero
> > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > > left preconditioning
> > > using PRECONDITIONED norm type for convergence test
> > > PC Object: (fieldsplit_u_) 4 MPI processes
> > > type: lu
> > > LU: out-of-place factorization
> > > tolerance for zero pivot 2.22045e-14
> > > matrix ordering: natural
> > > factor fill ratio given 0, needed 0
> > > Factored matrix follows:
> > > Mat Object: 4 MPI processes
> > > type: mpiaij
> > > rows=938910, cols=938910
> > > package used to perform factorization: pastix
> > > total: nonzeros=0, allocated nonzeros=0
> > > Error : 3.36878e-14
> > > total number of mallocs used during MatSetValues calls =0
> > > PaStiX run parameters:
> > > Matrix type : Unsymmetric
> > > Level of printing (0,1,2): 0
> > > Number of refinements iterations : 3
> > > Error : 3.36878e-14
> > > linear system matrix = precond matrix:
> > > Mat Object: (fieldsplit_u_) 4 MPI processes
> > > type: mpiaij
> > > rows=938910, cols=938910, bs=3
> > > Error : 3.36878e-14
> > > Error : 3.36878e-14
> > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07
> > > total number of mallocs used during MatSetValues calls =0
> > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5
> > > Split number 1 Defined by IS
> > > KSP Object: (fieldsplit_wp_) 4 MPI processes
> > > type: richardson
> > > Richardson: damping factor=1
> > > maximum iterations=1, initial guess is zero
> > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > > left preconditioning
> > > using PRECONDITIONED norm type for convergence test
> > > PC Object: (fieldsplit_wp_) 4 MPI processes
> > > type: lu
> > > LU: out-of-place factorization
> > > tolerance for zero pivot 2.22045e-14
> > > matrix ordering: natural
> > > factor fill ratio given 0, needed 0
> > > Factored matrix follows:
> > > Mat Object: 4 MPI processes
> > > type: mpiaij
> > > rows=34141, cols=34141
> > > package used to perform factorization: pastix
> > > Error : -nan
> > > Error : -nan
> > > Error : -nan
> > > total: nonzeros=0, allocated nonzeros=0
> > > total number of mallocs used during MatSetValues calls =0
> > > PaStiX run parameters:
> > > Matrix type : Symmetric
> > > Level of printing (0,1,2): 0
> > > Number of refinements iterations : 0
> > > Error : -nan
> > > linear system matrix = precond matrix:
> > > Mat Object: (fieldsplit_wp_) 4 MPI processes
> > > type: mpiaij
> > > rows=34141, cols=34141
> > > total: nonzeros=485655, allocated nonzeros=485655
> > > total number of mallocs used during MatSetValues calls =0
> > > not using I-node (on process 0) routines
> > > linear system matrix = precond matrix:
> > > Mat Object: 4 MPI processes
> > > type: mpiaij
> > > rows=973051, cols=973051
> > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07
> > > total number of mallocs used during MatSetValues calls =0
> > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5
> > >
> > >
> > >
> > > Giang
> > >
> > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
> > > >
> > > > Dear Matt/Barry
> > > >
> > > > With your options, it results in
> > > >
> > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00
> > > > Residual norms for fieldsplit_u_ solve.
> > > > 0 KSP Residual norm 2.407308987203e+36
> > > > 1 KSP Residual norm 5.797185652683e+72
> > >
> > > It looks like Matt is right; hypre is seemingly producing useless garbage.
> > >
> > > First, how do things run on one process? If you have similar problems, then debug on one process (debugging any kind of problem is always far easier on one process).
> > >
> > > First run with -fieldsplit_u_pc_type lu (instead of using hypre) to see if that works or also produces something bad.
> > >
> > > What is the operator and the boundary conditions for u? It could be singular.
> > >
> > >
> > >
> > >
> > >
> > >
> > > > Residual norms for fieldsplit_wp_ solve.
> > > > 0 KSP Residual norm 0.000000000000e+00
> > > > ...
> > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00
> > > > Residual norms for fieldsplit_u_ solve.
> > > > 0 KSP Residual norm 1.533726746719e+36
> > > > 1 KSP Residual norm 3.692757392261e+72
> > > > Residual norms for fieldsplit_wp_ solve.
> > > > 0 KSP Residual norm 0.000000000000e+00
> > > >
> > > > Are you suggesting that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems that the "u" block is also singular.
> > > >
> > > > Giang
> > > >
> > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization.
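A toy illustration of this statement, assuming a diagonal stand-in for the factored preconditioner. With left preconditioning the monitored norm is that of M^{-1}(b - A*x), so dividing by a tiny pivot inflates it even though the true residual is of ordinary size. The pivot values and residual vector are invented for the example.

```python
import math

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

# Hypothetical factored preconditioner M = diag(1, 1e-25): the second pivot is tiny.
pivots = [1.0, 1e-25]

# Stand-in for the true residual b - A*x.
r_true = [3.0, 4.0]

# Applying M^{-1} divides each component by its pivot.
r_prec = [r / p for r, p in zip(r_true, pivots)]

true_norm = norm2(r_true)
prec_norm = norm2(r_prec)

print("true residual norm:          ", true_norm)
print("preconditioned residual norm:", prec_norm)
```

The true norm stays at 5 while the preconditioned norm is of order 1e25, the same qualitative pattern as in the first log of this thread.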
> > > >
> > > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1
> > > >
> > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui <hgbk2008 at gmail.com> wrote:
> > > > >
> > > > > Hello
> > > > >
> > > > > I encountered strange convergence behavior that I have trouble understanding:
> > > > >
> > > > > KSPSetFromOptions completed
> > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00
> > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00
> > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00
> > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00
> > > > > .....
> > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00
> > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00
> > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000
> > > > > KSP Object: 4 MPI processes
> > > > > type: gmres
> > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization
> > > > > GMRES: happy breakdown tolerance 1e-30
> > > > > maximum iterations=1000, initial guess is zero
> > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000
> > > > > left preconditioning
> > > > > using PRECONDITIONED norm type for convergence test
> > > > > PC Object: 4 MPI processes
> > > > > type: fieldsplit
> > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2
> > > > > Solver info for each split is in the following KSP objects:
> > > > > Split number 0 Defined by IS
> > > > > KSP Object: (fieldsplit_u_) 4 MPI processes
> > > > > type: preonly
> > > > > maximum iterations=10000, initial guess is zero
> > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > > > > left preconditioning
> > > > > using NONE norm type for convergence test
> > > > > PC Object: (fieldsplit_u_) 4 MPI processes
> > > > > type: hypre
> > > > > HYPRE BoomerAMG preconditioning
> > > > > HYPRE BoomerAMG: Cycle type V
> > > > > HYPRE BoomerAMG: Maximum number of levels 25
> > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
> > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6
> > > > > HYPRE BoomerAMG: Interpolation truncation factor 0
> > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0
> > > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> > > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> > > > > HYPRE BoomerAMG: Maximum row sums 0.9
> > > > > HYPRE BoomerAMG: Sweeps down 1
> > > > > HYPRE BoomerAMG: Sweeps up 1
> > > > > HYPRE BoomerAMG: Sweeps on coarse 1
> > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> > > > > HYPRE BoomerAMG: Relax weight (all) 1
> > > > > HYPRE BoomerAMG: Outer relax weight (all) 1
> > > > > HYPRE BoomerAMG: Using CF-relaxation
> > > > > HYPRE BoomerAMG: Measure type local
> > > > > HYPRE BoomerAMG: Coarsen type PMIS
> > > > > HYPRE BoomerAMG: Interpolation type classical
> > > > > linear system matrix = precond matrix:
> > > > > Mat Object: (fieldsplit_u_) 4 MPI processes
> > > > > type: mpiaij
> > > > > rows=938910, cols=938910, bs=3
> > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07
> > > > > total number of mallocs used during MatSetValues calls =0
> > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5
> > > > > Split number 1 Defined by IS
> > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes
> > > > > type: preonly
> > > > > maximum iterations=10000, initial guess is zero
> > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> > > > > left preconditioning
> > > > > using NONE norm type for convergence test
> > > > > PC Object: (fieldsplit_wp_) 4 MPI processes
> > > > > type: lu
> > > > > LU: out-of-place factorization
> > > > > tolerance for zero pivot 2.22045e-14
> > > > > matrix ordering: natural
> > > > > factor fill ratio given 0, needed 0
> > > > > Factored matrix follows:
> > > > > Mat Object: 4 MPI processes
> > > > > type: mpiaij
> > > > > rows=34141, cols=34141
> > > > > package used to perform factorization: pastix
> > > > > Error : -nan
> > > > > Error : -nan
> > > > > total: nonzeros=0, allocated nonzeros=0
> > > > > Error : -nan
> > > > > total number of mallocs used during MatSetValues calls =0
> > > > > PaStiX run parameters:
> > > > > Matrix type : Symmetric
> > > > > Level of printing (0,1,2): 0
> > > > > Number of refinements iterations : 0
> > > > > Error : -nan
> > > > > linear system matrix = precond matrix:
> > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes
> > > > > type: mpiaij
> > > > > rows=34141, cols=34141
> > > > > total: nonzeros=485655, allocated nonzeros=485655
> > > > > total number of mallocs used during MatSetValues calls =0
> > > > > not using I-node (on process 0) routines
> > > > > linear system matrix = precond matrix:
> > > > > Mat Object: 4 MPI processes
> > > > > type: mpiaij
> > > > > rows=973051, cols=973051
> > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07
> > > > > total number of mallocs used during MatSetValues calls =0
> > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5
> > > > >
> > > > > The pattern of convergence hints that this system is somehow bad/singular, but I don't know why the preconditioned error goes up so high. Does anyone have an idea?
> > > > >
> > > > > Best regards
> > > > > Giang Bui
> > > > >