[petsc-users] Question about setting up KSP solver parameters.

Barry Smith bsmith at mcs.anl.gov
Mon May 5 09:03:01 CDT 2014


  If you run valgrind with the debug version of the libraries, it will provide more information, such as the line numbers where the problem occurred. I recommend doing that.
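
   For example (a sketch only; the executable name below is taken from your valgrind output, and the exact run-time options are up to you):

      valgrind --track-origins=yes ./fensapMPI_LINUX64 -ksp_monitor_true_residual -ksp_view

   Configure PETSc with --with-debugging=1 (the default) so that the stack traces carry file and line numbers.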

   Either your initial solution or right-hand side has garbage in it, or the wrong BLAS may be getting linked in. But there is definitely a problem.
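
   A minimal sketch of such a sanity check, using the Vec names from your code below (pet_rhsp is the right-hand side, pet_solup the solution vector); rhs_norm and sol_norm are new local variables:

      PetscReal rhs_norm, sol_norm
      ! garbage entries (uninitialized values, NaN or Inf) usually show up as a nonsensical norm
      call VecNorm ( pet_rhsp,  NORM_2, rhs_norm, ierpetsc )
      call VecNorm ( pet_solup, NORM_2, sol_norm, ierpetsc )
      print *, 'rhs norm = ', rhs_norm, '  solution norm = ', sol_norm

   You can also check with ldd which BLAS the executable actually picks up (the trace below goes through MKL's DDOT) before suspecting PETSc itself.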

   Barry

On May 5, 2014, at 8:28 AM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:

> Thanks for the reply. What do you mean by a “happier” state? I checked the converged solution (from the run that calls KSPGMRESSetRestart twice), and the solution appears to be correct.
> 
> I ran both versions under valgrind (one calls KSPGMRESSetRestart once, the other calls it twice).
> Both of them report the errors below. What do they mean? Thank you in advance.
> ==7858== Conditional jump or move depends on uninitialised value(s)
> ==7858==    at 0xE71DFB: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xE71640: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xE6E068: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x73281A: VecNorm_Seq (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x730BF4: VecNormalize (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x7BC5A8: KSPSolve_GMRES (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xB8A06E: KSPSolve (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x7B659F: kspsolve_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x5EAE84: petsolv_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x4ECD46: flowsol_ng_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x507E4E: iterprc_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x51D1B4: solnalg_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858== 
> ==7858== Conditional jump or move depends on uninitialised value(s)
> ==7858==    at 0xE71E25: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xE71640: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xE6E068: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x73281A: VecNorm_Seq (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x730BF4: VecNormalize (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x7BC5A8: KSPSolve_GMRES (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0xB8A06E: KSPSolve (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x7B659F: kspsolve_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x5EAE84: petsolv_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x4ECD46: flowsol_ng_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x507E4E: iterprc_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==    by 0x51D1B4: solnalg_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
> ==7858==
> 
> 
> On Fri, May 2, 2014 at 7:25 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> On May 2, 2014, at 5:29 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
> 
> > Thanks for your quick reply. What confuses me is why the code works fine if I reset the GMRES restart number by calling KSPGMRESSetRestart again just before KSPSolve.
> 
>    It isn’t really working. Something is going wrong (run with valgrind), and setting that restart number before starting the solver just puts it in a “happier” state, so it seems to make more progress.
> 
>    Barry
> 
> >
> > Sent from my iPhone
> >
> >> On May 2, 2014, at 6:03 PM, "Barry Smith" <bsmith at mcs.anl.gov> wrote:
> >>
> >>
> >> Your shell matrix is buggy in some way. Whenever the residual norm jumps like crazy at a restart, it means that something is wrong with the operator.
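> >>
> >> One quick way to exercise the shell matrix outside the solver is to apply it and the preconditioner matrix to the same vector and compare. A rough sketch using the names from your code (pet_x, pet_y1, pet_y2 and diff_norm are new locals); the two products will not match exactly, since the shell is matrix-free and pet_matp is only an approximation, but wildly different norms or NaNs point at a broken MatMult:
> >>
> >>   Vec            pet_x, pet_y1, pet_y2
> >>   PetscReal      diff_norm
> >>   call VecDuplicate ( pet_rhsp, pet_x,  ierpetsc )
> >>   call VecDuplicate ( pet_rhsp, pet_y1, ierpetsc )
> >>   call VecDuplicate ( pet_rhsp, pet_y2, ierpetsc )
> >>   call VecSet ( pet_x, 1.0d0, ierpetsc )                        ! any nonzero test vector
> >>   call MatMult ( pet_mat_mf_shell, pet_x, pet_y1, ierpetsc )    ! shell (matrix-free) product
> >>   call MatMult ( pet_matp,         pet_x, pet_y2, ierpetsc )    ! assembled preconditioner matrix
> >>   call VecAXPY ( pet_y2, -1.0d0, pet_y1, ierpetsc )             ! y2 <- y2 - y1
> >>   call VecNorm ( pet_y2, NORM_2, diff_norm, ierpetsc )
> >>   print *, 'difference norm = ', diff_norm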
> >>
> >>  Barry
> >>
> >>> On May 2, 2014, at 4:41 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
> >>>
> >>> Dear PETSc users,
> >>>
> >>> I'm solving a linear system with KSP and trying to set up the solver in code, but strangely my solve does not converge unless I call KSPGMRESSetRestart twice.
> >>>
> >>> My code looks like this:
> >>>
> >>> call KSPSetOperators ( pet_solv, pet_mat_mf_shell, pet_matp, DIFFERENT_NONZERO_PATTERN, ierpetsc )
> >>> call KSPSetType ( pet_solv, 'gmres', ierpetsc )
> >>> call KSPGMRESSetRestart ( pet_solv, 30, ierpetsc )
> >>> call KSPGetPC ( pet_solv, pet_precon, ierpetsc )
> >>> call PCSetType ( pet_precon, 'asm', ierpetsc )
> >>> call PCASMSetOverlap ( pet_precon, 1, ierpetsc )
> >>> call KSPSetUp ( pet_solv, ierpetsc )
> >>> call PCASMGetSubKSP ( pet_precon, n_local, first_local, pet_solv_sub, ierpetsc )  ! n_local is one
> >>> call KSPGetPC ( pet_solv_sub(1), pet_precon_sub, ierpetsc )
> >>> call PCSetType ( pet_precon_sub, 'jacobi', ierpetsc )
> >>> call PCJacobiSetUseRowMax ( pet_precon_sub, ierpetsc )
> >>> call KSPSetFromOptions ( pet_solv, ierpetsc )
> >>> call KSPGMRESSetRestart ( pet_solv, 29, ierpetsc )           ! adding this line, the code converges
> >>> call KSPSolve ( pet_solv, pet_rhsp, pet_solup, ierpetsc )
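> >>>
> >>> For reference, roughly the same setup can also be selected at run time instead of being hard-coded (a sketch; the rowmax Jacobi variant needs its own option, which I have not listed here):
> >>>
> >>>   -ksp_type gmres -ksp_gmres_restart 30 -pc_type asm -pc_asm_overlap 1 -sub_pc_type jacobi
> >>>
> >>> Because KSPSetFromOptions is called, any of these command-line options would override the corresponding values set earlier in the code.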
> >>>
> >>> Running with 1 CPU WITHOUT the second KSPGMRESSetRestart call (the commented line above), the code does not converge:
> >>>
> >>> runtime options:   -ksp_monitor_true_residual -ksp_view
> >>> 0 KSP preconditioned resid norm 6.585278940829e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
> >>> 1 KSP preconditioned resid norm 6.585278219510e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
> >>> 2 KSP preconditioned resid norm 2.198638170622e+00 true resid norm 1.365132713014e-01 ||r(i)||/||b|| 1.419163317039e+01
> >>> 3 KSP preconditioned resid norm 1.599896387215e+00 true resid norm 1.445988845022e-01 ||r(i)||/||b|| 1.503219654865e+01
> >>> .......
> >>> 28 KSP preconditioned resid norm 4.478466011191e-01 true resid norm 1.529879309381e-01 ||r(i)||/||b|| 1.590430420920e+01
> >>> 29 KSP preconditioned resid norm 4.398129572260e-01 true resid norm 1.530132924055e-01 ||r(i)||/||b|| 1.590694073413e+01
> >>> 30 KSP preconditioned resid norm 2.783227613716e+12 true resid norm 1.530369123550e-01 ||r(i)||/||b|| 1.590939621450e+01
> >>>
> >>> KSP Object: 1 MPI processes
> >>> type: gmres
> >>>   GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >>>   GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>> left preconditioning
> >>> using PRECONDITIONED norm type for convergence test
> >>> PC Object: 1 MPI processes
> >>> type: asm
> >>>   Additive Schwarz: total subdomain blocks = 1, amount of overlap = 1
> >>>   Additive Schwarz: restriction/interpolation type - RESTRICT
> >>>   [0] number of local blocks = 1
> >>>   Local solve info for each block is in the following KSP and PC objects:
> >>>   - - - - - - - - - - - - - - - - - -
> >>>   [0] local block number 0, size = 22905
> >>>   KSP Object:    (sub_)     1 MPI processes
> >>>     type: preonly
> >>>     maximum iterations=10000, initial guess is zero
> >>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>     left preconditioning
> >>>     using NONE norm type for convergence test
> >>>   PC Object:    (sub_)     1 MPI processes
> >>>     type: jacobi
> >>>     linear system matrix = precond matrix:
> >>>     Matrix Object:       1 MPI processes
> >>>       type: seqbaij
> >>>       rows=22905, cols=22905, bs=5
> >>>       total: nonzeros=785525, allocated nonzeros=785525
> >>>       total number of mallocs used during MatSetValues calls =0
> >>>           block size is 5
> >>>   - - - - - - - - - - - - - - - - - -
> >>> linear system matrix followed by preconditioner matrix:
> >>> Matrix Object:   1 MPI processes
> >>>   type: shell
> >>>   rows=22905, cols=22905
> >>> Matrix Object:   1 MPI processes
> >>>   type: seqbaij
> >>>   rows=22905, cols=22905, bs=5
> >>>   total: nonzeros=785525, allocated nonzeros=785525
> >>>   total number of mallocs used during MatSetValues calls =0
> >>>       block size is 5
> >>>    WARNING: zero iteration in iterative solver
> >>>
> >>> Running with 1 CPU WITH the second KSPGMRESSetRestart call, the code converges:
> >>>
> >>> runtime options:   -ksp_monitor_true_residual -ksp_view
> >>> 0 KSP preconditioned resid norm 6.585278940829e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
> >>> 1 KSP preconditioned resid norm 2.566248171026e+00 true resid norm 4.841043870812e-03 ||r(i)||/||b|| 5.032647604250e-01
> >>> 2 KSP preconditioned resid norm 1.410418402651e+00 true resid norm 3.347509391208e-03 ||r(i)||/||b|| 3.480000505561e-01
> >>> 3 KSP preconditioned resid norm 9.665409287757e-01 true resid norm 2.289877121679e-03 ||r(i)||/||b|| 2.380508195748e-01
> >>> 4 KSP preconditioned resid norm 4.469486152454e-01 true resid norm 1.283813398084e-03 ||r(i)||/||b|| 1.334625463968e-01
> >>> 5 KSP preconditioned resid norm 2.474889829653e-01 true resid norm 7.956009139680e-04 ||r(i)||/||b|| 8.270900120862e-02
> >>> ............
> >>> 24 KSP preconditioned resid norm 9.518780877620e-05 true resid norm 6.273993696172e-07 ||r(i)||/||b|| 6.522312167937e-05
> >>> 25 KSP preconditioned resid norm 6.837876679998e-05 true resid norm 4.612861071815e-07 ||r(i)||/||b|| 4.795433555514e-05
> >>> 26 KSP preconditioned resid norm 4.864361942316e-05 true resid norm 3.394754589076e-07 ||r(i)||/||b|| 3.529115621682e-05
> >>> KSP Object: 1 MPI processes
> >>> type: gmres
> >>>   GMRES: restart=29, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >>>   GMRES: happy breakdown tolerance 1e-30
> >>> maximum iterations=10000, initial guess is zero
> >>> tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>> left preconditioning
> >>> using PRECONDITIONED norm type for convergence test
> >>> PC Object: 1 MPI processes
> >>> type: asm
> >>>   Additive Schwarz: total subdomain blocks = 1, amount of overlap = 1
> >>>   Additive Schwarz: restriction/interpolation type - RESTRICT
> >>>   [0] number of local blocks = 1
> >>>   Local solve info for each block is in the following KSP and PC objects:
> >>>   - - - - - - - - - - - - - - - - - -
> >>>   [0] local block number 0, size = 22905
> >>>   KSP Object:    (sub_)     1 MPI processes
> >>>     type: preonly
> >>>     maximum iterations=10000, initial guess is zero
> >>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
> >>>     left preconditioning
> >>>     using NONE norm type for convergence test
> >>>   PC Object:    (sub_)     1 MPI processes
> >>>     type: jacobi
> >>>     linear system matrix = precond matrix:
> >>>     Matrix Object:       1 MPI processes
> >>>       type: seqbaij
> >>>       rows=22905, cols=22905, bs=5
> >>>       total: nonzeros=785525, allocated nonzeros=785525
> >>>       total number of mallocs used during MatSetValues calls =0
> >>>           block size is 5
> >>>   - - - - - - - - - - - - - - - - - -
> >>> linear system matrix followed by preconditioner matrix:
> >>> Matrix Object:   1 MPI processes
> >>>   type: shell
> >>>   rows=22905, cols=22905
> >>> Matrix Object:   1 MPI processes
> >>>   type: seqbaij
> >>>   rows=22905, cols=22905, bs=5
> >>>   total: nonzeros=785525, allocated nonzeros=785525
> >>>   total number of mallocs used during MatSetValues calls =0
> >>>       block size is 5
> >>>    WARNING: zero iteration in iterative solver
> >>>
> >>>
> >>> What might I be doing wrong here? Thank you.
> >>
> 
> 


