[petsc-users] KSP_CONVERGED_STEP_LENGTH
Harshad Sahasrabudhe
hsahasra at purdue.edu
Wed Sep 14 12:39:53 CDT 2016
Hi Barry,
I put a watchpoint on *((KSPConvergedReason*) &( ((_p_KSP*)ksp)->reason ))
in gdb. The value of ksp->reason alternated between these two transitions:
Old value = KSP_CONVERGED_ITERATING
New value = KSP_CONVERGED_RTOL
0x00002b143054bef2 in KSPConvergedDefault (ksp=0x23c3090, n=12, rnorm=5.3617149831259514e-08, reason=0x23c3310, ctx=0x2446210)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/ksp/ksp/interface/iterativ.c:764
764         *reason = KSP_CONVERGED_RTOL;
*and*
Old value = KSP_CONVERGED_RTOL
New value = KSP_CONVERGED_ITERATING
KSPSetUp (ksp=0x23c3090)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/ksp/ksp/interface/itfunc.c:226
226         if (!((PetscObject)ksp)->type_name) {
However, after iteration 6, it changed to KSP_CONVERGED_STEP_LENGTH:
Old value = KSP_CONVERGED_ITERATING
New value = KSP_CONVERGED_STEP_LENGTH
SNES_TR_KSPConverged_Private (ksp=0x23c3090, n=1, rnorm=0.097733468578376406, reason=0x23c3310, cctx=0x1d8f3e0)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/impls/tr/tr.c:36
36          PetscFunctionReturn(0);
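(For reference, I set the watchpoint roughly like this, after first breaking
somewhere a ksp is in scope, e.g. in KSPSolve:)

  (gdb) break KSPSolve
  (gdb) run
  (gdb) watch -l *((KSPConvergedReason*) &( ((_p_KSP*)ksp)->reason ))
  (gdb) continue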
Any ideas why SNES_TR_KSPConverged_Private was executed? Here is the
backtrace from where the program stopped:
#0  SNES_TR_KSPConverged_Private (ksp=0x23c3090, n=1, rnorm=0.097733468578376406, reason=0x23c3310, cctx=0x1d8f3e0)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/impls/tr/tr.c:36
#1  0x00002b14305d3fda in KSPGMRESCycle (itcount=0x7ffdcf2d4ffc, ksp=0x23c3090)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/ksp/ksp/impls/gmres/gmres.c:182
#2  0x00002b14305d4711 in KSPSolve_GMRES (ksp=0x23c3090)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/ksp/ksp/impls/gmres/gmres.c:235
#3  0x00002b1430526a8a in KSPSolve (ksp=0x23c3090, b=0x1a916c0, x=0x1d89dc0)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/ksp/ksp/interface/itfunc.c:460
#4  0x00002b1430bb3905 in SNESSolve_NEWTONTR (snes=0x1ea2490)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/impls/tr/tr.c:160
#5  0x00002b1430655b57 in SNESSolve (snes=0x1ea2490, b=0x0, x=0x1a27420)
    at /depot/ncn/apps/conte/conte-gcc-petsc35-dbg/libs/petsc/build-real/src/snes/interface/snes.c:3743
#6  0x00002b142f606780 in libMesh::PetscNonlinearSolver<double>::solve (this=0x1a1d8c0, jac_in=..., x_in=..., r_in=...)
    at src/solvers/petsc_nonlinear_solver.C:714
#7  0x00002b142f67561d in libMesh::NonlinearImplicitSystem::solve (this=0x1a14fe0)
    at src/systems/nonlinear_implicit_system.C:183
#8  0x00002b1429548ceb in NonlinearPoisson::execute_solver (this=0x1110500)
    at NonlinearPoisson.cpp:1191
#9  0x00002b142952233c in NonlinearPoisson::do_solve (this=0x1110500)
    at NonlinearPoisson.cpp:948
#10 0x00002b1429b9e785 in Simulation::solve (this=0x1110500) at Simulation.cpp:781
#11 0x00002b1429ac326e in Nemo::run_simulations (this=0x63b020) at Nemo.cpp:1313
#12 0x0000000000426d0d in main (argc=6, argv=0x7ffdcf2d7908) at main.cpp:447
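From a quick look at src/snes/impls/tr/tr.c, my best guess is that
SNESSolve_NEWTONTR (frame #4 above) installs its own KSP convergence test
with KSPSetConvergenceTest, and that the test looks roughly like this (my
paraphrase of the 3.5 source, not verbatim; the context member names are
from memory):

  static PetscErrorCode SNES_TR_KSPConverged_Private(KSP ksp, PetscInt n,
                                                     PetscReal rnorm,
                                                     KSPConvergedReason *reason,
                                                     void *cctx)
  {
    SNES_TR_KSPConverged_Ctx *ctx = (SNES_TR_KSPConverged_Ctx*)cctx;
    SNES_NEWTONTR            *neP = (SNES_NEWTONTR*)ctx->snes->data;
    Vec                      x;
    PetscReal                nrm;
    PetscErrorCode           ierr;

    PetscFunctionBegin;
    /* first run whatever convergence test was installed before (rtol/atol/dtol) */
    ierr = (*ctx->convtest)(ksp, n, rnorm, reason, ctx->convctx);CHKERRQ(ierr);
    /* then, independently of the residual norm, stop as soon as the current
       iterate has grown to the trust-region radius delta */
    ierr = KSPBuildSolution(ksp, 0, &x);CHKERRQ(ierr);
    ierr = VecNorm(x, NORM_2, &nrm);CHKERRQ(ierr);
    if (nrm >= neP->delta) *reason = KSP_CONVERGED_STEP_LENGTH;
    PetscFunctionReturn(0);  /* tr.c:36, where the watchpoint fired */
  }

If that reading is right, the KSP can stop with KSP_CONVERGED_STEP_LENGTH at
any residual (here rnorm=0.0977...) simply because the iterate hit the
trust-region radius. Is that the expected behavior with NEWTONTR?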
Thanks!
Harshad
On Wed, Sep 14, 2016 at 10:10 AM, Harshad Sahasrabudhe <hsahasra at purdue.edu>
wrote:
> I think I found the problem. I configured PETSc with COPTFLAGS=-O3. I'll
> remove that option and try again.
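>
> If that was the problem, I'll reconfigure with something like this (a
> sketch; my other options unchanged):
>
>   ./configure --with-debugging=1 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"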
>
> Thanks!
> Harshad
>
> On Wed, Sep 14, 2016 at 10:06 AM, Harshad Sahasrabudhe <
> hsahasra at purdue.edu> wrote:
>
>> Hi Barry,
>>
>> Thanks for your inputs. I tried to set a watchpoint on
>> ((_p_KSP*)ksp)->reason, but gdb says: No symbol "_p_KSP" in current
>> context. Basically, gdb isn't able to find the PETSc debugging
>> information. I built PETSc with --with-debugging=1, statically and with
>> -fPIC, but it seems the libpetsc.a I get doesn't contain debugging symbols
>> (checked using objdump -g). How do I get the PETSc library to include
>> debugging info?
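>>
>> (Would pointing gdb at the sources with something like
>> "(gdb) directory /path/to/petsc/src" help here, or do I strictly need the
>> library built with debug symbols?)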
>>
>> Thanks,
>> Harshad
>>
>> On Tue, Sep 13, 2016 at 2:47 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>> > On Sep 13, 2016, at 1:01 PM, Harshad Sahasrabudhe <hsahasra at purdue.edu>
>>> wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > I compiled with mpich configured using --enable-g=meminit to get rid
>>> of MPI errors in Valgrind. Doing this reduced the number of errors to 2. I
>>> have attached the Valgrind output.
>>>
>>> This isn't helpful but it seems not to be a memory corruption issue
>>> :-(
>>> >
>>> > I'm using GAMG+GMRES for the linear solve in each SNES iteration. The
>>> linear solver converges with CONVERGED_RTOL for the first 6 iterations
>>> and with CONVERGED_STEP_LENGTH after that. I'm still very confused about
>>> why this is happening. Any thoughts/ideas?
>>>
>>> Does this happen on one process? If so, I would run in the debugger
>>> and track the variable to see every place it is changed; this would point
>>> to exactly the piece of code that is changing the variable to this
>>> unexpected value.
>>>
>>> For example, with lldb one can use watchpoints
>>> (http://lldb.llvm.org/tutorial.html) to see each time a variable gets
>>> changed. The same can be done with gdb.
>>>
>>> The variable to watch is ksp->reason. Once you get the hang of this,
>>> it can take just a few minutes to track down the code that is setting
>>> this unexpected value, though I understand it can be intimidating if you
>>> haven't done it before.
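>>>
>>> For example (a sketch; exact syntax varies between debugger versions):
>>>
>>>    (gdb)  watch -l ksp->reason
>>>    (lldb) watchpoint set variable ksp->reason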
>>>
>>> Barry
>>>
>>> You can do the same thing in parallel (like on two processes) if you
>>> need to, but it is more cumbersome since you need to run multiple
>>> debuggers. You can have PETSc start up multiple debuggers with
>>> mpiexec -n 2 ./ex -start_in_debugger
>>>
>>>
>>>
>>>
>>> >
>>> > Thanks,
>>> > Harshad
>>> >
>>> > On Thu, Sep 8, 2016 at 11:26 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> > Install your MPI with --download-mpich as a PETSc ./configure
>>> option; this will eliminate all the MPICH valgrind errors. Then send the
>>> resulting valgrind file as an attachment.
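>>> >
>>> > For example (a sketch, with your other options unchanged):
>>> >
>>> >   ./configure --download-mpich --with-debugging=1 <other options>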
>>> >
>>> > I do not 100% trust any code that produces such valgrind errors.
>>> >
>>> > Barry
>>> >
>>> >
>>> >
>>> > > On Sep 8, 2016, at 10:12 PM, Harshad Sahasrabudhe <
>>> hsahasra at purdue.edu> wrote:
>>> > >
>>> > > Hi Barry,
>>> > >
>>> > > Thanks for the reply. My code is in C. I ran with Valgrind and found
>>> many "Conditional jump or move depends on uninitialized value(s)", "Invalid
>>> read" and "Use of uninitialized value" errors. I think all of them are from
>>> the libraries I'm using (LibMesh, Boost, MPI, etc.). I'm not really sure
>>> what I'm looking for in the Valgrind output. At the end of the file, I get:
>>> > >
>>> > > ==40223== More than 10000000 total errors detected. I'm not
>>> reporting any more.
>>> > > ==40223== Final error counts will be inaccurate. Go fix your
>>> program!
>>> > > ==40223== Rerun with --error-limit=no to disable this cutoff. Note
>>> > > ==40223== that errors may occur in your program without prior
>>> warning from
>>> > > ==40223== Valgrind, because errors are no longer being displayed.
>>> > >
>>> > > Can you give some suggestions on how I should proceed?
>>> > >
>>> > > Thanks,
>>> > > Harshad
>>> > >
>>> > > On Thu, Sep 8, 2016 at 1:59 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> > >
>>> > > This is very odd. CONVERGED_STEP_LENGTH for KSP is very
>>> specialized and should never occur with GMRES.
>>> > >
>>> > > Can you run with valgrind to make sure there is no memory
>>> corruption? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
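>>> > >
>>> > > From memory, the recommended invocation is roughly: mpiexec -n 2
>>> valgrind -q --tool=memcheck --num-callers=20 --log-file=valgrind.log.%p
>>> ./yourprogram -malloc off (that FAQ page has the exact flags).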
>>> > >
>>> > > Is your code fortran or C?
>>> > >
>>> > > Barry
>>> > >
>>> > > > On Sep 8, 2016, at 10:38 AM, Harshad Sahasrabudhe <
>>> hsahasra at purdue.edu> wrote:
>>> > > >
>>> > > > Hi,
>>> > > >
>>> > > > I'm using GAMG + GMRES for my Poisson problem. The solver
>>> converges with KSP_CONVERGED_STEP_LENGTH at a residual of
>>> 9.773346857844e-02, which is much higher than what I need (I need a
>>> tolerance of at least 1E-8). I am not able to figure out which tolerance I
>>> need to set to avoid convergence due to CONVERGED_STEP_LENGTH.
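>>> > > >
>>> > > > Or would I need to override the convergence test entirely,
>>> something like this (untested sketch; I am guessing the Create/Destroy
>>> helper names)?
>>> > > >
>>> > > >   PetscErrorCode ierr;
>>> > > >   void *defctx;
>>> > > >   ierr = KSPConvergedDefaultCreate(&defctx);CHKERRQ(ierr);
>>> > > >   ierr = KSPSetConvergenceTest(ksp, KSPConvergedDefault, defctx,
>>> > > >                                KSPConvergedDefaultDestroy);CHKERRQ(ierr);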
>>> > > >
>>> > > > Any help is appreciated! Output of -ksp_view and -ksp_monitor:
>>> > > >
>>> > > > 0 KSP Residual norm 3.121347818142e+00
>>> > > > 1 KSP Residual norm 9.773346857844e-02
>>> > > > Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>>> > > > KSP Object: 1 MPI processes
>>> > > > type: gmres
>>> > > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>> > > > GMRES: happy breakdown tolerance 1e-30
>>> > > > maximum iterations=10000, initial guess is zero
>>> > > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>>> > > > left preconditioning
>>> > > > using PRECONDITIONED norm type for convergence test
>>> > > > PC Object: 1 MPI processes
>>> > > > type: gamg
>>> > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>> > > > Cycles per PCApply=1
>>> > > > Using Galerkin computed coarse grid matrices
>>> > > > Coarse grid solver -- level -------------------------------
>>> > > > KSP Object: (mg_coarse_) 1 MPI processes
>>> > > > type: preonly
>>> > > > maximum iterations=1, initial guess is zero
>>> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>> > > > left preconditioning
>>> > > > using NONE norm type for convergence test
>>> > > > PC Object: (mg_coarse_) 1 MPI processes
>>> > > > type: bjacobi
>>> > > > block Jacobi: number of blocks = 1
>>> > > > Local solve is same for all blocks, in the following KSP
>>> and PC objects:
>>> > > > KSP Object: (mg_coarse_sub_) 1 MPI processes
>>> > > > type: preonly
>>> > > > maximum iterations=1, initial guess is zero
>>> > > > tolerances: relative=1e-05, absolute=1e-50,
>>> divergence=10000
>>> > > > left preconditioning
>>> > > > using NONE norm type for convergence test
>>> > > > PC Object: (mg_coarse_sub_) 1 MPI processes
>>> > > > type: lu
>>> > > > LU: out-of-place factorization
>>> > > > tolerance for zero pivot 2.22045e-14
>>> > > > using diagonal shift on blocks to prevent zero pivot
>>> [INBLOCKS]
>>> > > > matrix ordering: nd
>>> > > > factor fill ratio given 5, needed 1.91048
>>> > > > Factored matrix follows:
>>> > > > Mat Object: 1 MPI processes
>>> > > > type: seqaij
>>> > > > rows=284, cols=284
>>> > > > package used to perform factorization: petsc
>>> > > > total: nonzeros=7726, allocated nonzeros=7726
>>> > > > total number of mallocs used during MatSetValues
>>> calls =0
>>> > > > using I-node routines: found 133 nodes, limit
>>> used is 5
>>> > > > linear system matrix = precond matrix:
>>> > > > Mat Object: 1 MPI processes
>>> > > > type: seqaij
>>> > > > rows=284, cols=284
>>> > > > total: nonzeros=4044, allocated nonzeros=4044
>>> > > > total number of mallocs used during MatSetValues calls
>>> =0
>>> > > > not using I-node routines
>>> > > > linear system matrix = precond matrix:
>>> > > > Mat Object: 1 MPI processes
>>> > > > type: seqaij
>>> > > > rows=284, cols=284
>>> > > > total: nonzeros=4044, allocated nonzeros=4044
>>> > > > total number of mallocs used during MatSetValues calls =0
>>> > > > not using I-node routines
>>> > > > Down solver (pre-smoother) on level 1
>>> -------------------------------
>>> > > > KSP Object: (mg_levels_1_) 1 MPI processes
>>> > > > type: chebyshev
>>> > > > Chebyshev: eigenvalue estimates: min = 0.195339, max =
>>> 4.10212
>>> > > > maximum iterations=2
>>> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>> > > > left preconditioning
>>> > > > using nonzero initial guess
>>> > > > using NONE norm type for convergence test
>>> > > > PC Object: (mg_levels_1_) 1 MPI processes
>>> > > > type: sor
>>> > > > SOR: type = local_symmetric, iterations = 1, local
>>> iterations = 1, omega = 1
>>> > > > linear system matrix = precond matrix:
>>> > > > Mat Object: () 1 MPI processes
>>> > > > type: seqaij
>>> > > > rows=9036, cols=9036
>>> > > > total: nonzeros=192256, allocated nonzeros=192256
>>> > > > total number of mallocs used during MatSetValues calls =0
>>> > > > not using I-node routines
>>> > > > Up solver (post-smoother) same as down solver (pre-smoother)
>>> > > > linear system matrix = precond matrix:
>>> > > > Mat Object: () 1 MPI processes
>>> > > > type: seqaij
>>> > > > rows=9036, cols=9036
>>> > > > total: nonzeros=192256, allocated nonzeros=192256
>>> > > > total number of mallocs used during MatSetValues calls =0
>>> > > > not using I-node routines
>>> > > >
>>> > > > Thanks,
>>> > > > Harshad
>>> > >
>>> > >
>>> >
>>> >
>>> > <valgrind.log.33199>
>>>
>>>
>>
>