[petsc-users] KSP_CONVERGED_STEP_LENGTH
Harshad Sahasrabudhe
hsahasra at purdue.edu
Wed Sep 14 09:10:36 CDT 2016
I think I found the problem. I configured PETSc with COPTFLAGS=-O3. I'll
remove that option and try again.
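
If I understand correctly, passing COPTFLAGS=-O3 replaces the default debug flags, so -g never makes it into the build. A configure line roughly like the one below should keep the debug info (the exact flags are just my guess, adjust as needed):

    ./configure --with-debugging=1 --with-pic=1 COPTFLAGS="-g -O0" CXXOPTFLAGS="-g -O0" FOPTFLAGS="-g -O0"
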
Thanks!
Harshad
On Wed, Sep 14, 2016 at 10:06 AM, Harshad Sahasrabudhe <hsahasra at purdue.edu>
wrote:
> Hi Barry,
>
> Thanks for your input. I tried to set a watchpoint on
> ((_p_KSP*)ksp)->reason, but gdb says there is no symbol _p_KSP in context.
> Basically, GDB isn't able to find the PETSc source code. I built PETSc
> statically with --with-debugging=1 and -fPIC, but it seems the libpetsc.a I
> get doesn't contain debugging symbols (checked using objdump -g). How do I
> get the PETSc library to have debugging info?
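>
> As a sanity check after rebuilding, I'll try something roughly like the
> following (the binary name here is just a placeholder for my executable):
>
>     objdump -g $PETSC_DIR/$PETSC_ARCH/lib/libpetsc.a | head
>     gdb -batch -ex 'ptype struct _p_KSP' ./myapp
>
> The first command should now print DWARF debug contents, and the second
> should print the struct definition instead of an unknown-type error.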
>
> Thanks,
> Harshad
>
> On Tue, Sep 13, 2016 at 2:47 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> > On Sep 13, 2016, at 1:01 PM, Harshad Sahasrabudhe <hsahasra at purdue.edu>
>> wrote:
>> >
>> > Hi Barry,
>> >
>> > I compiled with mpich configured using --enable-g=meminit to get rid of
>> MPI errors in Valgrind. Doing this reduced the number of errors to 2. I
>> have attached the Valgrind output.
>>
>> This isn't much help, but at least it seems not to be a memory corruption issue :-(
>> >
>> > I'm using GAMG+GMRES for the linear solve in each SNES iteration. The
>> linear solver converges with CONVERGED_RTOL for the first 6 iterations and
>> with CONVERGED_STEP_LENGTH after that. I'm still very confused about why
>> this is happening. Any thoughts/ideas?
>>
>> Does this happen on one process? If so, I would run in the debugger and
>> track the variable to see every place it is changed; that will point to
>> exactly which piece of code is setting it to this unexpected value.
>>
>> For example, with lldb one can use a watchpoint (see
>> http://lldb.llvm.org/tutorial.html) to see each time a variable gets
>> changed. The same thing works with gdb.
>>
>> The variable to watch is ksp->reason. Once you get the hang of this it
>> can take just a few minutes to track down the code that is setting this
>> unexpected value, though I understand it can be intimidating if you
>> haven't done it before.
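>>
>> With gdb, a minimal session might look roughly like this (the binary name
>> and the -ksp_monitor option are just placeholders for your own run):
>>
>>     gdb ./myapp
>>     (gdb) break KSPSolve
>>     (gdb) run -ksp_monitor
>>     (gdb) watch ksp->reason
>>     (gdb) continue
>>
>> Once you are stopped inside KSPSolve, ksp is in scope, so the watchpoint
>> fires every time ksp->reason changes and shows the line of code that did it.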
>>
>> Barry
>>
>> You can do the same thing in parallel (say on two processes) if you need
>> to, but it is more cumbersome since you need to run multiple debuggers.
>> You can have PETSc start up the debuggers for you with
>> mpiexec -n 2 ./ex -start_in_debugger
>>
>>
>>
>>
>> >
>> > Thanks,
>> > Harshad
>> >
>> > On Thu, Sep 8, 2016 at 11:26 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > Install your MPI with --download-mpich as a PETSc ./configure option;
>> this will eliminate all the MPICH valgrind errors. Then send the resulting
>> valgrind file as an attachment.
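>> >
>> > For example, something roughly like this (keep whatever other configure
>> > options you already use):
>> >
>> >     ./configure --with-debugging=1 --download-mpich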
>> >
>> > I do not 100% trust any code that produces such valgrind errors.
>> >
>> > Barry
>> >
>> >
>> >
>> > > On Sep 8, 2016, at 10:12 PM, Harshad Sahasrabudhe <hsahasra at purdue.edu> wrote:
>> > >
>> > > Hi Barry,
>> > >
>> > > Thanks for the reply. My code is in C. I ran with Valgrind and found
>> many "Conditional jump or move depends on uninitialized value(s)", "Invalid
>> read" and "Use of uninitialized value" errors. I think all of them are from
>> the libraries I'm using (LibMesh, Boost, MPI, etc.). I'm not really sure
>> what I'm looking for in the Valgrind output. At the end of the file, I get:
>> > >
>> > > ==40223== More than 10000000 total errors detected. I'm not reporting any more.
>> > > ==40223== Final error counts will be inaccurate. Go fix your program!
>> > > ==40223== Rerun with --error-limit=no to disable this cutoff. Note
>> > > ==40223== that errors may occur in your program without prior warning from
>> > > ==40223== Valgrind, because errors are no longer being displayed.
>> > >
>> > > Can you give some suggestions on how I should proceed?
>> > >
>> > > Thanks,
>> > > Harshad
>> > >
>> > > On Thu, Sep 8, 2016 at 1:59 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > >
>> > > This is very odd. CONVERGED_STEP_LENGTH for KSP is very
>> specialized and should never occur with GMRES.
>> > >
>> > > Can you run with valgrind to make sure there is no memory
>> corruption? http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
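>> > >
>> > > A typical invocation might look roughly like this (the executable name
>> > > is a placeholder; --track-origins is optional but helps with the
>> > > uninitialized-value reports):
>> > >
>> > >     mpiexec -n 1 valgrind -q --tool=memcheck --num-callers=20 --track-origins=yes --log-file=valgrind.log.%p ./myapp
>> > >
>> > > With --log-file=valgrind.log.%p each process writes its own log file
>> > > named by its pid.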
>> > >
>> > > Is your code Fortran or C?
>> > >
>> > > Barry
>> > >
>> > > > On Sep 8, 2016, at 10:38 AM, Harshad Sahasrabudhe <hsahasra at purdue.edu> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > I'm using GAMG + GMRES for my Poisson problem. The solver converges
>> with KSP_CONVERGED_STEP_LENGTH at a residual of 9.773346857844e-02, which
>> is much higher than what I need (I need a tolerance of at least 1E-8). I am
>> not able to figure out which tolerance I need to set to avoid convergence
>> due to CONVERGED_STEP_LENGTH.
>> > > >
>> > > > Any help is appreciated! Output of -ksp_view and -ksp_monitor:
>> > > >
>> > > > 0 KSP Residual norm 3.121347818142e+00
>> > > > 1 KSP Residual norm 9.773346857844e-02
>> > > > Linear solve converged due to CONVERGED_STEP_LENGTH iterations 1
>> > > > KSP Object: 1 MPI processes
>> > > > type: gmres
>> > > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> > > > GMRES: happy breakdown tolerance 1e-30
>> > > > maximum iterations=10000, initial guess is zero
>> > > > tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>> > > > left preconditioning
>> > > > using PRECONDITIONED norm type for convergence test
>> > > > PC Object: 1 MPI processes
>> > > > type: gamg
>> > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> > > > Cycles per PCApply=1
>> > > > Using Galerkin computed coarse grid matrices
>> > > > Coarse grid solver -- level -------------------------------
>> > > > KSP Object: (mg_coarse_) 1 MPI processes
>> > > > type: preonly
>> > > > maximum iterations=1, initial guess is zero
>> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> > > > left preconditioning
>> > > > using NONE norm type for convergence test
>> > > > PC Object: (mg_coarse_) 1 MPI processes
>> > > > type: bjacobi
>> > > > block Jacobi: number of blocks = 1
>> > > > Local solve is same for all blocks, in the following KSP and PC objects:
>> > > > KSP Object: (mg_coarse_sub_) 1 MPI processes
>> > > > type: preonly
>> > > > maximum iterations=1, initial guess is zero
>> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> > > > left preconditioning
>> > > > using NONE norm type for convergence test
>> > > > PC Object: (mg_coarse_sub_) 1 MPI processes
>> > > > type: lu
>> > > > LU: out-of-place factorization
>> > > > tolerance for zero pivot 2.22045e-14
>> > > > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> > > > matrix ordering: nd
>> > > > factor fill ratio given 5, needed 1.91048
>> > > > Factored matrix follows:
>> > > > Mat Object: 1 MPI processes
>> > > > type: seqaij
>> > > > rows=284, cols=284
>> > > > package used to perform factorization: petsc
>> > > > total: nonzeros=7726, allocated nonzeros=7726
>> > > > total number of mallocs used during MatSetValues calls =0
>> > > > using I-node routines: found 133 nodes, limit used is 5
>> > > > linear system matrix = precond matrix:
>> > > > Mat Object: 1 MPI processes
>> > > > type: seqaij
>> > > > rows=284, cols=284
>> > > > total: nonzeros=4044, allocated nonzeros=4044
>> > > > total number of mallocs used during MatSetValues calls =0
>> > > > not using I-node routines
>> > > > linear system matrix = precond matrix:
>> > > > Mat Object: 1 MPI processes
>> > > > type: seqaij
>> > > > rows=284, cols=284
>> > > > total: nonzeros=4044, allocated nonzeros=4044
>> > > > total number of mallocs used during MatSetValues calls =0
>> > > > not using I-node routines
>> > > > Down solver (pre-smoother) on level 1 -------------------------------
>> > > > KSP Object: (mg_levels_1_) 1 MPI processes
>> > > > type: chebyshev
>> > > > Chebyshev: eigenvalue estimates: min = 0.195339, max = 4.10212
>> > > > maximum iterations=2
>> > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>> > > > left preconditioning
>> > > > using nonzero initial guess
>> > > > using NONE norm type for convergence test
>> > > > PC Object: (mg_levels_1_) 1 MPI processes
>> > > > type: sor
>> > > > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>> > > > linear system matrix = precond matrix:
>> > > > Mat Object: () 1 MPI processes
>> > > > type: seqaij
>> > > > rows=9036, cols=9036
>> > > > total: nonzeros=192256, allocated nonzeros=192256
>> > > > total number of mallocs used during MatSetValues calls =0
>> > > > not using I-node routines
>> > > > Up solver (post-smoother) same as down solver (pre-smoother)
>> > > > linear system matrix = precond matrix:
>> > > > Mat Object: () 1 MPI processes
>> > > > type: seqaij
>> > > > rows=9036, cols=9036
>> > > > total: nonzeros=192256, allocated nonzeros=192256
>> > > > total number of mallocs used during MatSetValues calls =0
>> > > > not using I-node routines
>> > > >
>> > > > Thanks,
>> > > > Harshad
>> > >
>> > >
>> >
>> >
>> > <valgrind.log.33199>
>>
>>
>