[petsc-users] Question with setting up KSP solver parameters.
Åsmund Ervik
asmund.ervik at ntnu.no
Mon May 5 12:33:52 CDT 2014
You should also compile your own code "fensapngnew" with debug flags,
specifically "-g" for gcc/gfortran or icc/ifort (ideally together with
"-O0", so that addresses map cleanly back to source lines). This tells
the compiler to generate the information gdb and valgrind need to do
their job. Then you get more detailed information than just
'''
==8222== Uninitialised value was created by a stack allocation
==8222== at 0x216E97F: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
'''
E.g. when I have an error, I get a full backtrace with line numbers in
my source code, like:
'''
==5277== Uninitialised value was created by a heap allocation
==5277== at 0x4C277AB: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5277== by 0x6BE3FE: __navier_stokes_MOD_rhs_ns (navier_stokes.f90:59)
==5277== by 0x6C712A: __rhs_MOD_dfdt_1phase (rhs.f90:109)
==5277== by 0x4EF52C: __rk_MOD_forward_euler (rk.f90:2168)
==5277== by 0x642764: __rk_wrapper_MOD_rk_step (rk_wrapper.f90:313)
==5277== by 0x7FA8B8: MAIN__ (meph.F90:179)
==5277== by 0x7FC5B9: main (meph.F90:2)
==5277==
'''
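
For reference, valgrind reports like "Uninitialised value was created
by a stack allocation" followed by "Conditional jump or move depends
on uninitialised value(s)" are typically produced by code along these
lines (a hypothetical minimal sketch, not taken from your program):
'''
subroutine demo
  implicit none
  real(8) :: a          ! local (stack) variable that is never assigned
  if (a > 0.0d0) then   ! valgrind: conditional jump depends on uninitialised value
     print *, 'positive'
  end if
end subroutine demo
'''
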
On 05 May 2014 17:00, Song Gao wrote:
> Thank you. Running with
>
> mpirun -np 1 valgrind --track-origins=yes ~/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG -ksp_monitor_true_residual -ksp_view
>
> gives the following information.
>
> ==8222== Conditional jump or move depends on uninitialised value(s)
> ==8222== at 0x216E9A7: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x216E1EC: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x216AC14: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x1291100: VecNorm_Seq (bvec2.c:239)
> ==8222== by 0x126F431: VecNorm (rvector.c:166)
> ==8222== by 0x127039D: VecNormalize (rvector.c:261)
> ==8222== by 0x1405507: KSPGMRESCycle (gmres.c:127)
> ==8222== by 0x140695F: KSPSolve_GMRES (gmres.c:231)
> ==8222== by 0x1BEEF66: KSPSolve (itfunc.c:446)
> ==8222== by 0x13F95E8: kspsolve_ (itfuncf.c:219)
> ==8222== by 0xC5E51F: petsolv_ (PETSOLV.F:375)
> ==8222== by 0x612C35: flowsol_ng_ (flowsol_ng.F:275)
> ==8222== Uninitialised value was created by a stack allocation
> ==8222== at 0x216E97F: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222==
> ==8222== Conditional jump or move depends on uninitialised value(s)
> ==8222== at 0x216E9D1: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x216E1EC: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x216AC14: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222== by 0x1291100: VecNorm_Seq (bvec2.c:239)
> ==8222== by 0x126F431: VecNorm (rvector.c:166)
> ==8222== by 0x127039D: VecNormalize (rvector.c:261)
> ==8222== by 0x1405507: KSPGMRESCycle (gmres.c:127)
> ==8222== by 0x140695F: KSPSolve_GMRES (gmres.c:231)
> ==8222== by 0x1BEEF66: KSPSolve (itfunc.c:446)
> ==8222== by 0x13F95E8: kspsolve_ (itfuncf.c:219)
> ==8222== by 0xC5E51F: petsolv_ (PETSOLV.F:375)
> ==8222== by 0x612C35: flowsol_ng_ (flowsol_ng.F:275)
> ==8222== Uninitialised value was created by a stack allocation
> ==8222== at 0x216E97F: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64_DEBUG)
> ==8222==
>
>
>
> On Mon, May 5, 2014 at 10:03 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> If you run valgrind with the debug version of the libraries (for
>> PETSc, a build configured with --with-debugging=1), it will provide
>> more information, such as the line numbers where the problem
>> occurred. I recommend doing that.
>>
>> Either your initial solution or right hand side has garbage in it,
>> or the wrong BLAS may be being linked in. But there is definitely
>> a problem.
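>>
>> As a quick sanity check before KSPSolve (a minimal sketch; rnorm and
>> xnorm are hypothetical PetscReal variables, and the vector names
>> follow the code quoted below):
>>
>>    ! norms of the right-hand side and the initial guess: a NaN or a
>>    ! wildly wrong magnitude here points at the inputs, not the solver
>>    call VecNorm ( pet_rhsp, NORM_2, rnorm, ierpetsc )
>>    call VecNorm ( pet_solup, NORM_2, xnorm, ierpetsc )
>>    print *, 'rhs norm =', rnorm, '  initial guess norm =', xnorm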
>>
>> Barry
>>
>> On May 5, 2014, at 8:28 AM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
>>
>>> Thanks for the reply. What do you mean by a “happier” state? I
>>> checked the converged solution (from the run that calls
>>> KSPGMRESSetRestart twice), and it appears to be correct.
>>>
>>> I ran both versions under valgrind (one calling KSPGMRESSetRestart
>>> once, the other calling it twice). Both produce the errors below;
>>> what does this mean? Thank you in advance.
>>> ==7858== Conditional jump or move depends on uninitialised value(s)
>>> ==7858== at 0xE71DFB: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xE71640: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xE6E068: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x73281A: VecNorm_Seq (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x730BF4: VecNormalize (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x7BC5A8: KSPSolve_GMRES (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xB8A06E: KSPSolve (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x7B659F: kspsolve_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x5EAE84: petsolv_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x4ECD46: flowsol_ng_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x507E4E: iterprc_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x51D1B4: solnalg_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858==
>>> ==7858== Conditional jump or move depends on uninitialised value(s)
>>> ==7858== at 0xE71E25: SearchPath (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xE71640: mkl_cfg_file (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xE6E068: DDOT (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x73281A: VecNorm_Seq (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x730BF4: VecNormalize (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x7BC5A8: KSPSolve_GMRES (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0xB8A06E: KSPSolve (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x7B659F: kspsolve_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x5EAE84: petsolv_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x4ECD46: flowsol_ng_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x507E4E: iterprc_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858== by 0x51D1B4: solnalg_ (in /home/cfd/sgao/mycodes/fensapngnew/bin/fensapMPI_LINUX64)
>>> ==7858==
>>>
>>>
>>> On Fri, May 2, 2014 at 7:25 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>> On May 2, 2014, at 5:29 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
>>>
>>>> Thanks for your quick reply. What confuses me is why the code
>>>> works fine if I reset the GMRES restart number by calling
>>>> KSPGMRESSetRestart again just before KSPSolve.
>>>
>>> It isn’t really working. Something is going wrong (run with
>>> valgrind), and setting that restart number and restarting the
>>> solver just puts it in a “happier” state, so it seems to make more
>>> progress.
>>>
>>> Barry
>>>
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On May 2, 2014, at 6:03 PM, "Barry Smith" <bsmith at mcs.anl.gov> wrote:
>>>>>
>>>>>
>>>>> Your shell matrix is buggy in some way. Whenever the residual
>>>>> norm jumps like crazy at a restart, it means that something is
>>>>> wrong with the operator: at a restart GMRES recomputes the
>>>>> residual by explicitly applying the operator, while within a
>>>>> cycle the norm comes from the Krylov recurrence, so a jump means
>>>>> the two disagree.
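>>>>>
>>>>> One quick test of this (a minimal sketch; x, y1, y2, and diff are
>>>>> hypothetical work vectors and a scalar) is to apply the shell
>>>>> matrix twice to the same vector; any nonzero difference means the
>>>>> operator is not even deterministic, which uninitialised memory
>>>>> would explain:
>>>>>
>>>>>    call MatMult ( pet_mat_mf_shell, x, y1, ierpetsc )
>>>>>    call MatMult ( pet_mat_mf_shell, x, y2, ierpetsc )
>>>>>    call VecAXPY ( y2, -1.0d0, y1, ierpetsc )   ! y2 <- y2 - y1
>>>>>    call VecNorm ( y2, NORM_2, diff, ierpetsc ) ! should be exactly zero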
>>>>>
>>>>> Barry
>>>>>
>>>>>> On May 2, 2014, at 4:41 PM, Song Gao <song.gao2 at mail.mcgill.ca> wrote:
>>>>>>
>>>>>> Dear PETSc users,
>>>>>>
>>>>>> I'm solving a linear system with KSP and am trying to set up
>>>>>> the solver in code, but I find it strange that my code doesn't
>>>>>> converge unless I call KSPGMRESSetRestart twice.
>>>>>>
>>>>>> My code looks like
>>>>>>
>>>>>> call KSPSetOperators ( pet_solv, pet_mat_mf_shell, pet_matp, DIFFERENT_NONZERO_PATTERN, ierpetsc )
>>>>>> call KSPSetType ( pet_solv, 'gmres', ierpetsc )
>>>>>> call KSPGMRESSetRestart ( pet_solv, 30, ierpetsc )
>>>>>> call KSPGetPC ( pet_solv, pet_precon, ierpetsc )
>>>>>> call PCSetType ( pet_precon, 'asm', ierpetsc )
>>>>>> call PCASMSetOverlap ( pet_precon, 1, ierpetsc )
>>>>>> call KSPSetUp ( pet_solv, ierpetsc )
>>>>>> call PCASMGetSubKSP ( pet_precon, n_local, first_local, pet_solv_sub, ierpetsc )  ! n_local is one
>>>>>> call KSPGetPC ( pet_solv_sub(1), pet_precon_sub, ierpetsc )
>>>>>> call PCSetType ( pet_precon_sub, 'jacobi', ierpetsc )
>>>>>> call PCJacobiSetUseRowMax ( pet_precon_sub, ierpetsc )
>>>>>> call KSPSetFromOptions ( pet_solv, ierpetsc )
>>>>>> call KSPGMRESSetRestart ( pet_solv, 29, ierpetsc )  ! adding this line, the code converges
>>>>>> call KSPSolve ( pet_solv, pet_rhsp, pet_solup, ierpetsc )
>>>>>>
>>>>>> Running with 1 CPU WITHOUT the second KSPGMRESSetRestart call
>>>>>> (the line marked above), the code does not converge.
>>>>>>
>>>>>> runtime options: -ksp_monitor_true_residual -ksp_view
>>>>>>
>>>>>>  0 KSP preconditioned resid norm 6.585278940829e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>  1 KSP preconditioned resid norm 6.585278219510e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>  2 KSP preconditioned resid norm 2.198638170622e+00 true resid norm 1.365132713014e-01 ||r(i)||/||b|| 1.419163317039e+01
>>>>>>  3 KSP preconditioned resid norm 1.599896387215e+00 true resid norm 1.445988845022e-01 ||r(i)||/||b|| 1.503219654865e+01
>>>>>> .......
>>>>>> 28 KSP preconditioned resid norm 4.478466011191e-01 true resid norm 1.529879309381e-01 ||r(i)||/||b|| 1.590430420920e+01
>>>>>> 29 KSP preconditioned resid norm 4.398129572260e-01 true resid norm 1.530132924055e-01 ||r(i)||/||b|| 1.590694073413e+01
>>>>>> 30 KSP preconditioned resid norm 2.783227613716e+12 true resid norm 1.530369123550e-01 ||r(i)||/||b|| 1.590939621450e+01
>>>>>>
>>>>>> KSP Object: 1 MPI processes
>>>>>>   type: gmres
>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>   maximum iterations=10000, initial guess is zero
>>>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>   left preconditioning
>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>> PC Object: 1 MPI processes
>>>>>>   type: asm
>>>>>>     Additive Schwarz: total subdomain blocks = 1, amount of overlap = 1
>>>>>>     Additive Schwarz: restriction/interpolation type - RESTRICT
>>>>>>   [0] number of local blocks = 1
>>>>>>   Local solve info for each block is in the following KSP and PC objects:
>>>>>>   - - - - - - - - - - - - - - - - - -
>>>>>>   [0] local block number 0, size = 22905
>>>>>>   KSP Object:  (sub_)  1 MPI processes
>>>>>>     type: preonly
>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>     left preconditioning
>>>>>>     using NONE norm type for convergence test
>>>>>>   PC Object:  (sub_)  1 MPI processes
>>>>>>     type: jacobi
>>>>>>     linear system matrix = precond matrix:
>>>>>>     Matrix Object:   1 MPI processes
>>>>>>       type: seqbaij
>>>>>>       rows=22905, cols=22905, bs=5
>>>>>>       total: nonzeros=785525, allocated nonzeros=785525
>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>         block size is 5
>>>>>>   - - - - - - - - - - - - - - - - - -
>>>>>>   linear system matrix followed by preconditioner matrix:
>>>>>>   Matrix Object:   1 MPI processes
>>>>>>     type: shell
>>>>>>     rows=22905, cols=22905
>>>>>>   Matrix Object:   1 MPI processes
>>>>>>     type: seqbaij
>>>>>>     rows=22905, cols=22905, bs=5
>>>>>>     total: nonzeros=785525, allocated nonzeros=785525
>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>       block size is 5
>>>>>> WARNING: zero iteration in iterative solver
>>>>>>
>>>>>> Running with 1 CPU WITH the second KSPGMRESSetRestart call
>>>>>> (the line marked above), the code converges.
>>>>>>
>>>>>> runtime options: -ksp_monitor_true_residual -ksp_view
>>>>>>
>>>>>>  0 KSP preconditioned resid norm 6.585278940829e+00 true resid norm 9.619278462343e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>  1 KSP preconditioned resid norm 2.566248171026e+00 true resid norm 4.841043870812e-03 ||r(i)||/||b|| 5.032647604250e-01
>>>>>>  2 KSP preconditioned resid norm 1.410418402651e+00 true resid norm 3.347509391208e-03 ||r(i)||/||b|| 3.480000505561e-01
>>>>>>  3 KSP preconditioned resid norm 9.665409287757e-01 true resid norm 2.289877121679e-03 ||r(i)||/||b|| 2.380508195748e-01
>>>>>>  4 KSP preconditioned resid norm 4.469486152454e-01 true resid norm 1.283813398084e-03 ||r(i)||/||b|| 1.334625463968e-01
>>>>>>  5 KSP preconditioned resid norm 2.474889829653e-01 true resid norm 7.956009139680e-04 ||r(i)||/||b|| 8.270900120862e-02
>>>>>> ............
>>>>>> 24 KSP preconditioned resid norm 9.518780877620e-05 true resid norm 6.273993696172e-07 ||r(i)||/||b|| 6.522312167937e-05
>>>>>> 25 KSP preconditioned resid norm 6.837876679998e-05 true resid norm 4.612861071815e-07 ||r(i)||/||b|| 4.795433555514e-05
>>>>>> 26 KSP preconditioned resid norm 4.864361942316e-05 true resid norm 3.394754589076e-07 ||r(i)||/||b|| 3.529115621682e-05
>>>>>> KSP Object: 1 MPI processes
>>>>>>   type: gmres
>>>>>>     GMRES: restart=29, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>   maximum iterations=10000, initial guess is zero
>>>>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>   left preconditioning
>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>> PC Object: 1 MPI processes
>>>>>>   type: asm
>>>>>>     Additive Schwarz: total subdomain blocks = 1, amount of overlap = 1
>>>>>>     Additive Schwarz: restriction/interpolation type - RESTRICT
>>>>>>   [0] number of local blocks = 1
>>>>>>   Local solve info for each block is in the following KSP and PC objects:
>>>>>>   - - - - - - - - - - - - - - - - - -
>>>>>>   [0] local block number 0, size = 22905
>>>>>>   KSP Object:  (sub_)  1 MPI processes
>>>>>>     type: preonly
>>>>>>     maximum iterations=10000, initial guess is zero
>>>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>     left preconditioning
>>>>>>     using NONE norm type for convergence test
>>>>>>   PC Object:  (sub_)  1 MPI processes
>>>>>>     type: jacobi
>>>>>>     linear system matrix = precond matrix:
>>>>>>     Matrix Object:   1 MPI processes
>>>>>>       type: seqbaij
>>>>>>       rows=22905, cols=22905, bs=5
>>>>>>       total: nonzeros=785525, allocated nonzeros=785525
>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>         block size is 5
>>>>>>   - - - - - - - - - - - - - - - - - -
>>>>>>   linear system matrix followed by preconditioner matrix:
>>>>>>   Matrix Object:   1 MPI processes
>>>>>>     type: shell
>>>>>>     rows=22905, cols=22905
>>>>>>   Matrix Object:   1 MPI processes
>>>>>>     type: seqbaij
>>>>>>     rows=22905, cols=22905, bs=5
>>>>>>     total: nonzeros=785525, allocated nonzeros=785525
>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>       block size is 5
>>>>>> WARNING: zero iteration in iterative solver
>>>>>>
>>>>>>
>>>>>> What might my error be here? Thank you.
>>>>>
>>>
>>>
>>
>>
>