[petsc-dev] Iteration counter depends on KSP monitor!

Thomas Witkowski thomas.witkowski at tu-dresden.de
Fri Nov 9 13:20:23 CST 2012


Thank you guys for clarifying this!

Thomas

On 08.11.2012 21:21, Barry Smith wrote:
> On Nov 8, 2012, at 2:48 AM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>
>> Maybe the following information is also useful for finding the problem and/or understanding what's going on: the inner solver works on a Laplace matrix (discretized with free boundaries), so I attach the constant null space to its KSP object. This inner solver is used inside the Schur complement solver of a PCFIELDSPLIT object. The Schur complement solver (KSPPREONLY and PCSHELL) also has a constant null space attached. But the vector on which the PCSHELL is applied is not orthogonal to the constant null space. Can you help me understand why that is?
>>
>> To come back to the influence of the KSP monitor on the solution process: when I project out the constant null space before calling the KSPRICHARDSON solver with PCHYPRE, the monitor no longer influences the solution.
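>>
>> For illustration, here is a minimal sketch of the two steps just described, written against the current PETSc interface (MatSetNullSpace on the operator; the 2012 thread used the since-removed KSPSetNullSpace). The function name and the explicit MatNullSpaceRemove() call are illustrative, not taken from the actual application:
>>
>> #include <petscksp.h>
>>
>> /* Sketch: attach the constant null space of the free-boundary Laplacian
>>    and, as a workaround, project it out of the right-hand side explicitly
>>    before the inner solve. */
>> static PetscErrorCode SolveLaplaceWithConstNullSpace(KSP laplaceKsp, Mat L, Vec rhs, Vec sol)
>> {
>>   MatNullSpace   nullsp;
>>   PetscErrorCode ierr;
>>
>>   PetscFunctionBegin;
>>   ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)L), PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
>>   ierr = MatSetNullSpace(L, nullsp);CHKERRQ(ierr);       /* constant null space            */
>>   ierr = MatNullSpaceRemove(nullsp, rhs);CHKERRQ(ierr);  /* explicit projection of the rhs */
>>   ierr = KSPSolve(laplaceKsp, rhs, sol);CHKERRQ(ierr);   /* Richardson + BoomerAMG solve   */
>>   ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
>>   PetscFunctionReturn(0);
>> }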
>      Jed figured this out. Our KSP_PCApply() projects out the given null space; when hypre is given the job of applying the preconditioner it knows nothing about the null space and hence does not remove it, so it is as if you had not provided a null space at all.
>
>       The fix is that when a null space has been attached to the KSP we always run the standard Richardson loop and never call the "shortcut" routine.
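>
>       For illustration only (this is not the literal petsc-dev source, and the helper name and flags are made up), the decision amounts to something like:
>
> #include <petscksp.h>
>
> /* Take the PCApplyRichardson shortcut only when nothing forces the generic
>    loop: a monitor needs a residual norm every iteration, and an attached
>    null space is only projected out inside KSP_PCApply(). */
> static PetscErrorCode UseRichardsonShortcut(KSP ksp, PetscBool monitoring, PetscBool has_nullspace, PetscBool *shortcut)
> {
>   PC             pc;
>   PetscBool      exists;
>   PetscErrorCode ierr;
>
>   PetscFunctionBegin;
>   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>   ierr = PCApplyRichardsonExists(pc, &exists);CHKERRQ(ierr); /* true for PCHYPRE/BoomerAMG */
>   *shortcut = (PetscBool)(exists && !monitoring && !has_nullspace);
>   PetscFunctionReturn(0);
> }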
>
>      So if you are using petsc-dev you should now see no difference with or without monitoring.
>
>     Barry
>
>
>> Thomas
>>
>>
>> On 08.11.2012 08:27, Thomas Witkowski wrote:
>>> I ran the code in the debugger, and your conjecture is right. Without monitoring of the Richardson KSP, PCApply_HYPRE is called once per outer iteration with the following call stack:
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:69)
>>> PCApplyRichardson (precon.c:780)
>>> PCApplyRichardson_HYPRE_BoomerAMG (hypre.c:597)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>>
>>> When the monitor is enabled, PCApply_HYPRE is called twice per outer iteration from two different places in KSPSolve_Richardson:
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:124)
>>> PCApply (precon.c:384)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>> and
>>>
>>> KSPSolve (itfunc.c:446)
>>> KSPSolve_Richardson (rich.c:147)
>>> PCApply (precon.c:384)
>>> PCApply_HYPRE (hypre.c:151)
>>>
>>>
>>>
>>> By the way, no matter which options I provide to the inner solver we are talking about, I cannot reproduce the small outer iteration count I see when using the KSP monitor. Can this be due to the tolerances that are set in PCApplyRichardson_HYPRE_BoomerAMG? Why do you set these options before calling PCApply_HYPRE and then reset them before leaving the function?
>>>
>>> Thomas
>>>
>>> On 07.11.2012 22:28, Barry Smith wrote:
>>>>    Thomas,
>>>>
>>>>       I don't have a complete explanation of why it changes in this case, but I can point you in the right direction as to how it happens. You may need to set breakpoints in the debugger to see exactly what goes differently with and without that option.
>>>>
>>>>       1) When Richardson is used and no monitoring is done, PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the BoomerAMG V-cycle. Note that it changes the tolerance and iteration settings before calling PCApply_HYPRE().
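>>>>
>>>>       Roughly (this is a sketch, not the actual PETSc source, and the names are placeholders) the shortcut performs a set/solve/restore of the hypre options:
>>>>
>>>> #include <HYPRE_parcsr_ls.h>
>>>>
>>>> /* Hand the whole Richardson sweep to hypre, then restore the one-V-cycle
>>>>    settings that a plain PCApply_HYPRE() call expects. */
>>>> static void richardson_shortcut(HYPRE_Solver amg, HYPRE_ParCSRMatrix A, HYPRE_ParVector b, HYPRE_ParVector x, double rtol, int max_it)
>>>> {
>>>>   HYPRE_BoomerAMGSetTol(amg, rtol);       /* Richardson tolerance            */
>>>>   HYPRE_BoomerAMGSetMaxIter(amg, max_it); /* Richardson iteration limit      */
>>>>   HYPRE_BoomerAMGSolve(amg, A, b, x);     /* hypre runs the whole sweep      */
>>>>   HYPRE_BoomerAMGSetTol(amg, 0.0);        /* restore: one V-cycle per        */
>>>>   HYPRE_BoomerAMGSetMaxIter(amg, 1);      /*   subsequent PCApply_HYPRE call */
>>>> }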
>>>>
>>>>       2) When monitoring is turned on we need to compute the residual norm at each iteration, so PCApply_HYPRE() is instead called directly by KSPSolve_Richardson() once per iteration.
>>>>
>>>>      Now, since you are trying to use just one smoothing step inside Richardson, the two approaches should (I think) be identical. Somehow, when KSPSolve_Richardson() is used instead of PCApplyRichardson(), more inner iterations (iterations of the monitored solver) must be happening, leading to a stronger preconditioner and hence fewer iterations of the outer solve.
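>>>>
>>>>      In effect, the monitored path does something like the following each iteration (a hand-written sketch with damping factor 1, not KSPSolve_Richardson itself; inside PETSc the preconditioner application goes through KSP_PCApply(), which also projects out an attached null space):
>>>>
>>>> #include <petscksp.h>
>>>>
>>>> static PetscErrorCode RichardsonLoopSketch(Mat A, PC B, Vec b, Vec x, PetscInt maxit)
>>>> {
>>>>   Vec            r, z;
>>>>   PetscReal      rnorm;
>>>>   PetscInt       i;
>>>>   PetscErrorCode ierr;
>>>>
>>>>   PetscFunctionBegin;
>>>>   ierr = VecDuplicate(b, &r);CHKERRQ(ierr);
>>>>   ierr = VecDuplicate(b, &z);CHKERRQ(ierr);
>>>>   for (i = 0; i < maxit; i++) {
>>>>     ierr = MatMult(A, x, r);CHKERRQ(ierr);           /* r = A x                      */
>>>>     ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);        /* r = b - A x                  */
>>>>     ierr = PCApply(B, r, z);CHKERRQ(ierr);           /* one BoomerAMG V-cycle        */
>>>>     ierr = VecAXPY(x, 1.0, z);CHKERRQ(ierr);         /* x = x + z                    */
>>>>     ierr = VecNorm(r, NORM_2, &rnorm);CHKERRQ(ierr); /* norm reported by the monitor */
>>>>     ierr = PetscPrintf(PetscObjectComm((PetscObject)A), "%d KSP Residual norm %g\n", (int)i, (double)rnorm);CHKERRQ(ierr);
>>>>   }
>>>>   ierr = VecDestroy(&r);CHKERRQ(ierr);
>>>>   ierr = VecDestroy(&z);CHKERRQ(ierr);
>>>>   PetscFunctionReturn(0);
>>>> }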
>>>>
>>>>      You can run both cases (for example on one process, but two is ok also) with -start_in_debugger, put a breakpoint in PCApply_HYPRE(), and when it gets to that function type 'where' to see how it is being called. Continue repeatedly to see why the one case triggers more of these inner calls than the other.
>>>>
>>>>      Barry
>>>>
>>>>
>>>>    Depending on the outcome (reason for the difference) I might call this issue a bug or a strange feature. I am leaning toward bug.
>>>>
>>>>
>>>> On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>>>>
>>>>> Okay, the outer KSP is as follows:
>>>>>
>>>>> KSP Object:(ns_) 2 MPI processes
>>>>>   type: fgmres
>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>   maximum iterations=100, initial guess is zero
>>>>>   tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>>>>>   right preconditioning
>>>>>   has attached null space
>>>>>   using UNPRECONDITIONED norm type for convergence test
>>>>> PC Object:(ns_) 2 MPI processes
>>>>>   type: fieldsplit
>>>>>     FieldSplit with Schur preconditioner, factorization FULL
>>>>>     Preconditioner for the Schur complement formed from the block diagonal part of A11
>>>>>     Split info:
>>>>>     Split number 0 Defined by IS
>>>>>     Split number 1 Defined by IS
>>>>>     KSP solver for A00 block
>>>>>       KSP Object:      (velocity_)       2 MPI processes
>>>>>         type: richardson
>>>>>           Richardson: damping factor=1
>>>>>         maximum iterations=1, initial guess is zero
>>>>>         tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>         left preconditioning
>>>>>         using PRECONDITIONED norm type for convergence test
>>>>>       PC Object:      (velocity_)       2 MPI processes
>>>>>         type: hypre
>>>>>           HYPRE BoomerAMG preconditioning
>>>>>           HYPRE BoomerAMG: Cycle type V
>>>>>           HYPRE BoomerAMG: Maximum number of levels 25
>>>>>           HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>           HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>           HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>           HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>           HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>           HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>           HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>           HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>           HYPRE BoomerAMG: Sweeps down         1
>>>>>           HYPRE BoomerAMG: Sweeps up           1
>>>>>           HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>           HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>           HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>           HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>           HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>           HYPRE BoomerAMG: Using CF-relaxation
>>>>>           HYPRE BoomerAMG: Measure type        local
>>>>>           HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>           HYPRE BoomerAMG: Interpolation type  classical
>>>>>         linear system matrix = precond matrix:
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=2754, cols=2754
>>>>>           total: nonzeros=25026, allocated nonzeros=25026
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             not using I-node (on process 0) routines
>>>>>     KSP solver for S = A11 - A10 inv(A00) A01
>>>>>       KSP Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>>         type: preonly
>>>>>         maximum iterations=10000, initial guess is zero
>>>>>         tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>>>         left preconditioning
>>>>>         has attached null space
>>>>>         using NONE norm type for convergence test
>>>>>       PC Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>>>>>         type: shell
>>>>>           Shell: no name
>>>>>         linear system matrix followed by preconditioner matrix:
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: schurcomplement
>>>>>           rows=369, cols=369
>>>>>             Schur complement A11 - A10 inv(A00) A01
>>>>>             A11
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=369
>>>>>                 total: nonzeros=0, allocated nonzeros=0
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>>>>             A10
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=369, cols=2754
>>>>>                 total: nonzeros=8973, allocated nonzeros=8973
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>             KSP of A00
>>>>>               KSP Object: (velocity_)               2 MPI processes
>>>>>                 type: richardson
>>>>>                   Richardson: damping factor=1
>>>>>                 maximum iterations=1, initial guess is zero
>>>>>                 tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>                 left preconditioning
>>>>>                 using PRECONDITIONED norm type for convergence test
>>>>>               PC Object: (velocity_)               2 MPI processes
>>>>>                 type: hypre
>>>>>                   HYPRE BoomerAMG preconditioning
>>>>>                   HYPRE BoomerAMG: Cycle type V
>>>>>                   HYPRE BoomerAMG: Maximum number of levels 25
>>>>>                   HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>                   HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>                   HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>                   HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>                   HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>                   HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>                   HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>                   HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>                   HYPRE BoomerAMG: Sweeps down         1
>>>>>                   HYPRE BoomerAMG: Sweeps up           1
>>>>>                   HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>                   HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>                   HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>                   HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>                   HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>                   HYPRE BoomerAMG: Using CF-relaxation
>>>>>                   HYPRE BoomerAMG: Measure type        local
>>>>>                   HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>                   HYPRE BoomerAMG: Interpolation type classical
>>>>>                 linear system matrix = precond matrix:
>>>>>                 Matrix Object:                 2 MPI processes
>>>>>                   type: mpiaij
>>>>>                   rows=2754, cols=2754
>>>>>                   total: nonzeros=25026, allocated nonzeros=25026
>>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>>                     not using I-node (on process 0) routines
>>>>>             A01
>>>>>               Matrix Object:               2 MPI processes
>>>>>                 type: mpiaij
>>>>>                 rows=2754, cols=369
>>>>>                 total: nonzeros=7883, allocated nonzeros=7883
>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>                   not using I-node (on process 0) routines
>>>>>         Matrix Object:         2 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=369, cols=369
>>>>>           total: nonzeros=0, allocated nonzeros=0
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             using I-node (on process 0) routines: found 33 nodes, limit used is 5
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=3123, cols=3123
>>>>>     total: nonzeros=41882, allocated nonzeros=52732
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>>
>>>>> Note that "ns_fieldsplit_pressure_" is a PCShell. It in turn makes use of two KSP objects, "mass_" and "laplace_" (a hypothetical sketch of this plumbing is given after the ksp_view output below):
>>>>>
>>>>>
>>>>>
>>>>> KSP Object:(mass_) 2 MPI processes
>>>>>   type: cg
>>>>>   maximum iterations=2
>>>>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(mass_) 2 MPI processes
>>>>>   type: jacobi
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=2385, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>>
>>>>> AND
>>>>>
>>>>>
>>>>>
>>>>> KSP Object:(laplace_) 2 MPI processes
>>>>>   type: richardson
>>>>>     Richardson: damping factor=1
>>>>>   maximum iterations=1
>>>>>   tolerances:  relative=0, absolute=1e-14, divergence=10000
>>>>>   left preconditioning
>>>>>   has attached null space
>>>>>   using nonzero initial guess
>>>>>   using PRECONDITIONED norm type for convergence test
>>>>> PC Object:(laplace_) 2 MPI processes
>>>>>   type: hypre
>>>>>     HYPRE BoomerAMG preconditioning
>>>>>     HYPRE BoomerAMG: Cycle type V
>>>>>     HYPRE BoomerAMG: Maximum number of levels 25
>>>>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>     HYPRE BoomerAMG: Sweeps down         1
>>>>>     HYPRE BoomerAMG: Sweeps up           1
>>>>>     HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>     HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>>     HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>>     HYPRE BoomerAMG: Relax weight  (all)      1
>>>>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>     HYPRE BoomerAMG: Using CF-relaxation
>>>>>     HYPRE BoomerAMG: Measure type        local
>>>>>     HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>     HYPRE BoomerAMG: Interpolation type  classical
>>>>>   linear system matrix = precond matrix:
>>>>>   Matrix Object:   2 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=369, cols=369
>>>>>     total: nonzeros=1745, allocated nonzeros=2506
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       not using I-node (on process 0) routines
>>>>>
>>>>>
>>>>> The outer iteration count is now influenced when adding "-laplace_ksp_monitor" to the command line options.
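>>>>>
>>>>> Purely for illustration, here is a hypothetical sketch of that PCShell plumbing; the context layout, the names, and above all the way the two inner solves are combined are made up, since the actual Schur-complement approximation is not shown here:
>>>>>
>>>>> #include <petscksp.h>
>>>>>
>>>>> typedef struct {
>>>>>   KSP mass;     /* "mass_":    CG + Jacobi on the pressure mass matrix                */
>>>>>   KSP laplace;  /* "laplace_": Richardson + BoomerAMG, constant null space attached   */
>>>>> } SchurShellCtx;
>>>>>
>>>>> /* Apply routine registered with PCShellSetApply(): y ~ S^{-1} x */
>>>>> static PetscErrorCode SchurShellApply(PC pc, Vec x, Vec y)
>>>>> {
>>>>>   SchurShellCtx *ctx;
>>>>>   Vec            tmp;
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   PetscFunctionBegin;
>>>>>   ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
>>>>>   ierr = VecDuplicate(x, &tmp);CHKERRQ(ierr);
>>>>>   ierr = KSPSolve(ctx->mass, x, y);CHKERRQ(ierr);      /* pressure mass-matrix solve */
>>>>>   ierr = KSPSolve(ctx->laplace, x, tmp);CHKERRQ(ierr); /* pressure Laplace solve     */
>>>>>   /* combine y and tmp according to the chosen Schur approximation (omitted) */
>>>>>   ierr = VecDestroy(&tmp);CHKERRQ(ierr);
>>>>>   PetscFunctionReturn(0);
>>>>> }
>>>>>
>>>>> The shell would be hooked up with PCShellSetContext() and PCShellSetApply() on the "ns_fieldsplit_pressure_" PC.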
>>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>>> On 07.11.2012 19:49, Barry Smith wrote:
>>>>>>      This is normally not expected but might happen under some combination of solver options. Please send the output of -ksp_view and the options you use and we'll try to understand the situation.
>>>>>>
>>>>>>     Barry
>>>>>>
>>>>>>
>>>>>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>>>>>>
>>>>>>> I see very curious behavior in one of my codes: whenever I enable a KSP monitor for an inner solver, the outer iteration count goes down from 25 to 18! Okay, this is great :) I like to see iteration counts decreasing, but I would like to know what's going on; after all, a KSP monitor should not influence the solve at all. And to answer your first question: I ran the code through valgrind and it is free of any errors. Any idea what to check next? Thanks for any advice.
>>>>>>>
>>>>>>> Thomas
>>>>>>>



