[petsc-dev] Iteration counter depends on KSP monitor!

Wed Nov 7 15:28:00 CST 2012

  Thomas,

     I don't have a complete explanation for why it changes in this case, but I can point you in the right direction to see how this happens. You may need to set breakpoints in the debugger to see exactly what differs with and without that option.

     1) When richardson is used and no monitoring is done, PCApplyRichardson_HYPRE_BoomerAMG() is called to apply the BoomerAMG V-cycle.  Note that it changes the tolerance and the iteration count (its) before calling PCApply_HYPRE().

     2) When monitoring is turned on, we need to compute the residual norm at each iteration, so PCApply_HYPRE() is instead called directly by KSPSolve_Richardson(), once for each iteration.

    Now, since you are trying to use just one smoothing step inside richardson, the two approaches (I think) should be identical. Somehow, when KSPSolve_Richardson() is used instead of PCApplyRichardson(), more inner iterations (iterations of the monitored solver) must be happening, leading to a stronger preconditioner and hence fewer iterations of the outer solve.
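    To make the expectation concrete, here is a toy Python model of the two code paths. It is not PETSc code: the function names are hypothetical, and plain Jacobi sweeps stand in for the BoomerAMG V-cycle. It only illustrates why, for a zero initial guess, handing all iterations to the preconditioner in one call and looping with a monitor should give the same result.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A @ x using plain lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def apply_pc(A, rhs, x0, inner_its):
    """Toy preconditioner: `inner_its` Jacobi sweeps on A x = rhs from x0.

    Stand-in (assumption) for a preconditioner that can iterate internally.
    """
    x = list(x0)
    for _ in range(inner_its):
        r = [bi - yi for bi, yi in zip(rhs, matvec(A, x))]
        x = [xi + ri / A[i][i] for i, (xi, ri) in enumerate(zip(x, r))]
    return x


def richardson_fused(A, b, its):
    """No monitor: the whole Richardson iteration is one preconditioner call."""
    return apply_pc(A, b, [0.0] * len(b), its)


def richardson_monitored(A, b, its, monitor):
    """With monitor: form the residual each iteration, report its norm,
    then apply the preconditioner once and update x."""
    n = len(b)
    x = [0.0] * n
    for k in range(its):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        monitor(k, math.sqrt(sum(ri * ri for ri in r)))
        z = apply_pc(A, r, [0.0] * n, 1)  # one sweep, zero initial guess
        x = [xi + zi for xi, zi in zip(x, z)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [1.0, 2.0, 3.0]
    x_fused = richardson_fused(A, b, 1)
    x_mon = richardson_monitored(A, b, 1, lambda k, rnorm: print(k, rnorm))
    print(max(abs(u - v) for u, v in zip(x_fused, x_mon)))
```

    In this model the two results agree, so a difference in the real run would mean the two paths are not doing the same amount of inner work.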

    You can run both cases (for example on one process, but two is OK too) with -start_in_debugger and put a breakpoint in PCApply_HYPRE(); when execution reaches that function, type "where" to see how it is being called. Continue repeatedly to see why one case triggers more of these inner calls than the other.
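    A session along these lines should work. This is only a sketch: it assumes the debugger is gdb, and ./my_app and <your usual options> are placeholders for your executable and its options.

```text
# case 1: without the monitor
mpiexec -n 1 ./my_app <your usual options> -start_in_debugger
# case 2: with the monitor
mpiexec -n 1 ./my_app <your usual options> -laplace_ksp_monitor -start_in_debugger

# in the debugger, for each case:
(gdb) break PCApply_HYPRE
(gdb) continue
(gdb) where
(gdb) continue
```

    Comparing the stack traces printed by "where" in the two cases shows which caller reaches PCApply_HYPRE() and how often.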

    Barry

  Depending on the outcome (reason for the difference) I might call this issue a bug or a strange feature. I am leaning toward bug.

On Nov 7, 2012, at 12:55 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:

> Okay, the outer KSP is as follows:
> 
> KSP Object:(ns_) 2 MPI processes
>  type: fgmres
>    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>    GMRES: happy breakdown tolerance 1e-30
>  maximum iterations=100, initial guess is zero
>  tolerances:  relative=1e-06, absolute=1e-08, divergence=10000
>  right preconditioning
>  has attached null space
>  using UNPRECONDITIONED norm type for convergence test
> PC Object:(ns_) 2 MPI processes
>  type: fieldsplit
>    FieldSplit with Schur preconditioner, factorization FULL
>    Preconditioner for the Schur complement formed from the block diagonal part of A11
>    Split info:
>    Split number 0 Defined by IS
>    Split number 1 Defined by IS
>    KSP solver for A00 block
>      KSP Object:      (velocity_)       2 MPI processes
>        type: richardson
>          Richardson: damping factor=1
>        maximum iterations=1, initial guess is zero
>        tolerances:  relative=0, absolute=1e-14, divergence=10000
>        left preconditioning
>        using PRECONDITIONED norm type for convergence test
>      PC Object:      (velocity_)       2 MPI processes
>        type: hypre
>          HYPRE BoomerAMG preconditioning
>          HYPRE BoomerAMG: Cycle type V
>          HYPRE BoomerAMG: Maximum number of levels 25
>          HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>          HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>          HYPRE BoomerAMG: Threshold for strong coupling 0.25
>          HYPRE BoomerAMG: Interpolation truncation factor 0
>          HYPRE BoomerAMG: Interpolation: max elements per row 0
>          HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>          HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>          HYPRE BoomerAMG: Maximum row sums 0.9
>          HYPRE BoomerAMG: Sweeps down         1
>          HYPRE BoomerAMG: Sweeps up           1
>          HYPRE BoomerAMG: Sweeps on coarse    1
>          HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>          HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>          HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>          HYPRE BoomerAMG: Relax weight  (all)      1
>          HYPRE BoomerAMG: Outer relax weight (all) 1
>          HYPRE BoomerAMG: Using CF-relaxation
>          HYPRE BoomerAMG: Measure type        local
>          HYPRE BoomerAMG: Coarsen type        Falgout
>          HYPRE BoomerAMG: Interpolation type  classical
>        linear system matrix = precond matrix:
>        Matrix Object:         2 MPI processes
>          type: mpiaij
>          rows=2754, cols=2754
>          total: nonzeros=25026, allocated nonzeros=25026
>          total number of mallocs used during MatSetValues calls =0
>            not using I-node (on process 0) routines
>    KSP solver for S = A11 - A10 inv(A00) A01
>      KSP Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>        type: preonly
>        maximum iterations=10000, initial guess is zero
>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>        left preconditioning
>        has attached null space
>        using NONE norm type for convergence test
>      PC Object:      (ns_fieldsplit_pressure_)       2 MPI processes
>        type: shell
>          Shell: no name
>        linear system matrix followed by preconditioner matrix:
>        Matrix Object:         2 MPI processes
>          type: schurcomplement
>          rows=369, cols=369
>            Schur complement A11 - A10 inv(A00) A01
>            A11
>              Matrix Object:               2 MPI processes
>                type: mpiaij
>                rows=369, cols=369
>                total: nonzeros=0, allocated nonzeros=0
>                total number of mallocs used during MatSetValues calls =0
>                  using I-node (on process 0) routines: found 33 nodes, limit used is 5
>            A10
>              Matrix Object:               2 MPI processes
>                type: mpiaij
>                rows=369, cols=2754
>                total: nonzeros=8973, allocated nonzeros=8973
>                total number of mallocs used during MatSetValues calls =0
>                  not using I-node (on process 0) routines
>            KSP of A00
>              KSP Object:              (velocity_)               2 MPI processes
>                type: richardson
>                  Richardson: damping factor=1
>                maximum iterations=1, initial guess is zero
>                tolerances:  relative=0, absolute=1e-14, divergence=10000
>                left preconditioning
>                using PRECONDITIONED norm type for convergence test
>              PC Object:              (velocity_)               2 MPI processes
>                type: hypre
>                  HYPRE BoomerAMG preconditioning
>                  HYPRE BoomerAMG: Cycle type V
>                  HYPRE BoomerAMG: Maximum number of levels 25
>                  HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>                  HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>                  HYPRE BoomerAMG: Threshold for strong coupling 0.25
>                  HYPRE BoomerAMG: Interpolation truncation factor 0
>                  HYPRE BoomerAMG: Interpolation: max elements per row 0
>                  HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>                  HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>                  HYPRE BoomerAMG: Maximum row sums 0.9
>                  HYPRE BoomerAMG: Sweeps down         1
>                  HYPRE BoomerAMG: Sweeps up           1
>                  HYPRE BoomerAMG: Sweeps on coarse    1
>                  HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>                  HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>                  HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>                  HYPRE BoomerAMG: Relax weight  (all)      1
>                  HYPRE BoomerAMG: Outer relax weight (all) 1
>                  HYPRE BoomerAMG: Using CF-relaxation
>                  HYPRE BoomerAMG: Measure type        local
>                  HYPRE BoomerAMG: Coarsen type        Falgout
>                  HYPRE BoomerAMG: Interpolation type  classical
>                linear system matrix = precond matrix:
>                Matrix Object:                 2 MPI processes
>                  type: mpiaij
>                  rows=2754, cols=2754
>                  total: nonzeros=25026, allocated nonzeros=25026
>                  total number of mallocs used during MatSetValues calls =0
>                    not using I-node (on process 0) routines
>            A01
>              Matrix Object:               2 MPI processes
>                type: mpiaij
>                rows=2754, cols=369
>                total: nonzeros=7883, allocated nonzeros=7883
>                total number of mallocs used during MatSetValues calls =0
>                  not using I-node (on process 0) routines
>        Matrix Object:         2 MPI processes
>          type: mpiaij
>          rows=369, cols=369
>          total: nonzeros=0, allocated nonzeros=0
>          total number of mallocs used during MatSetValues calls =0
>            using I-node (on process 0) routines: found 33 nodes, limit used is 5
>  linear system matrix = precond matrix:
>  Matrix Object:   2 MPI processes
>    type: mpiaij
>    rows=3123, cols=3123
>    total: nonzeros=41882, allocated nonzeros=52732
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> 
> Note that "ns_fieldsplit_pressure_" is a PCShell. This in turn makes use of two KSP objects, "mass_" and "laplace_":
> 
> 
> 
> KSP Object:(mass_) 2 MPI processes
>  type: cg
>  maximum iterations=2
>  tolerances:  relative=0, absolute=1e-14, divergence=10000
>  left preconditioning
>  using nonzero initial guess
>  using PRECONDITIONED norm type for convergence test
> PC Object:(mass_) 2 MPI processes
>  type: jacobi
>  linear system matrix = precond matrix:
>  Matrix Object:   2 MPI processes
>    type: mpiaij
>    rows=369, cols=369
>    total: nonzeros=2385, allocated nonzeros=2506
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> 
> AND
> 
> 
> 
> KSP Object:(laplace_) 2 MPI processes
>  type: richardson
>    Richardson: damping factor=1
>  maximum iterations=1
>  tolerances:  relative=0, absolute=1e-14, divergence=10000
>  left preconditioning
>  has attached null space
>  using nonzero initial guess
>  using PRECONDITIONED norm type for convergence test
> PC Object:(laplace_) 2 MPI processes
>  type: hypre
>    HYPRE BoomerAMG preconditioning
>    HYPRE BoomerAMG: Cycle type V
>    HYPRE BoomerAMG: Maximum number of levels 25
>    HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>    HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>    HYPRE BoomerAMG: Threshold for strong coupling 0.25
>    HYPRE BoomerAMG: Interpolation truncation factor 0
>    HYPRE BoomerAMG: Interpolation: max elements per row 0
>    HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>    HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>    HYPRE BoomerAMG: Maximum row sums 0.9
>    HYPRE BoomerAMG: Sweeps down         1
>    HYPRE BoomerAMG: Sweeps up           1
>    HYPRE BoomerAMG: Sweeps on coarse    1
>    HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>    HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>    HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>    HYPRE BoomerAMG: Relax weight  (all)      1
>    HYPRE BoomerAMG: Outer relax weight (all) 1
>    HYPRE BoomerAMG: Using CF-relaxation
>    HYPRE BoomerAMG: Measure type        local
>    HYPRE BoomerAMG: Coarsen type        Falgout
>    HYPRE BoomerAMG: Interpolation type  classical
>  linear system matrix = precond matrix:
>  Matrix Object:   2 MPI processes
>    type: mpiaij
>    rows=369, cols=369
>    total: nonzeros=1745, allocated nonzeros=2506
>    total number of mallocs used during MatSetValues calls =0
>      not using I-node (on process 0) routines
> 
> 
> The outer iteration count changes when "-laplace_ksp_monitor" is added to the command-line options.
> 
> Thomas
> 
> 
> On 07.11.2012 19:49, Barry Smith wrote:
>>     This is normally not expected but might happen under some combination of solver options. Please send the output of -ksp_view and the options you use and we'll try to understand the situation.
>> 
>>    Barry
>> 
>> 
>> On Nov 7, 2012, at 12:12 PM, Thomas Witkowski <thomas.witkowski at tu-dresden.de> wrote:
>> 
>>> I see a very curious behavior in one of my codes: Whenever I enable a KSP monitor for an inner solver, the outer iteration count goes down from 25 to 18! Okay, this is great :) I like to see iteration counts decreasing, but I would like to know what's going on, and eventually, a KSP monitor should not influence the whole game. And to answer your first question: I ran the code through valgrind and it's free of any errors. Any idea what to check next? Thanks for any advice.
>>> 
>>> Thomas
>>> 
>