[petsc-users] CPR-AMG: SNES with two cores worse than with one

Fri Jul 7 13:48:02 CDT 2017

Attached the PFLOTRAN default run (KSPBCGS + PCBJacobi + PCILU) for the
same input file.

NP 1

FLOW TS BE steps =     43 newton =       43 linear =       3950 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 27.6 seconds

NP 2

FLOW TS BE steps =     43 newton =       43 linear =       4585 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 21.5 seconds

Based on your idea to fix all knobs but one I did a direct solve test on
all the PC and KSP's within the SNES I'm using. Simply put my SNES looks
like this:

SNES
  - LS:   basic
  - KSP:  FGMRES
  - PC:   Composite, multiplicative

    - PC:   Galerkin
      - KSP:  GMRES
      - PC:   Hypre BoomerAMG
      - SetRestriction:       user-defined
      - SetComputeSubMatrix:  user-defined
      - SetInterpolation:     User-defined

    - PC:   BJacobi
      - KSP:  Preonly
      - PC:   LU

I want to test direct solves on both PCComposite branches sub 0 and sub
1 and without PCGalerkin (2 tests), and then include PCGalerkin (2
tests), with a final test emulating the Two Stage PC to also stress the
user-defined functions in PCGalerkin by doing a double direct solve (1
test, pressure solve first, than whole system with the hyperbolic
component included).

Long story short: Most tests require 43 Newton and Linear Steps. The
last test shows problematic behaviour, and consumes all my memory
including swap (16 GB + 16 GB).  I can recall an issue with
automatically generated Fortran callback (?) for ComputeSubMatrix()
issue Matt was mentioning a long while ago..

*Test 2 (Check PCComposite Sub 0)*

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type preonly \
  -flow_sub_0_pc_type bjacobi \
  -flow_sub_0_sub_pc_type lu \
  -flow_sub_0_sub_pc_factor_
nonzeros_along_diagonal \
  -flow_sub_1_pc_type none \
  -snes_monitor \
  -log_view \
  2>&1 > ${OUTPUTFILE}_test_2.log | tee -a ${OUTPUTFILE}_test_2.log

NP 1

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 215.2 seconds

NP 2

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 53.7 seconds

OK

*Test 3 (Check PCComposite Sub 1)*

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type preonly \
  -flow_sub_0_pc_type none \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \
  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor \
  -log_view \
  2>&1 > ${OUTPUTFILE}_test_3.log | tee -a ${OUTPUTFILE}_test_3.log

NP 1

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 195.3 seconds

NP 2

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 53.1 seconds

OK

*Test 4 (Check PCComposite Sub 1 with empty PCGalerkin on Sub 0)*

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type preonly \
  -flow_sub_0_galerkin_ksp_type preonly \
  -flow_sub_0_galerkin_pc_type none \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \
  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor \
  -log_view \
  2>&1 > ${OUTPUTFILE}_test_4.log | tee -a ${OUTPUTFILE}_test_4.log

NP 1

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 201.0 seconds

NP2

FLOW TS BE steps =     43 newton =       43 linear =         43 cuts
=      0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 61.1 seconds

OK

*Test 5 (Check **PCComposite**Sub 0 with populated PCGalerkin on Sub 0)*

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type preonly \
  -flow_sub_0_galerkin_ksp_type preonly \
  -flow_sub_0_galerkin_pc_type bjacobi \
  -flow_sub_0_galerkin_sub_pc_type lu \
  -flow_sub_0_galerkin_sub_pc_factor_nonzeros_along_diagonal \
  -flow_sub_1_ksp_type preonly \
  -flow_sub_1_pc_type none \
  -snes_monitor \
  -log_view \
  2>&1 > ${OUTPUTFILE}_test_5.log | tee -a ${OUTPUTFILE}_test_5.log

NP 1

FLOW TS BE steps =    142 newton =      294 linear =        294 cuts
=     19
FLOW TS BE Wasted Linear Iterations = 152
FLOW TS BE SNES time = 291.5 seconds

NP 2

FLOW TS BE steps =    118 newton =      238 linear =        238 cuts
=     15
FLOW TS BE Wasted Linear Iterations = 120
FLOW TS BE SNES time = 113.4 seconds

LU only solves sub problem Ap

*Test 6 (Check PCComposite-PCGalerkin interaction)*

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type preonly \
  -flow_sub_0_galerkin_ksp_type preonly \
  -flow_sub_0_galerkin_pc_type bjacobi \
  -flow_sub_0_galerkin_sub_pc_type lu \
  -flow_sub_0_galerkin_sub_pc_factor_nonzeros_along_diagonal \
  -flow_sub_1_ksp_type preonly \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \
  -flow_sub_1_sub_pc_type bjacobi \
  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor \
  -log_view \
  2>&1 > ${OUTPUTFILE}_test_6.log | tee -a ${OUTPUTFILE}_test_6.log

NP = 1

 Step   1236 Time=  5.58118E-01 Dt=  2.97885E-05 [d] snes_conv_reason:    4
  newton =   1 [    3135] linear =     1 [      3135] cuts =  0 [   4]
  --> SNES Linear/Non-Linear Iterations =            1  /            1
  --> SNES Residual:   7.849082E-03  9.811352E-07  7.480307E-03
  --> max chng: dpl=   3.3471E+03 dpo=   3.3471E+03  dso=   1.5938E-05
dt=   0.0000E+00

Simulation takes forever and is ongoing as of writing.

NP = 2

 Step    950 Time=  4.45196E-01 Dt=  1.14579E-04 [d] snes_conv_reason:    4
  newton =   3 [    3217] linear =     3 [      3217] cuts =  0 [   0]
  --> SNES Linear/Non-Linear Iterations =            3  /            3
  --> SNES Residual:   2.777416E-02  3.471770E-06  1.976402E-02
  --> max chng: dpl=   9.3491E+02 dpo=   9.3491E+02  dso=   5.8911E-05
dt=   0.0000E+00

== TOIL_IMS FLOW
================================================================
  0 SNES Function norm 2.830927483272e-02
  0 2r: 2.83E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 1.98E-02 iu: 0.00E+00
rsn:   0

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 586 RUNNING AT mother
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

On 07/07/17 17:10, Barry Smith wrote:
>    I don't have a clue. It looks like the np 2 case just takes a different trajectory that runs into trouble that doesn't happen to np 1. Since the linear solves give very good convergence for both np 2 and np 1 I don't think the preconditioner is really the "problem".
>
>    I absolutely do not like the fact that the code is not using a line search. If you look at even the sequential case SNES is sometimes stopping due to snorm even though the function norm has actually increased. Frankly I'm very suspicious of the "solution" of the time integration, it is too much "hit the engine a few times with the hammer until it does what it wants" engineering.
>
>    What happens if you run the 1 and 2 process case with the "default" Pflotran linear solver? 
>
>  
>> On Jul 7, 2017, at 6:05 AM, Robert Annewandter <robert.annewandter at opengosim.com> wrote:
>>
>> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES. 
>>
>> I've rerun with -snes_monitor (attached with canonical suffix), their -pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in the future, however it works only with -mat_type aij..)
>>
>>
>> The sequential and parallel runs I did  with
>>  
>>     -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal -snes_monitor
>>
>> and 
>>
>>     -ksp_type preonly -pc_type bjacobi -sub_pc_type lu -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>>
>> As expected, the sequential are bot identical and the parallel takes half the time compared to sequential.
>>
>>
>>
>>
>> On 07/07/17 01:20, Barry Smith wrote:
>>>    Looks like PFLOTRAN has a maximum number of SNES iterations as 8 and cuts the timestep if that fails.
>>>
>>>    Please run with -snes_monitor I don't understand the strange densely packed information that PFLOTRAN is printing.
>>>
>>>    It looks like the linear solver is converging fine in parallel, normally then there is absolutely no reason that the Newton should behave different on 2 processors than 1 unless there is something wrong with the Jacobian. What is the -pc_type for the two cases LU or your fancy thing? 
>>>
>>>    Please run sequential and parallel with -pc_type lu and also with -snes_monitor.  We need to fix all the knobs but one in order to understand what is going on.
>>>
>>>
>>>    Barry
>>>
>>>
>>>   
>>>
>>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter <robert.annewandter at opengosim.com>
>>>>  wrote:
>>>>
>>>> Thanks Barry!
>>>>
>>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 600 s). PFLOTRAN final output:
>>>>
>>>> NP 1
>>>>
>>>> FLOW TS BE steps =     43 newton =       43 linear =         43 cuts =      0
>>>> FLOW TS BE Wasted Linear Iterations = 0
>>>> FLOW TS BE SNES time = 218.9 seconds
>>>>
>>>> NP 2
>>>>
>>>> FLOW TS BE steps =     67 newton =      176 linear =        314 cuts =     13
>>>> FLOW TS BE Wasted Linear Iterations = 208
>>>> FLOW TS BE SNES time = 600.0 seconds
>>>>
>>>>
>>>> Robert
>>>>
>>>> On 06/07/17 21:24, Barry Smith wrote:
>>>>
>>>>>    So on one process the outer linear solver takes a single iteration this is because the block Jacobi with LU and one block is a direct solver.
>>>>>
>>>>>
>>>>>
>>>>>>     11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.148515820410e-14 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.581814306485e-02 at iteration 1
>>>>>>     1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>>>>>
>>>>>>
>>>>>    On two processes the outer linear solver takes a few iterations to solver, this is to be expected. 
>>>>>
>>>>>    But what you sent doesn't give any indication about SNES not converging. Please turn off all inner linear solver monitoring and just run with -ksp_monitor_true_residual -snes_monitor -snes_lineseach_monitor -snes_converged_reason
>>>>>
>>>>>    Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Jul 6, 2017, at 2:03 PM, Robert Annewandter <robert.annewandter at opengosim.com>
>>>>>>
>>>>>>  wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge when using two cores instead of one. Because of the adaptive time stepping of the Newton, this leads to severe cuts in time step.
>>>>>>
>>>>>> This is how I run it with two cores
>>>>>>
>>>>>> mpirun \
>>>>>>   -n 2 pflotran \
>>>>>>   -pflotranin het.pflinput \
>>>>>>   -ksp_monitor_true_residual \
>>>>>>   -flow_snes_view \
>>>>>>   -flow_snes_converged_reason \
>>>>>>   -flow_sub_1_pc_type bjacobi \
>>>>>>   -flow_sub_1_sub_pc_type lu \
>>>>>>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>>>>>>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>>>>>>   -options_left \
>>>>>>   -log_summary \
>>>>>>   -info 
>>>>>>
>>>>>>
>>>>>> With one core I get (after grepping the crap away from -info):
>>>>>>
>>>>>>  Step     32 Time=  1.80000E+01 
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
>>>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>>>     Residual norms for flow_ solve.
>>>>>>     0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 1.581814306485e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>>>>>>       1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>>>>>>       2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm 8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
>>>>>>       3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm 2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
>>>>>>       4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm 8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
>>>>>>       5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm 3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
>>>>>>       6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm 3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
>>>>>>       7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm 7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
>>>>>>       8 KSP preconditioned resid norm 2.603879153772e+03 true resid norm 2.035525412004e-02 ||r(i)||/||b|| 1.981175861414e-02
>>>>>>       9 KSP preconditioned resid norm 1.774410462754e+02 true resid norm 3.001214973121e-03 ||r(i)||/||b|| 2.921081026352e-03
>>>>>>     10 KSP preconditioned resid norm 1.664227038378e+01 true resid norm 3.413136309181e-04 ||r(i)||/||b|| 3.322003855903e-04
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 1.131868956745e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 2.067297386780e+07 at iteration 11
>>>>>>     11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.148515820410e-14 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.581814306485e-02 at iteration 1
>>>>>>     1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>>>>> [0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=1
>>>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 3.590873180642e-01 near zero implies inconsistent rhs
>>>>>> [0] SNESSolve_NEWTONLS(): fnorm=1.5818143064846742e-02, gnorm=1.0695649833687331e-02, ynorm=4.6826522561266171e+02, lssucceed=0
>>>>>> [0] SNESConvergedDefault(): Converged due to small update length: 4.682652256127e+02 < 1.000000000000e-05 * 3.702480426117e+09
>>>>>>   1 2r: 1.07E-02 2x: 3.70E+09 2u: 4.68E+02 ir: 5.05E-03 iu: 4.77E+01 rsn: stol
>>>>>> Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1
>>>>>>
>>>>>>
>>>>>>
>>>>>> But with two cores I get:
>>>>>>
>>>>>>
>>>>>>  Step     32 Time=  1.80000E+01 
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>   0 2r: 6.16E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 3.63E-03 iu: 0.00E+00 rsn:   0
>>>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>>>
>>>>>>     Residual norms for flow_ solve.
>>>>>>     0 KSP unpreconditioned resid norm 6.162760088924e-03 true resid norm 6.162760088924e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 8.994949630499e+08 true resid norm 7.982144380936e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 8.950556502615e+08 true resid norm 1.550138696155e+00 ||r(i)||/||b|| 1.942007839218e+00
>>>>>>       2 KSP preconditioned resid norm 1.044849684205e+08 true resid norm 2.166193480531e+00 ||r(i)||/||b|| 2.713798920631e+00
>>>>>>       3 KSP preconditioned resid norm 8.209708619718e+06 true resid norm 3.076045005154e-01 ||r(i)||/||b|| 3.853657436340e-01
>>>>>>       4 KSP preconditioned resid norm 3.027461352422e+05 true resid norm 1.207731865714e-02 ||r(i)||/||b|| 1.513041869549e-02
>>>>>>       5 KSP preconditioned resid norm 1.595302164817e+04 true resid norm 4.123713694368e-04 ||r(i)||/||b|| 5.166172769585e-04
>>>>>>       6 KSP preconditioned resid norm 1.898935810797e+03 true resid norm 8.275885058330e-05 ||r(i)||/||b|| 1.036799719897e-04
>>>>>>       7 KSP preconditioned resid norm 1.429881682558e+02 true resid norm 4.751240525466e-06 ||r(i)||/||b|| 5.952335987324e-06
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 8.404003313455e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 8.994949630499e+08 at iteration 8
>>>>>>       8 KSP preconditioned resid norm 8.404003313455e+00 true resid norm 3.841921844578e-07 ||r(i)||/||b|| 4.813145016211e-07
>>>>>>     1 KSP unpreconditioned resid norm 6.162162548202e-03 true resid norm 6.162162548202e-03 ||r(i)||/||b|| 9.999030400804e-01
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 4.360556381209e+07 true resid norm 1.000000245433e+00 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 5.385519331932e+06 true resid norm 8.785183939860e-02 ||r(i)||/||b|| 8.785181783689e-02
>>>>>>       2 KSP preconditioned resid norm 4.728931283459e+05 true resid norm 2.008708805316e-02 ||r(i)||/||b|| 2.008708312313e-02
>>>>>>       3 KSP preconditioned resid norm 2.734215698319e+04 true resid norm 6.418720397673e-03 ||r(i)||/||b|| 6.418718822309e-03
>>>>>>       4 KSP preconditioned resid norm 1.002270029334e+04 true resid norm 4.040289515991e-03 ||r(i)||/||b|| 4.040288524372e-03
>>>>>>       5 KSP preconditioned resid norm 1.321280190971e+03 true resid norm 1.023292238313e-04 ||r(i)||/||b|| 1.023291987163e-04
>>>>>>       6 KSP preconditioned resid norm 6.594292964815e+01 true resid norm 1.877106733170e-06 ||r(i)||/||b|| 1.877106272467e-06
>>>>>>       7 KSP preconditioned resid norm 7.816325147216e+00 true resid norm 2.552611664980e-07 ||r(i)||/||b|| 2.552611038486e-07
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.391568446109e-01 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 4.360556381209e+07 at iteration 8
>>>>>>       8 KSP preconditioned resid norm 6.391568446109e-01 true resid norm 1.680724939670e-08 ||r(i)||/||b|| 1.680724527166e-08
>>>>>>     2 KSP unpreconditioned resid norm 4.328902922753e-07 true resid norm 4.328902922752e-07 ||r(i)||/||b|| 7.024292460341e-05
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 8.794597825780e+08 true resid norm 1.000000094566e+00 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 8.609906572102e+08 true resid norm 2.965044981249e+00 ||r(i)||/||b|| 2.965044700856e+00
>>>>>>       2 KSP preconditioned resid norm 9.318108989314e+07 true resid norm 1.881262939380e+00 ||r(i)||/||b|| 1.881262761477e+00
>>>>>>       3 KSP preconditioned resid norm 6.908723262483e+06 true resid norm 2.639592490398e-01 ||r(i)||/||b|| 2.639592240782e-01
>>>>>>       4 KSP preconditioned resid norm 2.651677791227e+05 true resid norm 9.736480169584e-03 ||r(i)||/||b|| 9.736479248845e-03
>>>>>>       5 KSP preconditioned resid norm 1.192178471172e+04 true resid norm 3.082839752692e-04 ||r(i)||/||b|| 3.082839461160e-04
>>>>>>       6 KSP preconditioned resid norm 1.492201446262e+03 true resid norm 4.633866284506e-05 ||r(i)||/||b|| 4.633865846301e-05
>>>>>>       7 KSP preconditioned resid norm 1.160670017241e+02 true resid norm 2.821157348522e-06 ||r(i)||/||b|| 2.821157081737e-06
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.447568262216e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 8.794597825780e+08 at iteration 8
>>>>>>       8 KSP preconditioned resid norm 6.447568262216e+00 true resid norm 1.516068561348e-07 ||r(i)||/||b|| 1.516068417980e-07
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.135731709822e-15 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 6.162760088924e-03 at iteration 3
>>>>>>     3 KSP unpreconditioned resid norm 6.135731709822e-15 true resid norm 1.142020328809e-14 ||r(i)||/||b|| 1.853098793933e-12
>>>>>>
>>>>>> [0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=3
>>>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 1.998388224666e-02 near zero implies inconsistent rhs
>>>>>> [0] SNESSolve_NEWTONLS(): fnorm=6.1627600889243711e-03, gnorm=1.0406503258190572e-02, ynorm=6.2999025681245366e+04, lssucceed=0  
>>>>>>   1 2r: 1.04E-02 2x: 3.70E+09 2u: 6.30E+04 ir: 6.54E-03 iu: 5.00E+04 rsn:   0
>>>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>>>
>>>>>>     Residual norms for flow_ solve.
>>>>>>     0 KSP unpreconditioned resid norm 1.040650325819e-02 true resid norm 1.040650325819e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 6.758906811264e+07 true resid norm 9.814998431686e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 2.503922806424e+06 true resid norm 2.275130113021e-01 ||r(i)||/||b|| 2.318013730574e-01
>>>>>>       2 KSP preconditioned resid norm 3.316753614870e+05 true resid norm 3.820733530238e-02 ||r(i)||/||b|| 3.892750016040e-02
>>>>>>       3 KSP preconditioned resid norm 2.956751700483e+04 true resid norm 2.143772538677e-03 ||r(i)||/||b|| 2.184180215207e-03
>>>>>>       4 KSP preconditioned resid norm 1.277067042524e+03 true resid norm 9.093614251311e-05 ||r(i)||/||b|| 9.265018547485e-05
>>>>>>       5 KSP preconditioned resid norm 1.060996002446e+02 true resid norm 1.042893700050e-05 ||r(i)||/||b|| 1.062551061326e-05
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 5.058127343285e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 6.758906811264e+07 at iteration 6
>>>>>>       6 KSP preconditioned resid norm 5.058127343285e+00 true resid norm 4.054770602120e-07 ||r(i)||/||b|| 4.131198420807e-07
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 4.449606189225e-10 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.040650325819e-02 at iteration 1
>>>>>>     1 KSP unpreconditioned resid norm 4.449606189225e-10 true resid norm 4.449606189353e-10 ||r(i)||/||b|| 4.275793779098e-08
>>>>>>
>>>>>> [0] SNESSolve_NEWTONLS(): iter=1, linear solve iterations=1
>>>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.300066663571e-02 near zero implies inconsistent rhs
>>>>>> [0] SNESSolve_NEWTONLS(): fnorm=1.0406503258190572e-02, gnorm=7.3566280848133728e-02, ynorm=7.9500485128639993e+04, lssucceed=0
>>>>>>   2 2r: 7.36E-02 2x: 3.70E+09 2u: 7.95E+04 ir: 4.62E-02 iu: 5.00E+04 rsn:   0
>>>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>>>
>>>>>>     Residual norms for flow_ solve.
>>>>>>     0 KSP unpreconditioned resid norm 7.356628084813e-02 true resid norm 7.356628084813e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 7.253424029194e+06 true resid norm 9.647008645250e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 7.126940190688e+06 true resid norm 1.228009197928e+00 ||r(i)||/||b|| 1.272942984800e+00
>>>>>>       2 KSP preconditioned resid norm 9.391591432635e+05 true resid norm 7.804929162756e-01 ||r(i)||/||b|| 8.090517433711e-01
>>>>>>       3 KSP preconditioned resid norm 6.538499674761e+04 true resid norm 5.503467432893e-02 ||r(i)||/||b|| 5.704843475602e-02
>>>>>>       4 KSP preconditioned resid norm 1.593713396575e+04 true resid norm 8.902701363763e-02 ||r(i)||/||b|| 9.228457951208e-02
>>>>>>       5 KSP preconditioned resid norm 4.837260621464e+02 true resid norm 2.966772992825e-03 ||r(i)||/||b|| 3.075329464213e-03
>>>>>>       6 KSP preconditioned resid norm 1.681372767335e+02 true resid norm 5.312467443025e-04 ||r(i)||/||b|| 5.506854651406e-04
>>>>>>       7 KSP preconditioned resid norm 1.271478850717e+01 true resid norm 2.123810020488e-05 ||r(i)||/||b|| 2.201521838103e-05
>>>>>>       8 KSP preconditioned resid norm 1.262723712696e+00 true resid norm 1.150572715331e-06 ||r(i)||/||b|| 1.192673042641e-06
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 9.053072585125e-02 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.253424029194e+06 at iteration 9
>>>>>>       9 KSP preconditioned resid norm 9.053072585125e-02 true resid norm 9.475050575058e-08 ||r(i)||/||b|| 9.821749853747e-08
>>>>>>     1 KSP unpreconditioned resid norm 8.171589173162e-03 true resid norm 8.171589173162e-03 ||r(i)||/||b|| 1.110779161180e-01
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 4.345765068989e+07 true resid norm 9.999992231691e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 5.388715093466e+06 true resid norm 8.125387327699e-02 ||r(i)||/||b|| 8.125393639755e-02
>>>>>>       2 KSP preconditioned resid norm 4.763725726436e+05 true resid norm 2.464285618036e-02 ||r(i)||/||b|| 2.464287532371e-02
>>>>>>       3 KSP preconditioned resid norm 2.287746683380e+04 true resid norm 7.224823080100e-03 ||r(i)||/||b|| 7.224828692570e-03
>>>>>>       4 KSP preconditioned resid norm 4.872858764091e+03 true resid norm 3.972261388893e-03 ||r(i)||/||b|| 3.972264474670e-03
>>>>>>       5 KSP preconditioned resid norm 8.670449895323e+02 true resid norm 2.359005963873e-04 ||r(i)||/||b|| 2.359007796423e-04
>>>>>>       6 KSP preconditioned resid norm 4.252589693890e+01 true resid norm 1.471904261226e-06 ||r(i)||/||b|| 1.471905404648e-06
>>>>>>       7 KSP preconditioned resid norm 5.128476471782e+00 true resid norm 1.643725157865e-07 ||r(i)||/||b|| 1.643726434763e-07
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 4.311901915856e-01 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 4.345765068989e+07 at iteration 8
>>>>>>       8 KSP preconditioned resid norm 4.311901915856e-01 true resid norm 1.166123921637e-08 ||r(i)||/||b|| 1.166124827519e-08
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.373662391739e-09 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.356628084813e-02 at iteration 2
>>>>>>     2 KSP unpreconditioned resid norm 2.373662391739e-09 true resid norm 2.373662391658e-09 ||r(i)||/||b|| 3.226562990941e-08
>>>>>>
>>>>>> [0] SNESSolve_NEWTONLS(): iter=2, linear solve iterations=2
>>>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.343326231305e-02 near zero implies inconsistent rhs
>>>>>> [0] SNESSolve_NEWTONLS(): fnorm=7.3566280848133728e-02, gnorm=7.2259942496422647e-02, ynorm=6.3156901950486099e+04, lssucceed=0
>>>>>>   3 2r: 7.23E-02 2x: 3.70E+09 2u: 6.32E+04 ir: 4.52E-02 iu: 5.00E+04 rsn:   0
>>>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>>>
>>>>>>     Residual norms for flow_ solve.
>>>>>>     0 KSP unpreconditioned resid norm 7.225994249642e-02 true resid norm 7.225994249642e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>>>       0 KSP preconditioned resid norm 7.705582590638e+05 true resid norm 9.649751442741e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>>>       1 KSP preconditioned resid norm 2.444424220392e+04 true resid norm 8.243110200738e-03 ||r(i)||/||b|| 8.542303135630e-03
>>>>>>       2 KSP preconditioned resid norm 2.080899648412e+03 true resid norm 7.642343147053e-04 ||r(i)||/||b|| 7.919730567570e-04
>>>>>>       3 KSP preconditioned resid norm 9.911171129874e+02 true resid norm 5.904182179180e-05 ||r(i)||/||b|| 6.118481096859e-05
>>>>>>       4 KSP preconditioned resid norm 5.258230282482e+02 true resid norm 2.043366677644e-04 ||r(i)||/||b|| 2.117532964210e-04
>>>>>>       5 KSP preconditioned resid norm 5.522830460456e+01 true resid norm 1.710780366056e-05 ||r(i)||/||b|| 1.772875059225e-05
>>>>>>       6 KSP preconditioned resid norm 5.922280741715e+00 true resid norm 1.543198740828e-06 ||r(i)||/||b|| 1.599210870855e-06
>>>>>>       7 KSP preconditioned resid norm 3.339500859115e-01 true resid norm 1.221335666427e-07 ||r(i)||/||b|| 1.265665414984e-07
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.329208597672e-02 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.705582590638e+05 at iteration 8
>>>>>>       8 KSP preconditioned resid norm 3.329208597672e-02 true resid norm 9.758240835324e-09 ||r(i)||/||b|| 1.011242713683e-08
>>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.697128456432e-11 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.225994249642e-02 at iteration 1
>>>>>>     1 KSP unpreconditioned resid norm 2.697128456432e-11 true resid norm 2.697128457142e-11 ||r(i)||/||b|| 3.732536124389e-10
>>>>>>
>>>>>> [0] SNESSolve_NEWTONLS(): iter=3, linear solve iterations=1
>>>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.329227684222e-02 near zero implies inconsistent rhs
>>>>>> [0] SNESSolve_NEWTONLS(): fnorm=7.2259942496422647e-02, gnorm=5.4435602192925014e-01, ynorm=2.7049750229137400e+04, lssucceed=0
>>>>>> [0] SNESConvergedDefault(): Converged due to small update length: 2.704975022914e+04 < 1.000000000000e-05 * 3.702469482296e+09
>>>>>>   4 2r: 5.44E-01 2x: 3.70E+09 2u: 2.70E+04 ir: 3.84E-01 iu: 2.34E+04 rsn: stol
>>>>>> Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 4
>>>>>>
>>>>>>
>>>>>> As the simulation advances this behaviour leads to frequent time step cuts because of 8 subsequently failed Newton iterations, which brings the simulation practically to a halt.
>>>>>>
>>>>>> Is the Block Jacobi not a good choice? Better ASM with huge overlap? Or is there something wrong with my RHS? Maybe the SNES, SNESLS, KSP tolerances need better tuning?
>>>>>>
>>>>>> Grateful for any clarifying words!
>>>>>> Robert
>>>>>>
>>>>>>
>>>>>> My SNES_view is:
>>>>>>
>>>>>>
>>>>>> SNES Object: (flow_) 2 MPI processes
>>>>>>   type: newtonls
>>>>>>   maximum iterations=8, maximum function evaluations=10000
>>>>>>   tolerances: relative=1e-05, absolute=1e-05, solution=1e-05
>>>>>>   total number of linear solver iterations=1
>>>>>>   total number of function evaluations=2
>>>>>>   norm schedule ALWAYS
>>>>>>   SNESLineSearch Object: (flow_) 2 MPI processes
>>>>>>     type: basic
>>>>>>     maxstep=1.000000e+08, minlambda=1.000000e-05
>>>>>>     tolerances: relative=1.000000e-05, absolute=1.000000e-05, lambda=1.000000e-08
>>>>>>     maximum iterations=40
>>>>>>     using user-defined precheck step
>>>>>>   KSP Object: (flow_) 2 MPI processes
>>>>>>     type: fgmres
>>>>>>       GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>       GMRES: happy breakdown tolerance 1e-30
>>>>>>     maximum iterations=200, initial guess is zero
>>>>>>     tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
>>>>>>     right preconditioning
>>>>>>     using UNPRECONDITIONED norm type for convergence test
>>>>>>   PC Object: (flow_) 2 MPI processes
>>>>>>     type: composite
>>>>>>     Composite PC type - MULTIPLICATIVE
>>>>>>     PCs on composite preconditioner follow
>>>>>>     ---------------------------------
>>>>>>       PC Object: (flow_sub_0_) 2 MPI processes
>>>>>>         type: galerkin
>>>>>>         Galerkin PC
>>>>>>         KSP on Galerkin follow
>>>>>>         ---------------------------------
>>>>>>         KSP Object: (flow_sub_0_galerkin_) 2 MPI processes
>>>>>>           type: gmres
>>>>>>             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>             GMRES: happy breakdown tolerance 1e-30
>>>>>>           maximum iterations=200, initial guess is zero
>>>>>>           tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
>>>>>>           left preconditioning
>>>>>>           using PRECONDITIONED norm type for convergence test
>>>>>>         PC Object: (flow_sub_0_galerkin_) 2 MPI processes
>>>>>>           type: hypre
>>>>>>             HYPRE BoomerAMG preconditioning
>>>>>>             HYPRE BoomerAMG: Cycle type V
>>>>>>             HYPRE BoomerAMG: Maximum number of levels 25
>>>>>>             HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>>             HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
>>>>>>             HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>>             HYPRE BoomerAMG: Interpolation truncation factor 0.
>>>>>>             HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>>             HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>>             HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>>             HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>>             HYPRE BoomerAMG: Sweeps down         1
>>>>>>             HYPRE BoomerAMG: Sweeps up           1
>>>>>>             HYPRE BoomerAMG: Sweeps on coarse    1
>>>>>>             HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>>>             HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>>>             HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>>>             HYPRE BoomerAMG: Relax weight  (all)      1.
>>>>>>             HYPRE BoomerAMG: Outer relax weight (all) 1.
>>>>>>             HYPRE BoomerAMG: Using CF-relaxation
>>>>>>             HYPRE BoomerAMG: Not using more complex smoothers.
>>>>>>             HYPRE BoomerAMG: Measure type        local
>>>>>>             HYPRE BoomerAMG: Coarsen type        Falgout
>>>>>>             HYPRE BoomerAMG: Interpolation type  classical
>>>>>>           linear system matrix = precond matrix:
>>>>>>           Mat Object: 2 MPI processes
>>>>>>             type: mpiaij
>>>>>>             rows=8000, cols=8000
>>>>>>             total: nonzeros=53600, allocated nonzeros=53600
>>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>>               not using I-node (on process 0) routines
>>>>>>         linear system matrix = precond matrix:
>>>>>>         Mat Object: (flow_) 2 MPI processes
>>>>>>           type: mpibaij
>>>>>>           rows=24000, cols=24000, bs=3
>>>>>>           total: nonzeros=482400, allocated nonzeros=482400
>>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>>       PC Object: (flow_sub_1_) 2 MPI processes
>>>>>>         type: bjacobi
>>>>>>           block Jacobi: number of blocks = 2
>>>>>>           Local solve is same for all blocks, in the following KSP and PC objects:
>>>>>>         KSP Object: (flow_sub_1_sub_) 1 MPI processes
>>>>>>           type: preonly
>>>>>>           maximum iterations=10000, initial guess is zero
>>>>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.     <------ not working: -flow_sub_1_sub_ksp_rtol 1e-7
>>>>>>           left preconditioning
>>>>>>           using NONE norm type for convergence test
>>>>>>         PC Object: (flow_sub_1_sub_) 1 MPI processes
>>>>>>           type: lu
>>>>>>             out-of-place factorization
>>>>>>             tolerance for zero pivot 2.22045e-14
>>>>>>             matrix ordering: nd
>>>>>>             factor fill ratio given 5., needed 18.3108
>>>>>>               Factored matrix follows:
>>>>>>                 Mat Object: 1 MPI processes
>>>>>>                   type: seqbaij
>>>>>>                   rows=12000, cols=12000, bs=3
>>>>>>                   package used to perform factorization: petsc
>>>>>>                   total: nonzeros=4350654, allocated nonzeros=4350654
>>>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>>>                       block size is 3
>>>>>>           linear system matrix = precond matrix:
>>>>>>           Mat Object: (flow_) 1 MPI processes
>>>>>>             type: seqbaij
>>>>>>             rows=12000, cols=12000, bs=3
>>>>>>             total: nonzeros=237600, allocated nonzeros=237600
>>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>>                 block size is 3
>>>>>>         linear system matrix = precond matrix:
>>>>>>         Mat Object: (flow_) 2 MPI processes
>>>>>>           type: mpibaij
>>>>>>           rows=24000, cols=24000, bs=3
>>>>>>           total: nonzeros=482400, allocated nonzeros=482400
>>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>>     ---------------------------------
>>>>>>     linear system matrix = precond matrix:
>>>>>>     Mat Object: (flow_) 2 MPI processes
>>>>>>       type: mpibaij
>>>>>>       rows=24000, cols=24000, bs=3
>>>>>>       total: nonzeros=482400, allocated nonzeros=482400
>>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>>
>>>>>>
>>>> <het_np_1.log><het_np_2.log>
>>>>
>> <het_np_1_CPR-AMG.log><het_np_2_CPR-AMG.log><het_np_1_PCLU.log><het_np_1 PCBJACOBI_SUBPCLU.log><het_np_2_PCBJACOBI_SUBPCLU.log><snes_view.txt>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/1081e547/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_1_default.log
Type: text/x-log
Size: 37690 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/1081e547/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_2_default.log
Type: text/x-log
Size: 38129 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/1081e547/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Check_SNES_KSP_PC_Tree.tar.gz
Type: application/gzip
Size: 644829 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/1081e547/attachment-0005.bin>