[petsc-users] CPR-AMG: SNES with two cores worse than with one

Robert Annewandter robert.annewandter at opengosim.com
Fri Jul 7 06:05:34 CDT 2017


Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES.

I've rerun with -snes_monitor (attached with canonical suffix), their
-pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in
the future, however it works only with -mat_type aij..)


The sequential and parallel runs I did  with
 
    -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal
-snes_monitor

and

    -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
-sub_pc_factor_nonzeros_along_diagonal -snes_monitor

As expected, the sequential are bot identical and the parallel takes
half the time compared to sequential.




On 07/07/17 01:20, Barry Smith wrote:
>    Looks like PFLOTRAN has a maximum number of SNES iterations as 8 and cuts the timestep if that fails.
>
>    Please run with -snes_monitor I don't understand the strange densely packed information that PFLOTRAN is printing.
>
>    It looks like the linear solver is converging fine in parallel, normally then there is absolutely no reason that the Newton should behave different on 2 processors than 1 unless there is something wrong with the Jacobian. What is the -pc_type for the two cases LU or your fancy thing? 
>
>    Please run sequential and parallel with -pc_type lu and also with -snes_monitor.  We need to fix all the knobs but one in order to understand what is going on.
>
>
>    Barry
>
>
>   
>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter <robert.annewandter at opengosim.com> wrote:
>>
>> Thanks Barry!
>>
>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 600 s). PFLOTRAN final output:
>>
>> NP 1
>>
>> FLOW TS BE steps =     43 newton =       43 linear =         43 cuts =      0
>> FLOW TS BE Wasted Linear Iterations = 0
>> FLOW TS BE SNES time = 218.9 seconds
>>
>> NP 2
>>
>> FLOW TS BE steps =     67 newton =      176 linear =        314 cuts =     13
>> FLOW TS BE Wasted Linear Iterations = 208
>> FLOW TS BE SNES time = 600.0 seconds
>>
>>
>> Robert
>>
>> On 06/07/17 21:24, Barry Smith wrote:
>>>    So on one process the outer linear solver takes a single iteration this is because the block Jacobi with LU and one block is a direct solver.
>>>
>>>
>>>>     11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.148515820410e-14 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.581814306485e-02 at iteration 1
>>>>     1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>>>
>>>    On two processes the outer linear solver takes a few iterations to solver, this is to be expected. 
>>>
>>>    But what you sent doesn't give any indication about SNES not converging. Please turn off all inner linear solver monitoring and just run with -ksp_monitor_true_residual -snes_monitor -snes_lineseach_monitor -snes_converged_reason
>>>
>>>    Barry
>>>
>>>
>>>
>>>
>>>> On Jul 6, 2017, at 2:03 PM, Robert Annewandter <robert.annewandter at opengosim.com>
>>>>  wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge when using two cores instead of one. Because of the adaptive time stepping of the Newton, this leads to severe cuts in time step.
>>>>
>>>> This is how I run it with two cores
>>>>
>>>> mpirun \
>>>>   -n 2 pflotran \
>>>>   -pflotranin het.pflinput \
>>>>   -ksp_monitor_true_residual \
>>>>   -flow_snes_view \
>>>>   -flow_snes_converged_reason \
>>>>   -flow_sub_1_pc_type bjacobi \
>>>>   -flow_sub_1_sub_pc_type lu \
>>>>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>>>>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>>>>   -options_left \
>>>>   -log_summary \
>>>>   -info 
>>>>
>>>>
>>>> With one core I get (after grepping the crap away from -info):
>>>>
>>>>  Step     32 Time=  1.80000E+01 
>>>>
>>>> [...]
>>>>
>>>>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>     Residual norms for flow_ solve.
>>>>     0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 1.581814306485e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>>>>       1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>>>>       2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm 8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
>>>>       3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm 2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
>>>>       4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm 8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
>>>>       5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm 3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
>>>>       6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm 3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
>>>>       7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm 7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
>>>>       8 KSP preconditioned resid norm 2.603879153772e+03 true resid norm 2.035525412004e-02 ||r(i)||/||b|| 1.981175861414e-02
>>>>       9 KSP preconditioned resid norm 1.774410462754e+02 true resid norm 3.001214973121e-03 ||r(i)||/||b|| 2.921081026352e-03
>>>>     10 KSP preconditioned resid norm 1.664227038378e+01 true resid norm 3.413136309181e-04 ||r(i)||/||b|| 3.322003855903e-04
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 1.131868956745e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 2.067297386780e+07 at iteration 11
>>>>     11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.148515820410e-14 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.581814306485e-02 at iteration 1
>>>>     1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>>> [0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=1
>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 3.590873180642e-01 near zero implies inconsistent rhs
>>>> [0] SNESSolve_NEWTONLS(): fnorm=1.5818143064846742e-02, gnorm=1.0695649833687331e-02, ynorm=4.6826522561266171e+02, lssucceed=0
>>>> [0] SNESConvergedDefault(): Converged due to small update length: 4.682652256127e+02 < 1.000000000000e-05 * 3.702480426117e+09
>>>>   1 2r: 1.07E-02 2x: 3.70E+09 2u: 4.68E+02 ir: 5.05E-03 iu: 4.77E+01 rsn: stol
>>>> Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1
>>>>
>>>>
>>>>
>>>> But with two cores I get:
>>>>
>>>>
>>>>  Step     32 Time=  1.80000E+01 
>>>>
>>>> [...]
>>>>
>>>>   0 2r: 6.16E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 3.63E-03 iu: 0.00E+00 rsn:   0
>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>
>>>>     Residual norms for flow_ solve.
>>>>     0 KSP unpreconditioned resid norm 6.162760088924e-03 true resid norm 6.162760088924e-03 ||r(i)||/||b|| 1.000000000000e+00
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 8.994949630499e+08 true resid norm 7.982144380936e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 8.950556502615e+08 true resid norm 1.550138696155e+00 ||r(i)||/||b|| 1.942007839218e+00
>>>>       2 KSP preconditioned resid norm 1.044849684205e+08 true resid norm 2.166193480531e+00 ||r(i)||/||b|| 2.713798920631e+00
>>>>       3 KSP preconditioned resid norm 8.209708619718e+06 true resid norm 3.076045005154e-01 ||r(i)||/||b|| 3.853657436340e-01
>>>>       4 KSP preconditioned resid norm 3.027461352422e+05 true resid norm 1.207731865714e-02 ||r(i)||/||b|| 1.513041869549e-02
>>>>       5 KSP preconditioned resid norm 1.595302164817e+04 true resid norm 4.123713694368e-04 ||r(i)||/||b|| 5.166172769585e-04
>>>>       6 KSP preconditioned resid norm 1.898935810797e+03 true resid norm 8.275885058330e-05 ||r(i)||/||b|| 1.036799719897e-04
>>>>       7 KSP preconditioned resid norm 1.429881682558e+02 true resid norm 4.751240525466e-06 ||r(i)||/||b|| 5.952335987324e-06
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 8.404003313455e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 8.994949630499e+08 at iteration 8
>>>>       8 KSP preconditioned resid norm 8.404003313455e+00 true resid norm 3.841921844578e-07 ||r(i)||/||b|| 4.813145016211e-07
>>>>     1 KSP unpreconditioned resid norm 6.162162548202e-03 true resid norm 6.162162548202e-03 ||r(i)||/||b|| 9.999030400804e-01
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 4.360556381209e+07 true resid norm 1.000000245433e+00 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 5.385519331932e+06 true resid norm 8.785183939860e-02 ||r(i)||/||b|| 8.785181783689e-02
>>>>       2 KSP preconditioned resid norm 4.728931283459e+05 true resid norm 2.008708805316e-02 ||r(i)||/||b|| 2.008708312313e-02
>>>>       3 KSP preconditioned resid norm 2.734215698319e+04 true resid norm 6.418720397673e-03 ||r(i)||/||b|| 6.418718822309e-03
>>>>       4 KSP preconditioned resid norm 1.002270029334e+04 true resid norm 4.040289515991e-03 ||r(i)||/||b|| 4.040288524372e-03
>>>>       5 KSP preconditioned resid norm 1.321280190971e+03 true resid norm 1.023292238313e-04 ||r(i)||/||b|| 1.023291987163e-04
>>>>       6 KSP preconditioned resid norm 6.594292964815e+01 true resid norm 1.877106733170e-06 ||r(i)||/||b|| 1.877106272467e-06
>>>>       7 KSP preconditioned resid norm 7.816325147216e+00 true resid norm 2.552611664980e-07 ||r(i)||/||b|| 2.552611038486e-07
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.391568446109e-01 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 4.360556381209e+07 at iteration 8
>>>>       8 KSP preconditioned resid norm 6.391568446109e-01 true resid norm 1.680724939670e-08 ||r(i)||/||b|| 1.680724527166e-08
>>>>     2 KSP unpreconditioned resid norm 4.328902922753e-07 true resid norm 4.328902922752e-07 ||r(i)||/||b|| 7.024292460341e-05
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 8.794597825780e+08 true resid norm 1.000000094566e+00 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 8.609906572102e+08 true resid norm 2.965044981249e+00 ||r(i)||/||b|| 2.965044700856e+00
>>>>       2 KSP preconditioned resid norm 9.318108989314e+07 true resid norm 1.881262939380e+00 ||r(i)||/||b|| 1.881262761477e+00
>>>>       3 KSP preconditioned resid norm 6.908723262483e+06 true resid norm 2.639592490398e-01 ||r(i)||/||b|| 2.639592240782e-01
>>>>       4 KSP preconditioned resid norm 2.651677791227e+05 true resid norm 9.736480169584e-03 ||r(i)||/||b|| 9.736479248845e-03
>>>>       5 KSP preconditioned resid norm 1.192178471172e+04 true resid norm 3.082839752692e-04 ||r(i)||/||b|| 3.082839461160e-04
>>>>       6 KSP preconditioned resid norm 1.492201446262e+03 true resid norm 4.633866284506e-05 ||r(i)||/||b|| 4.633865846301e-05
>>>>       7 KSP preconditioned resid norm 1.160670017241e+02 true resid norm 2.821157348522e-06 ||r(i)||/||b|| 2.821157081737e-06
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.447568262216e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 8.794597825780e+08 at iteration 8
>>>>       8 KSP preconditioned resid norm 6.447568262216e+00 true resid norm 1.516068561348e-07 ||r(i)||/||b|| 1.516068417980e-07
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 6.135731709822e-15 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 6.162760088924e-03 at iteration 3
>>>>     3 KSP unpreconditioned resid norm 6.135731709822e-15 true resid norm 1.142020328809e-14 ||r(i)||/||b|| 1.853098793933e-12
>>>>
>>>> [0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=3
>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 1.998388224666e-02 near zero implies inconsistent rhs
>>>> [0] SNESSolve_NEWTONLS(): fnorm=6.1627600889243711e-03, gnorm=1.0406503258190572e-02, ynorm=6.2999025681245366e+04, lssucceed=0  
>>>>   1 2r: 1.04E-02 2x: 3.70E+09 2u: 6.30E+04 ir: 6.54E-03 iu: 5.00E+04 rsn:   0
>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>
>>>>     Residual norms for flow_ solve.
>>>>     0 KSP unpreconditioned resid norm 1.040650325819e-02 true resid norm 1.040650325819e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 6.758906811264e+07 true resid norm 9.814998431686e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 2.503922806424e+06 true resid norm 2.275130113021e-01 ||r(i)||/||b|| 2.318013730574e-01
>>>>       2 KSP preconditioned resid norm 3.316753614870e+05 true resid norm 3.820733530238e-02 ||r(i)||/||b|| 3.892750016040e-02
>>>>       3 KSP preconditioned resid norm 2.956751700483e+04 true resid norm 2.143772538677e-03 ||r(i)||/||b|| 2.184180215207e-03
>>>>       4 KSP preconditioned resid norm 1.277067042524e+03 true resid norm 9.093614251311e-05 ||r(i)||/||b|| 9.265018547485e-05
>>>>       5 KSP preconditioned resid norm 1.060996002446e+02 true resid norm 1.042893700050e-05 ||r(i)||/||b|| 1.062551061326e-05
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 5.058127343285e+00 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 6.758906811264e+07 at iteration 6
>>>>       6 KSP preconditioned resid norm 5.058127343285e+00 true resid norm 4.054770602120e-07 ||r(i)||/||b|| 4.131198420807e-07
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 4.449606189225e-10 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 1.040650325819e-02 at iteration 1
>>>>     1 KSP unpreconditioned resid norm 4.449606189225e-10 true resid norm 4.449606189353e-10 ||r(i)||/||b|| 4.275793779098e-08
>>>>
>>>> [0] SNESSolve_NEWTONLS(): iter=1, linear solve iterations=1
>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.300066663571e-02 near zero implies inconsistent rhs
>>>> [0] SNESSolve_NEWTONLS(): fnorm=1.0406503258190572e-02, gnorm=7.3566280848133728e-02, ynorm=7.9500485128639993e+04, lssucceed=0
>>>>   2 2r: 7.36E-02 2x: 3.70E+09 2u: 7.95E+04 ir: 4.62E-02 iu: 5.00E+04 rsn:   0
>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>
>>>>     Residual norms for flow_ solve.
>>>>     0 KSP unpreconditioned resid norm 7.356628084813e-02 true resid norm 7.356628084813e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 7.253424029194e+06 true resid norm 9.647008645250e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 7.126940190688e+06 true resid norm 1.228009197928e+00 ||r(i)||/||b|| 1.272942984800e+00
>>>>       2 KSP preconditioned resid norm 9.391591432635e+05 true resid norm 7.804929162756e-01 ||r(i)||/||b|| 8.090517433711e-01
>>>>       3 KSP preconditioned resid norm 6.538499674761e+04 true resid norm 5.503467432893e-02 ||r(i)||/||b|| 5.704843475602e-02
>>>>       4 KSP preconditioned resid norm 1.593713396575e+04 true resid norm 8.902701363763e-02 ||r(i)||/||b|| 9.228457951208e-02
>>>>       5 KSP preconditioned resid norm 4.837260621464e+02 true resid norm 2.966772992825e-03 ||r(i)||/||b|| 3.075329464213e-03
>>>>       6 KSP preconditioned resid norm 1.681372767335e+02 true resid norm 5.312467443025e-04 ||r(i)||/||b|| 5.506854651406e-04
>>>>       7 KSP preconditioned resid norm 1.271478850717e+01 true resid norm 2.123810020488e-05 ||r(i)||/||b|| 2.201521838103e-05
>>>>       8 KSP preconditioned resid norm 1.262723712696e+00 true resid norm 1.150572715331e-06 ||r(i)||/||b|| 1.192673042641e-06
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 9.053072585125e-02 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.253424029194e+06 at iteration 9
>>>>       9 KSP preconditioned resid norm 9.053072585125e-02 true resid norm 9.475050575058e-08 ||r(i)||/||b|| 9.821749853747e-08
>>>>     1 KSP unpreconditioned resid norm 8.171589173162e-03 true resid norm 8.171589173162e-03 ||r(i)||/||b|| 1.110779161180e-01
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 4.345765068989e+07 true resid norm 9.999992231691e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 5.388715093466e+06 true resid norm 8.125387327699e-02 ||r(i)||/||b|| 8.125393639755e-02
>>>>       2 KSP preconditioned resid norm 4.763725726436e+05 true resid norm 2.464285618036e-02 ||r(i)||/||b|| 2.464287532371e-02
>>>>       3 KSP preconditioned resid norm 2.287746683380e+04 true resid norm 7.224823080100e-03 ||r(i)||/||b|| 7.224828692570e-03
>>>>       4 KSP preconditioned resid norm 4.872858764091e+03 true resid norm 3.972261388893e-03 ||r(i)||/||b|| 3.972264474670e-03
>>>>       5 KSP preconditioned resid norm 8.670449895323e+02 true resid norm 2.359005963873e-04 ||r(i)||/||b|| 2.359007796423e-04
>>>>       6 KSP preconditioned resid norm 4.252589693890e+01 true resid norm 1.471904261226e-06 ||r(i)||/||b|| 1.471905404648e-06
>>>>       7 KSP preconditioned resid norm 5.128476471782e+00 true resid norm 1.643725157865e-07 ||r(i)||/||b|| 1.643726434763e-07
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 4.311901915856e-01 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 4.345765068989e+07 at iteration 8
>>>>       8 KSP preconditioned resid norm 4.311901915856e-01 true resid norm 1.166123921637e-08 ||r(i)||/||b|| 1.166124827519e-08
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.373662391739e-09 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.356628084813e-02 at iteration 2
>>>>     2 KSP unpreconditioned resid norm 2.373662391739e-09 true resid norm 2.373662391658e-09 ||r(i)||/||b|| 3.226562990941e-08
>>>>
>>>> [0] SNESSolve_NEWTONLS(): iter=2, linear solve iterations=2
>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.343326231305e-02 near zero implies inconsistent rhs
>>>> [0] SNESSolve_NEWTONLS(): fnorm=7.3566280848133728e-02, gnorm=7.2259942496422647e-02, ynorm=6.3156901950486099e+04, lssucceed=0
>>>>   3 2r: 7.23E-02 2x: 3.70E+09 2u: 6.32E+04 ir: 4.52E-02 iu: 5.00E+04 rsn:   0
>>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>>>
>>>>     Residual norms for flow_ solve.
>>>>     0 KSP unpreconditioned resid norm 7.225994249642e-02 true resid norm 7.225994249642e-02 ||r(i)||/||b|| 1.000000000000e+00
>>>>       Residual norms for flow_sub_0_galerkin_ solve.
>>>>       0 KSP preconditioned resid norm 7.705582590638e+05 true resid norm 9.649751442741e-01 ||r(i)||/||b|| 1.000000000000e+00
>>>>       1 KSP preconditioned resid norm 2.444424220392e+04 true resid norm 8.243110200738e-03 ||r(i)||/||b|| 8.542303135630e-03
>>>>       2 KSP preconditioned resid norm 2.080899648412e+03 true resid norm 7.642343147053e-04 ||r(i)||/||b|| 7.919730567570e-04
>>>>       3 KSP preconditioned resid norm 9.911171129874e+02 true resid norm 5.904182179180e-05 ||r(i)||/||b|| 6.118481096859e-05
>>>>       4 KSP preconditioned resid norm 5.258230282482e+02 true resid norm 2.043366677644e-04 ||r(i)||/||b|| 2.117532964210e-04
>>>>       5 KSP preconditioned resid norm 5.522830460456e+01 true resid norm 1.710780366056e-05 ||r(i)||/||b|| 1.772875059225e-05
>>>>       6 KSP preconditioned resid norm 5.922280741715e+00 true resid norm 1.543198740828e-06 ||r(i)||/||b|| 1.599210870855e-06
>>>>       7 KSP preconditioned resid norm 3.339500859115e-01 true resid norm 1.221335666427e-07 ||r(i)||/||b|| 1.265665414984e-07
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 3.329208597672e-02 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.705582590638e+05 at iteration 8
>>>>       8 KSP preconditioned resid norm 3.329208597672e-02 true resid norm 9.758240835324e-09 ||r(i)||/||b|| 1.011242713683e-08
>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 2.697128456432e-11 is less than relative tolerance 1.000000000000e-07 times initial right hand side norm 7.225994249642e-02 at iteration 1
>>>>     1 KSP unpreconditioned resid norm 2.697128456432e-11 true resid norm 2.697128457142e-11 ||r(i)||/||b|| 3.732536124389e-10
>>>>
>>>> [0] SNESSolve_NEWTONLS(): iter=3, linear solve iterations=1
>>>> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 4.329227684222e-02 near zero implies inconsistent rhs
>>>> [0] SNESSolve_NEWTONLS(): fnorm=7.2259942496422647e-02, gnorm=5.4435602192925014e-01, ynorm=2.7049750229137400e+04, lssucceed=0
>>>> [0] SNESConvergedDefault(): Converged due to small update length: 2.704975022914e+04 < 1.000000000000e-05 * 3.702469482296e+09
>>>>   4 2r: 5.44E-01 2x: 3.70E+09 2u: 2.70E+04 ir: 3.84E-01 iu: 2.34E+04 rsn: stol
>>>> Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 4
>>>>
>>>>
>>>> As the simulation advances this behaviour leads to frequent time step cuts because of 8 subsequently failed Newton iterations, which brings the simulation practically to a halt.
>>>>
>>>> Is the Block Jacobi not a good choice? Better ASM with huge overlap? Or is there something wrong with my RHS? Maybe the SNES, SNESLS, KSP tolerances need better tuning?
>>>>
>>>> Grateful for any clarifying words!
>>>> Robert
>>>>
>>>>
>>>> My SNES_view is:
>>>>
>>>>
>>>> SNES Object: (flow_) 2 MPI processes
>>>>   type: newtonls
>>>>   maximum iterations=8, maximum function evaluations=10000
>>>>   tolerances: relative=1e-05, absolute=1e-05, solution=1e-05
>>>>   total number of linear solver iterations=1
>>>>   total number of function evaluations=2
>>>>   norm schedule ALWAYS
>>>>   SNESLineSearch Object: (flow_) 2 MPI processes
>>>>     type: basic
>>>>     maxstep=1.000000e+08, minlambda=1.000000e-05
>>>>     tolerances: relative=1.000000e-05, absolute=1.000000e-05, lambda=1.000000e-08
>>>>     maximum iterations=40
>>>>     using user-defined precheck step
>>>>   KSP Object: (flow_) 2 MPI processes
>>>>     type: fgmres
>>>>       GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>       GMRES: happy breakdown tolerance 1e-30
>>>>     maximum iterations=200, initial guess is zero
>>>>     tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
>>>>     right preconditioning
>>>>     using UNPRECONDITIONED norm type for convergence test
>>>>   PC Object: (flow_) 2 MPI processes
>>>>     type: composite
>>>>     Composite PC type - MULTIPLICATIVE
>>>>     PCs on composite preconditioner follow
>>>>     ---------------------------------
>>>>       PC Object: (flow_sub_0_) 2 MPI processes
>>>>         type: galerkin
>>>>         Galerkin PC
>>>>         KSP on Galerkin follow
>>>>         ---------------------------------
>>>>         KSP Object: (flow_sub_0_galerkin_) 2 MPI processes
>>>>           type: gmres
>>>>             GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>             GMRES: happy breakdown tolerance 1e-30
>>>>           maximum iterations=200, initial guess is zero
>>>>           tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
>>>>           left preconditioning
>>>>           using PRECONDITIONED norm type for convergence test
>>>>         PC Object: (flow_sub_0_galerkin_) 2 MPI processes
>>>>           type: hypre
>>>>             HYPRE BoomerAMG preconditioning
>>>>             HYPRE BoomerAMG: Cycle type V
>>>>             HYPRE BoomerAMG: Maximum number of levels 25
>>>>             HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>             HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
>>>>             HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>             HYPRE BoomerAMG: Interpolation truncation factor 0.
>>>>             HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>             HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>             HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>             HYPRE BoomerAMG: Maximum row sums 0.9
>>>>             HYPRE BoomerAMG: Sweeps down         1
>>>>             HYPRE BoomerAMG: Sweeps up           1
>>>>             HYPRE BoomerAMG: Sweeps on coarse    1
>>>>             HYPRE BoomerAMG: Relax down          symmetric-SOR/Jacobi
>>>>             HYPRE BoomerAMG: Relax up            symmetric-SOR/Jacobi
>>>>             HYPRE BoomerAMG: Relax on coarse     Gaussian-elimination
>>>>             HYPRE BoomerAMG: Relax weight  (all)      1.
>>>>             HYPRE BoomerAMG: Outer relax weight (all) 1.
>>>>             HYPRE BoomerAMG: Using CF-relaxation
>>>>             HYPRE BoomerAMG: Not using more complex smoothers.
>>>>             HYPRE BoomerAMG: Measure type        local
>>>>             HYPRE BoomerAMG: Coarsen type        Falgout
>>>>             HYPRE BoomerAMG: Interpolation type  classical
>>>>           linear system matrix = precond matrix:
>>>>           Mat Object: 2 MPI processes
>>>>             type: mpiaij
>>>>             rows=8000, cols=8000
>>>>             total: nonzeros=53600, allocated nonzeros=53600
>>>>             total number of mallocs used during MatSetValues calls =0
>>>>               not using I-node (on process 0) routines
>>>>         linear system matrix = precond matrix:
>>>>         Mat Object: (flow_) 2 MPI processes
>>>>           type: mpibaij
>>>>           rows=24000, cols=24000, bs=3
>>>>           total: nonzeros=482400, allocated nonzeros=482400
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>       PC Object: (flow_sub_1_) 2 MPI processes
>>>>         type: bjacobi
>>>>           block Jacobi: number of blocks = 2
>>>>           Local solve is same for all blocks, in the following KSP and PC objects:
>>>>         KSP Object: (flow_sub_1_sub_) 1 MPI processes
>>>>           type: preonly
>>>>           maximum iterations=10000, initial guess is zero
>>>>           tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.     <------ not working: -flow_sub_1_sub_ksp_rtol 1e-7
>>>>           left preconditioning
>>>>           using NONE norm type for convergence test
>>>>         PC Object: (flow_sub_1_sub_) 1 MPI processes
>>>>           type: lu
>>>>             out-of-place factorization
>>>>             tolerance for zero pivot 2.22045e-14
>>>>             matrix ordering: nd
>>>>             factor fill ratio given 5., needed 18.3108
>>>>               Factored matrix follows:
>>>>                 Mat Object: 1 MPI processes
>>>>                   type: seqbaij
>>>>                   rows=12000, cols=12000, bs=3
>>>>                   package used to perform factorization: petsc
>>>>                   total: nonzeros=4350654, allocated nonzeros=4350654
>>>>                   total number of mallocs used during MatSetValues calls =0
>>>>                       block size is 3
>>>>           linear system matrix = precond matrix:
>>>>           Mat Object: (flow_) 1 MPI processes
>>>>             type: seqbaij
>>>>             rows=12000, cols=12000, bs=3
>>>>             total: nonzeros=237600, allocated nonzeros=237600
>>>>             total number of mallocs used during MatSetValues calls =0
>>>>                 block size is 3
>>>>         linear system matrix = precond matrix:
>>>>         Mat Object: (flow_) 2 MPI processes
>>>>           type: mpibaij
>>>>           rows=24000, cols=24000, bs=3
>>>>           total: nonzeros=482400, allocated nonzeros=482400
>>>>           total number of mallocs used during MatSetValues calls =0
>>>>     ---------------------------------
>>>>     linear system matrix = precond matrix:
>>>>     Mat Object: (flow_) 2 MPI processes
>>>>       type: mpibaij
>>>>       rows=24000, cols=24000, bs=3
>>>>       total: nonzeros=482400, allocated nonzeros=482400
>>>>       total number of mallocs used during MatSetValues calls =0
>>>>
>> <het_np_1.log><het_np_2.log>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_1_CPR-AMG.log
Type: text/x-log
Size: 53439 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0005.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_2_CPR-AMG.log
Type: text/x-log
Size: 144965 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0006.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_1_PCLU.log
Type: text/x-log
Size: 37801 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0007.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_1 PCBJACOBI_SUBPCLU.log
Type: text/x-log
Size: 37801 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0008.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: het_np_2_PCBJACOBI_SUBPCLU.log
Type: text/x-log
Size: 38231 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20170707/c1ecc805/attachment-0009.bin>
-------------- next part --------------
--------------------------------------------------------------------------------
NP 1 KSPPREONLY PCILU
--------------------------------------------------------------------------------

SNES Object: (flow_) 1 MPI processes
  type: newtonls
  maximum iterations=8, maximum function evaluations=10000
  tolerances: relative=1e-05, absolute=1e-05, solution=1e-05
  total number of linear solver iterations=1
  total number of function evaluations=2
  norm schedule ALWAYS
  SNESLineSearch Object: (flow_) 1 MPI processes
    type: basic
    maxstep=1.000000e+08, minlambda=1.000000e-05
    tolerances: relative=1.000000e-05, absolute=1.000000e-05, lambda=1.000000e-08
    maximum iterations=40
    using user-defined precheck step
  KSP Object: (flow_) 1 MPI processes
    type: preonly
    maximum iterations=200, initial guess is zero
    tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (flow_) 1 MPI processes
    type: lu
      out-of-place factorization
      tolerance for zero pivot 2.22045e-14
      matrix ordering: nd
      factor fill ratio given 5., needed 23.8274
        Factored matrix follows:
          Mat Object: 1 MPI processes
            type: seqbaij
            rows=24000, cols=24000, bs=3
            package used to perform factorization: petsc
            total: nonzeros=11494314, allocated nonzeros=11494314
            total number of mallocs used during MatSetValues calls =0
                block size is 3
    linear system matrix = precond matrix:
    Mat Object: (flow_) 1 MPI processes
      type: seqbaij
      rows=24000, cols=24000, bs=3
      total: nonzeros=482400, allocated nonzeros=482400
      total number of mallocs used during MatSetValues calls =0
          block size is 3



--------------------------------------------------------------------------------
NP 1 KSPREONLY PCBJACOBI SUBPCLU
--------------------------------------------------------------------------------

SNES Object: (flow_) 1 MPI processes
  type: newtonls
  maximum iterations=8, maximum function evaluations=10000
  tolerances: relative=1e-05, absolute=1e-05, solution=1e-05
  total number of linear solver iterations=1
  total number of function evaluations=2
  norm schedule ALWAYS
  SNESLineSearch Object: (flow_) 1 MPI processes
    type: basic
    maxstep=1.000000e+08, minlambda=1.000000e-05
    tolerances: relative=1.000000e-05, absolute=1.000000e-05, lambda=1.000000e-08
    maximum iterations=40
    using user-defined precheck step
  KSP Object: (flow_) 1 MPI processes
    type: preonly
    maximum iterations=200, initial guess is zero
    tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (flow_) 1 MPI processes
    type: bjacobi
      block Jacobi: number of blocks = 1
      Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (flow_sub_) 1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (flow_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: nd
          factor fill ratio given 5., needed 23.8274
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqbaij
                rows=24000, cols=24000, bs=3
                package used to perform factorization: petsc
                total: nonzeros=11494314, allocated nonzeros=11494314
                total number of mallocs used during MatSetValues calls =0
                    block size is 3
        linear system matrix = precond matrix:
        Mat Object: (flow_) 1 MPI processes
          type: seqbaij
          rows=24000, cols=24000, bs=3
          total: nonzeros=482400, allocated nonzeros=482400
          total number of mallocs used during MatSetValues calls =0
              block size is 3
    linear system matrix = precond matrix:
    Mat Object: (flow_) 1 MPI processes
      type: seqbaij
      rows=24000, cols=24000, bs=3
      total: nonzeros=482400, allocated nonzeros=482400
      total number of mallocs used during MatSetValues calls =0
          block size is 3



--------------------------------------------------------------------------------
NP 2 KSPREONLY PCBJACOBI SUBPCLU
--------------------------------------------------------------------------------

SNES Object: (flow_) 2 MPI processes
  type: newtonls
  maximum iterations=8, maximum function evaluations=10000
  tolerances: relative=1e-05, absolute=1e-05, solution=1e-05
  total number of linear solver iterations=1
  total number of function evaluations=2
  norm schedule ALWAYS
  SNESLineSearch Object: (flow_) 2 MPI processes
    type: basic
    maxstep=1.000000e+08, minlambda=1.000000e-05
    tolerances: relative=1.000000e-05, absolute=1.000000e-05, lambda=1.000000e-08
    maximum iterations=40
    using user-defined precheck step
  KSP Object: (flow_) 2 MPI processes
    type: preonly
    maximum iterations=200, initial guess is zero
    tolerances:  relative=1e-07, absolute=1e-50, divergence=10000.
    left preconditioning
    using NONE norm type for convergence test
  PC Object: (flow_) 2 MPI processes
    type: bjacobi
      block Jacobi: number of blocks = 2
      Local solve is same for all blocks, in the following KSP and PC objects:
    KSP Object: (flow_sub_) 1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (flow_sub_) 1 MPI processes
      type: lu
        out-of-place factorization
        tolerance for zero pivot 2.22045e-14
        matrix ordering: nd
        factor fill ratio given 5., needed 18.3108
          Factored matrix follows:
            Mat Object: 1 MPI processes
              type: seqbaij
              rows=12000, cols=12000, bs=3
              package used to perform factorization: petsc
              total: nonzeros=4350654, allocated nonzeros=4350654
              total number of mallocs used during MatSetValues calls =0
                  block size is 3
      linear system matrix = precond matrix:
      Mat Object: (flow_) 1 MPI processes
        type: seqbaij
        rows=12000, cols=12000, bs=3
        total: nonzeros=237600, allocated nonzeros=237600
        total number of mallocs used during MatSetValues calls =0
            block size is 3
    linear system matrix = precond matrix:
    Mat Object: (flow_) 2 MPI processes
      type: mpibaij
      rows=24000, cols=24000, bs=3
      total: nonzeros=482400, allocated nonzeros=482400
      total number of mallocs used during MatSetValues calls =0


More information about the petsc-users mailing list