[petsc-dev] ksp_error_if_not_converged in multilevel solvers

Mon Oct 21 10:58:54 CDT 2019

> On Oct 21, 2019, at 12:55 AM, Pierre Jolivet <pierre.jolivet at enseeiht.fr> wrote:
> 
> 
> 
> On Oct 20, 2019, at 6:07 PM, "Smith, Barry F." <bsmith at mcs.anl.gov> wrote:
> 
>> 
>>  The reason the code works this way is that -ksp_error_if_not_converged is normally propagated into the inner (and innerer) solves, and it is normally desirable that these inner solves do not error simply because they reach the maximum number of iterations, since for nested iterative methods we generally don't need or care whether the inner solves "converge".
> 
> I fully agree with you on the last part of the above sentence. Thus, this makes me question the first part (which I wasn't aware of): why is error_if_not_converged being propagated to inner solves?

  Because the idea is to catch the problem (and its location in the stack trace) the very first time it occurs, rather than delaying until that information is lost up at the top level.

  Jed provides a fine example of this. Say my coarse grid solve is CG and, as it iterates, it finds the problem is indefinite. With -ksp_error_if_not_converged I want it to stop there immediately. Or if somewhere inside I have an LU solver and it finds a zero pivot, I want the code to stop right there and give me the full stack trace. Stopping all the way at the top in KSPSolve() tells me nothing useful for debugging a failure in a five-level nested LU solver.
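
  A minimal sketch of that argument in plain C, with no PETSc dependency. ToySolver, toy_propagate_flag() and toy_solve() are hypothetical names invented for illustration, and the rule that an inner hard failure also spoils the outer solve is a simplifying assumption of the model. It only shows that, once the flag is pushed down, the error is raised at the level where the failure happens rather than at the top.

```c
#include <stddef.h>

/* Hypothetical stand-in for a nested solver; this is not a PETSc type. */
typedef struct ToySolver {
  const char       *prefix;                 /* options prefix, e.g. "mg_coarse_" */
  int               error_if_not_converged; /* models -ksp_error_if_not_converged */
  int               hard_failure;           /* models e.g. a zero pivot or an indefinite operator */
  struct ToySolver *inner;                  /* nested solver, or NULL at the innermost level */
} ToySolver;

/* Push the flag of the outer solver down to every nested level. */
void toy_propagate_flag(ToySolver *outer)
{
  ToySolver *s;
  for (s = outer; s && s->inner; s = s->inner)
    s->inner->error_if_not_converged = s->error_if_not_converged;
}

/* Returns 1 if this level failed. *site must be NULL on entry; it receives the
   prefix of the first (innermost) level that raised an error, or stays NULL. */
int toy_solve(const ToySolver *s, const char **site)
{
  int failed = s->hard_failure;
  /* The inner solve runs first; assumption: its failure spoils this level too. */
  if (s->inner && toy_solve(s->inner, site)) failed = 1;
  /* Raise here only if no deeper level has already raised the error. */
  if (failed && s->error_if_not_converged && !*site) *site = s->prefix;
  return failed;
}
```

  With the flag set only on the outermost solver, the reported site is the outermost prefix; after toy_propagate_flag() it is the prefix of the level that actually failed.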

   The DIVERGED_ITS case is a special case that is annoying, I agree; I proposed a couple of (not great) solutions to that specific problem. But the solution is not to throw the baby out with the bathwater. The propagation is a very good thing; we just need to find a good way to allow overriding the DIVERGED_ITS behavior.
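
   For reference, the exemption under discussion and the enum idea proposed earlier in this thread can be modeled in a few lines of plain C. This is only a sketch: ToyReason, ToyErrorMode and toy_should_error() are hypothetical names, the numeric values are illustrative, and only the sign convention (negative means diverged) is taken from the discussion.

```c
/* Hypothetical reasons; negative means diverged, positive means converged. */
typedef enum {
  TOY_CONVERGED_RTOL     = 2,
  TOY_DIVERGED_ITS       = -3, /* stopped at the maximum number of iterations */
  TOY_DIVERGED_BREAKDOWN = -5  /* a "real" failure */
} ToyReason;

/* Hypothetical enum replacing the boolean error_if_not_converged flag. */
typedef enum {
  TOY_ERROR_NEVER,   /* flag not set */
  TOY_ERROR_DEFAULT, /* models the current check: max iterations is exempt */
  TOY_ERROR_ALWAYS   /* proposed value: error on max iterations as well */
} ToyErrorMode;

/* Returns 1 if the solve should raise an error for the given reason. */
int toy_should_error(ToyErrorMode mode, ToyReason reason)
{
  if (mode == TOY_ERROR_NEVER || (int)reason >= 0) return 0;
  if (reason == TOY_DIVERGED_ITS) return mode == TOY_ERROR_ALWAYS;
  return 1; /* every other diverged reason errors as soon as the flag is set */
}
```

   In this model a propagated flag would arrive as TOY_ERROR_DEFAULT, while a user who really wants the coarse solve to fail on max iterations would ask for TOY_ERROR_ALWAYS on that solver.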

   Barry

> I'm sure there are good usages, but if one cares that ksp_1 (which depends on ksp_2) converges, why should an error be thrown if ksp_2 does not converge, as long as ksp_1 does (I guess this goes along with your last paragraph)?
> 
> Thanks,
> Pierre
> 
>>   Of course, in your particular use case this backfires. 
>> 
>>   I'm not sure what the best design fix is. It seems anything will complicate the code making it harder to maintain. 
>> 
>>   Possible changes:
>>      * if error_if_not_converged is set directly on a KSP, then also cause it to error on maximum iterations, but not if it is propagated into a KSP. This would require tracking how error_if_not_converged got set.
>>      * make error_if_not_converged an enum with, for example, the value __always__ causing the error to be generated even on maximum iterations. This is easier on the code than the above.
>> 
>>   A larger, more far-reaching change would be to change the current model and have the "inner" KSP use CONVERGED_ITS instead of DIVERGED_ITS when it doesn't matter whether it converges or not: add KSPSetReachMaximumIterationsConverged(), which the outer KSP would call on the inner KSP at construction time. This is appealing from a code clarity point of view, since right now the inner solves return a negative (diverged) reason when that is acceptable, and it is confusing to return a diverged reason while everything is fine. For the given use case one would need to do -pc_bddc_coarse_ksp_error_if_not_converged -pc_bddc_coarse_ksp_reach_maximum_iterations_converged false, but this is cumbersome; who would remember the second option?
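
A rough model of this third proposal, in plain C with no PETSc dependency. KSPSetReachMaximumIterationsConverged() does not exist; it is only proposed above. ToyInner, toy_outer_setup() and the other names below are hypothetical, and the numeric reason values are illustrative (only the sign matters).

```c
/* Hypothetical inner solver state; this is not a PETSc type. */
typedef struct {
  int max_it;                 /* iteration cap of the inner solve */
  int max_it_is_converged;    /* models the proposed KSPSetReachMaximumIterationsConverged() */
  int error_if_not_converged; /* models -ksp_error_if_not_converged */
} ToyInner;

/* Illustrative reasons: positive means converged, negative means diverged. */
enum { TOY_MAXIT_CONVERGED = 4, TOY_MAXIT_DIVERGED = -3 };

/* Reason reported when the inner solve stops at max_it. */
int toy_reason_at_max_it(const ToyInner *k)
{
  return k->max_it_is_converged ? TOY_MAXIT_CONVERGED : TOY_MAXIT_DIVERGED;
}

/* With this model no special case is needed: any negative reason errors. */
int toy_errors_at_max_it(const ToyInner *k)
{
  return k->error_if_not_converged && toy_reason_at_max_it(k) < 0;
}

/* What the outer solver would do to its inner solver at construction time. */
void toy_outer_setup(ToyInner *inner)
{
  inner->max_it_is_converged = 1;
}
```

After toy_outer_setup(), hitting the cap is reported as converged and never errors; a user who wants the error back has to reset max_it_is_converged to 0 in addition to setting the error flag, which is the two-option inconvenience described above.
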
>> 
>>  I'm not really excited by any of my proposed solutions.
>> 
>> Thoughts?
>> 
>> 
>> Barry
>> 
>> 
>> 
>> 
>>> On Oct 20, 2019, at 7:55 AM, Pierre Jolivet via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>>> 
>>> Hello,
>>> I’m trying to get multilevel solvers to error out when coarse levels are not converging, but I’m failing…
>>> Could someone either tell me whether this is simply not possible, or help me find the problem in my options, please?
>>> (in src/ksp/ksp/examples/tutorials)
>>> $ mpirun -n 8 ./ex71 -pde_type Elasticity -cells 7,9,8 -dim 3 -pc_bddc_monolithic -pc_bddc_use_faces -pc_bddc_coarse_pc_type none -pc_bddc_coarse_ksp_type gmres -pc_bddc_coarse_ksp_converged_reason -pc_bddc_coarse_ksp_error_if_not_converged -ksp_converged_reason
>>> Linear pc_bddc_coarse_ solve did not converge due to DIVERGED_ITS iterations 1 <- I want to error out here
>>> […]
>>> Linear pc_bddc_coarse_ solve did not converge due to DIVERGED_ITS iterations 1
>>> Linear solve converged due to CONVERGED_RTOL iterations 31
>>> $ mpirun -np 1 ./ex1 -ksp_type gmres -pc_type gamg -ksp_converged_reason -mg_coarse_pc_type lu -mg_coarse_ksp_max_it 5 -mg_coarse_ksp_type gmres -ksp_type fgmres -mg_levels_1_ksp_type fgmres -mg_coarse_ksp_error_if_not_converged -mg_coarse_ksp_converged_reason
>>>   Linear mg_coarse_ solve did not converge due to DIVERGED_ITS iterations 5  <- I want to error out here
>>> […]
>>>   Linear mg_coarse_ solve did not converge due to DIVERGED_ITS iterations 5
>>> Linear solve converged due to CONVERGED_RTOL iterations 3
>>> 
>>> This is probably linked with this SETERRQ1 https://www.mcs.anl.gov/petsc/petsc-dev/src/ksp/ksp/interface/itfunc.c.html#line836 and the condition ksp->reason != KSP_DIVERGED_ITS.
>>> If one just wants to run a fixed number of iterations, not checking for convergence, why would one set ksp->errorifnotconverged to true?
>>> 
>>> Thanks,
>>> Pierre
>>