[petsc-users] ML and -pc_factor_shift_nonzero
tribur at vision.ee.ethz.ch
Mon Apr 19 06:29:40 CDT 2010
Hi Jed,
>> ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type
>> POSITIVE_DEFINITE. However, it converges very slowly using the default
>> REDUNDANT for the coarse solve.
>
> "Converges slowly" or "the coarse-level solve is expensive"?
Hm, rather "converges slowly". I am using ML inside the preconditioner
for a Schur complement system; the outer system, preconditioned with
this approximate Schur complement, is what converges slowly.
My particular problem is that the convergence rate depends strongly on
the number of processes. With one process, using ML to precondition
the deeply nested inner system, the outer system converges in, e.g.,
39 iterations. With np=10, however, it needs 69 iterations. Using
HYPRE instead, the iteration count is independent of the number of
processes (at least for np<80), but HYPRE (applied to this inner
system, not in general) is slower and scales very badly. That's why I
would like to use ML.
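For reference, the two option sets being compared are roughly the
following (the executable name and process count are placeholders for
my actual setup, and the solver prefix may differ if the inner solve
uses its own options prefix):

```sh
# ML, with the shifted redundant coarse factorization that made it run:
mpiexec -n 10 ./app -pc_type ml \
    -mg_coarse_redundant_pc_factor_shift_type POSITIVE_DEFINITE

# BoomerAMG via HYPRE: iteration count stays flat up to ~80 processes,
# but each application is slower for this inner system:
mpiexec -n 10 ./app -pc_type hypre -pc_hypre_type boomeramg
```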
Thinking about it, none of this should have anything to do with the
choice of direct solver for the coarse system inside ML (MUMPS or
PETSc's own), should it? The direct solver solves the coarse problem
exactly, independently of the number of processes, so it shouldn't
affect the effectiveness of ML. Or am I wrong?
> I suggest
> starting with
>
> -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps
>
> or varying parameters in ML to see if you can make the coarse level
> problem smaller without hurting convergence rate. You can do
> semi-redundant solves if you scale processor counts beyond what MUMPS
> works well with.
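Spelled out as a command line, the suggestion would look something like
this (the executable name is a placeholder, and it assumes PETSc was
configured with MUMPS support, e.g. via --download-mumps):

```sh
# Replace the redundant coarse solve with a parallel LU via MUMPS:
mpiexec -n 10 ./app -pc_type ml \
    -mg_coarse_pc_type lu \
    -mg_coarse_pc_factor_mat_solver_package mumps
```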
Thanks. So MUMPS is generally considered the fastest parallel direct
solver?
> Depending on what problem you are solving, ML could be producing a
> (nearly) singular coarse level operator in which case you can expect
> very confusing and inconsistent behavior.
Could that also explain the degraded convergence rate when going from
1 to 10 processes, even though the equation system remains the same?
Thanks a lot,
Kathrin