On Mon, Apr 19, 2010 at 6:29 AM,  <span dir="ltr">&lt;<a href="mailto:tribur@vision.ee.ethz.ch">tribur@vision.ee.ethz.ch</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi Jed,<br>

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type<br>

POSITIVE_DEFINITE. However, it converges very slowly using the default<br>

REDUNDANT for the coarse solve.<br>

</blockquote>

<br>

&quot;Converges slowly&quot; or &quot;the coarse-level solve is expensive&quot;?<br>

</blockquote>

<br>

Hm, rather &quot;converges slowly&quot;. I use ML inside a preconditioner for the Schur complement system, and the outer system, preconditioned with the approximate Schur complement preconditioner, converges slowly, if you understand what I mean.<br>


<br>

My particular problem is that the convergence rate depends strongly on the number of processors. On one processor, with ML preconditioning the deeply nested inner system, the outer system converges in, e.g., 39 iterations. With np=10, however, it needs 69 iterations.<br>

</blockquote><div><br></div><div>For Schur complement methods, the inner system usually has to be solved very accurately.</div><div>Are you accelerating a Krylov method for A^{-1}, or just using ML itself? I would expect that, for</div>

<div>the same linear system tolerance, you get identical convergence for the same system,</div><div>independent of the number of processors.</div><div><br></div><div>   Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


With HYPRE, the number of iterations is independent of the number of processes (at least if np&lt;80), but HYPRE is slower (on this inner system, not in general) and scales very badly. That&#39;s why I would like to use ML.<br>


<br>

Thinking about it, all this shouldn&#39;t have anything to do with the choice of direct solver for the coarse system inside ML (MUMPS or PETSc&#39;s own), should it? The direct solver solves the coarse system completely, independently of the number of processes, and shouldn&#39;t influence the effectiveness of ML, or am I wrong?<br>


<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I suggest<br>

starting with<br>

<br>

-mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps<br>

<br>

or varying parameters in ML to see if you can make the coarse level<br>

problem smaller without hurting convergence rate.  You can do<br>

semi-redundant solves if you scale processor counts beyond what MUMPS<br>

works well with.<br>

</blockquote>

<br>

Thanks. So MUMPS is usually the fastest parallel direct solver?<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Depending on what problem you are solving, ML could be producing a<br>

(nearly) singular coarse level operator in which case you can expect<br>

very confusing and inconsistent behavior.<br>

</blockquote>

<br>

Could that also be the reason for the decreased convergence rate when going from 1 to 10 processors, even though the equation system remains the same?<br>

<br>

<br>

Thanks a lot,<br>

<br>

Kathrin<br>

<br>

<br>

</blockquote></div><br><br clear="all"><br>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener<br>