<div class="gmail_quote">On Thu, May 12, 2011 at 15:41, Henning Sauerland <span dir="ltr"><<a href="mailto:uerland@gmail.com">uerland@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Applying -sub_pc_type lu helped a lot in 2D, but in 3D, apart from reducing the number of iterations, the whole solve takes more than 10 times longer. </blockquote><div><br></div><div>Does -sub_pc_type ilu -sub_pc_factor_levels 2 (default is 0) help relative to the default? Direct subdomain solves in 3D are very expensive. How much does the system change between time steps?</div>
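<div><br></div><div>For concreteness, a hedged sketch of how these options might be passed at run time (the executable name, process count, and the choice of ASM as the outer preconditioner are placeholders/assumptions, not taken from this thread):</div><div><pre># sketch only: ./your_app and -n 8 are placeholders
mpiexec -n 8 ./your_app -pc_type asm \
    -sub_pc_type ilu -sub_pc_factor_levels 2 \
    -ksp_monitor -log_summary</pre></div>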
<div><br></div><div>What "CFD" formulation is this (physics, discretization) and what regime (Reynolds and Mach numbers, etc)?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
I attached the log_summary output for a problem with about 240000 unknowns (1 time step) using 4, 8 and 16 Intel Xeon E5450 processors (InfiniBand-connected). As far as I can see, the number of iterations seems to be the major issue here, or am I missing something?</blockquote>
<div><br></div><div>Needing more iterations is the algorithmic part of the problem, but the relative cost of orthogonalization is also going up. You may want to see if the iteration count can stay reasonable with -ksp_type ibcgs. If this works algorithmically, it may ease the pain. Beyond that, the algorithmic scaling needs to be improved. How does the iteration count scale if you use a direct solver? (I acknowledge that it is not practical, but it provides some insight into the underlying problem.)</div>
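<div><br></div><div>As a sketch, the improved-BiCGStab suggestion could be tried with something like the following (executable name and process count are placeholders; the subdomain options assume an ILU subdomain solve as discussed above):</div><div><pre># sketch only: ./your_app and -n 16 are placeholders
mpiexec -n 16 ./your_app -ksp_type ibcgs \
    -sub_pc_type ilu -sub_pc_factor_levels 2 \
    -ksp_converged_reason -log_summary</pre></div>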
</div>