<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Let me first explain the problem I'm considering in a bit more detail. I'm working on two-phase flow problems in the low Reynolds number regime (laminar flow). The flow field is described by the incompressible Navier-Stokes equations, and the phase interface is tracked implicitly using the level-set method. This leads to a strongly coupled problem between the flow field and the level-set field. That is, during one time step the Navier-Stokes equations are solved in a series of Picard iterations, and subsequently the interface (level-set field) is advected in the resulting flow field. These 'two' steps are repeated until both the flow field and the level-set field have converged. A typical output of my current test case for one time step looks like this (showing the relative norm of the solution vector and the number of solver iterations):</div><div><br></div><div><br></div><div><div><div> KSP Iterations: 170</div><div> Picard iteration step 1: 1.000000e+00</div><div> KSP Iterations: 151</div><div> Picard iteration step 2: 6.972740e-07</div><div> KSP Iterations: 4</div><div> Level-set iteration step 1: 2.619094e-06</div><div> KSP Iterations: 166</div><div> Picard iteration step 1: 1.124124e-06</div><div> KSP Iterations: 4</div><div> Level-set iteration step 2: 5.252072e-11</div><div>Time step 1 of 1, time: 0.005000</div></div></div><div><br></div><div>Excuse me for not mentioning this in the first place. The log_summary output on its own may be misleading. For comparison, I think one should probably concentrate on the iteration counts of a single Picard iteration only.</div><div><br></div><div>The problem is discretized using FEM (more precisely XFEM) with stabilized, trilinear hexahedral elements. As both the XFEM approximation space and the physical properties at the nodes are time-dependent, the resulting system may change quite significantly between time steps. 
Furthermore, the system matrix tends to be ill-conditioned; luckily, its conditioning can be greatly improved by diagonal scaling.</div><div><br></div><br><div><div>On 12.05.2011, at 16:02, Jed Brown wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div class="gmail_quote">On Thu, May 12, 2011 at 15:41, Henning Sauerland <span dir="ltr"><<a href="mailto:uerland@gmail.com">uerland@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Applying -sub_pc_type lu helped a lot in 2D, but in 3D, although it reduces the number of iterations, the whole solution takes more than 10 times longer. </blockquote><div><br></div><div>Does -sub_pc_type ilu -sub_pc_factor_levels 2 (default is 0) help relative to the default? Direct subdomain solves in 3D are very expensive. How much does the system change between time steps?</div></div></blockquote><div>ILU(2) requires less than half as many KSP iterations, but it scales similarly to ILU(0) and takes about 1/3 more time.</div><br><blockquote type="cite"><div class="gmail_quote">
<div><br></div><div>What "CFD" formulation is this (physics, discretization) and what regime (Reynolds and Mach numbers, etc)?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
I attached the log_summary output for a problem with about 240000 unknowns (1 time step) using 4, 8 and 16 Intel Xeon E5450 processors (InfiniBand-connected). As far as I can see, the number of iterations seems to be the major issue here, or am I missing something?</blockquote>
<div><br></div><div>Needing more iterations is the algorithmic part of the problem, </div></div></blockquote>I guess you are talking about the nonlinear iterations? I was always referring to the KSP iterations, and I thought that the growth of the KSP iteration count with an increasing number of processors is more or less solely related to the iterative solver and preconditioner.<br><br><blockquote type="cite"><div class="gmail_quote"><div>but the relative cost of orthogonalization is going up. You may want to see if the iteration count can stay reasonable with -ksp_type ibcgs. If this works algorithmically, it may ease the pain. Beyond that, the algorithmic scaling needs to be improved. How does the iteration count scale if you use a direct solver? (I acknowledge that it is not practical, but it provides some insight towards the underlying problem.)</div>
</div>
</blockquote>ibcgs is slightly faster, requiring fewer KSP iterations than lgmres. Unfortunately, the iteration count scales very similarly to lgmres, and in my experience the lack of robustness of bcgs solvers generally turns out to be problematic for tougher test cases.</div><div><br></div><div><br></div><div>Thanks</div><div>Henning</div><div><br><br></div><br></body></html>