<div class="gmail_quote">On Tue, Nov 15, 2011 at 14:49, Brandt Belson <span dir="ltr">&lt;<a href="mailto:bbelson@princeton.edu">bbelson@princeton.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div>Sorry, I was mistaken about the dt/Re coefficient. I use an implicit time step on the linear part of incompressible Navier-Stokes, so roughly discretizing du/dt = 1/Re * Lu with implicit Euler for simplicity gives:</div>


<div><br></div><div>(u^{n+1} - u^n) / dt = 1/Re L u^{n+1}</div><div>rearranging:</div><div>(I - dt/Re L) u^{n+1} = u^n<br><br>I incorrectly had dt/Re on I instead of L before.</div></blockquote><div><br></div><div>Right. And you are treating the (nonlinear) convective term explicitly?</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><br></div><div>For tri-diagonal matrices, I believe the direct solve is O(N), and similar methods exist for block tri-diagonal matrices. I know multigrid is also O(N), but I&#39;ve heard it tends to be slower.</div>

</blockquote><div><br></div><div>The &quot;block tri-diagonal&quot; matrices have to deal with fill. They are not O(N) because the bandwidth b=sqrt(N), so you need b*N storage and b^2*N time (as Jack said). At high Reynolds number and short time steps (high resolution), your system is pretty well-conditioned because it is mostly identity. You should be able to solve it very fast using iterative methods (perhaps with a lightweight multigrid or other preconditioner).</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div><br></div><div>I think the bandwidth (b) of the matrices is small. For example for a 5-point stencil (center, left, right, below, above) the bandwidth is 5 and for a 9-point stencil it is 9.</div></blockquote><div><br>

</div><div>No, if the neighbors in the x-direction are &quot;close&quot;, then the neighbors in the y-direction will be &quot;far&quot; (order n=sqrt(N) for an n*n grid).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div><br></div><div>


The other limiting factor of doing option 1, with the all-to-all, is the number of nodes is limited by nz. That might be ok, but I would like the code to scale to larger grids in the future. </div><div></div></blockquote>

</div><br><div>Hence Jack&#39;s hybrid suggestion of using process blocks to handle chunks. As usual with parallel programming, choose a distribution and write your code, but as you write it, keep in mind that you will likely change the distribution later (e.g. don&#39;t hard-code the bounds for &quot;local&quot; computations).</div>