<div>Yes, I&#39;m treating the nonlinear term explicitly.</div><div><br></div><div>Ok, I now see what you mean about b=sqrt(N). I misunderstood the definition of bandwidth. I guess with petsc it is easy to experiment with multigrid vs sparse direct solvers like Jack mentioned, so maybe I&#39;ll try both.</div>


<div><br></div><div>I see what you and Jack mean about using blocks of processes for xy planes, transposing the data, and using multiple processes on each z. I&#39;ll code things flexibly so I can change how the data is distributed.</div>


<div><br></div><div>I&#39;ll probably have more questions as I get deeper, but I think I&#39;m sold on using petsc. Thanks for your quick replies and guidance!</div><div>Brandt</div><div><br><br><div class="gmail_quote">

On Tue, Nov 15, 2011 at 3:59 PM, Jed Brown <span dir="ltr">&lt;<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="gmail_quote"><div class="im">On Tue, Nov 15, 2011 at 14:49, Brandt Belson <span dir="ltr">&lt;<a href="mailto:bbelson@princeton.edu" target="_blank">bbelson@princeton.edu</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>Sorry, I was mistaken about the dt/Re coefficient. I use an implicit time step on the linear part of incompressible Navier-Stokes, so roughly discretizing du/dt = 1/Re * Lu with implicit Euler for simplicity gives:</div>


<div><br></div><div>(u^{n+1} - u^n) / dt = 1/Re L u^{n+1}</div><div>rearranging:</div><div>(I - dt/Re L) u^{n+1} = u^n<br><br>I incorrectly had dt/Re on I instead of L before.</div></blockquote><div><br></div></div><div>


Right. And you are treating the (nonlinear) convective term explicitly?</div><div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><br></div><div>For tri-diagonal matrices, I believe the direct solve is O(N), and similar methods exist for block tri-diagonal matrices. I know multigrid is also O(N), but I&#39;ve heard it tends to be slower.</div>


</blockquote><div><br></div></div><div>The &quot;block tri-diagonal&quot; matrices have to deal with fill. They are not O(N) because the bandwidth b=sqrt(N), so you need b*N storage and b^2*N time (as Jack said). At high Reynolds number and short time steps (high resolution), your system is pretty well-conditioned because it is mostly identity. You should be able to solve it very fast using iterative methods (perhaps with a lightweight multigrid or other preconditioner).</div>


<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><br></div><div>I think the bandwidth (b) of the matrices is small. For example for a 5-point stencil (center, left, right, below, above) the bandwidth is 5 and for a 9-point stencil it is 9.</div></blockquote><div><br>


</div></div><div>No, if the neighbors in the x-direction are &quot;close&quot;, then the neighbors in the y-direction will be &quot;far&quot; (order n=sqrt(N) for an n*n grid).</div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><br></div><div>


The other limiting factor of doing option 1, with the all-to-all, is the number of nodes is limited by nz. That might be ok, but I would like the code to scale to larger grids in the future. </div><div></div></blockquote>


</div></div><br><div>Hence Jack&#39;s hybrid suggestion of using process blocks to handle chunks. As usual with parallel programming, choose a distribution and write your code, but as you write it, keep in mind that you will likely change the distribution later (e.g. don&#39;t hard-code the bounds for &quot;local&quot; computations).</div>


</blockquote></div><br></div>