Thanks for getting back to me.

The matrix solves in each x-y plane are linear. The matrices depend on the z wavenumber, so they are different for each x-y slice. The equations are basically Helmholtz and Poisson type. They are 3D, but after a Fourier transform in z they decouple, so each x-y plane can be solved independently.
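To make that concrete, here is a rough petsc4py sketch of what one per-wavenumber plane solve might look like. It is only an illustration of the structure: the 5-point Laplacian with unit spacing stands in for the actual compact-finite-difference operator, and the grid sizes, boundary handling, preallocation, and wavenumber are placeholders.

```python
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

nx, ny = 1000, 100                  # x-y grid from below; spacing taken as 1 here
N = nx * ny                         # plane matrices are then ~100,000 x 100,000

def make_plane_ksp(kz):
    """Assemble A = kz^2 I - Laplacian (kept positive definite) for one x-y plane."""
    A = PETSc.Mat().createAIJ([N, N], nnz=5)
    # Keep the sketch simple; real code would also preallocate the off-diagonal block.
    A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)
    rstart, rend = A.getOwnershipRange()
    for row in range(rstart, rend):
        i, j = row % nx, row // nx               # row = j*nx + i
        A.setValue(row, row, 4.0 + kz**2)        # Helmholtz shift on the diagonal
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < nx and 0 <= jj < ny:    # neighbors outside the plane are dropped
                A.setValue(row, jj * nx + ii, -1.0)
    A.assemble()
    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setFromOptions()    # solver/preconditioner chosen at run time, e.g. -ksp_type cg -pc_type hypre
    return ksp, A

# Each z wavenumber (Fourier mode) gets its own independent plane solve.
ksp, A = make_plane_ksp(kz=3.0)              # illustrative wavenumber
b, x = A.createVecLeft(), A.createVecRight()
b.set(1.0)                                   # placeholder right-hand side
ksp.solve(b, x)
```

The kz=0 (Poisson-type) plane would of course need its own treatment of the null space, and the real assembly would come from the compact-difference operators rather than this stand-in stencil.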
I'd like to run on a few hundred processors, but if possible I'd like it to scale to more processors for higher Re. I agree that keeping the z-dimension data local is beneficial for the FFTs.
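For what it's worth, the z-local layout I have in mind looks something like the sketch below (pure numpy, with made-up local patch sizes): each process owns a full pencil in z for its x-y patch, so the transforms need no communication.

```python
import numpy as np

nx_loc, ny_loc, nz = 50, 25, 500           # local x-y patch, full z extent (placeholders)
u = np.random.rand(nx_loc, ny_loc, nz)     # stand-in for this process's share of a field

u_hat = np.fft.rfft(u, axis=2)             # forward transform in z, entirely local
# ... one independent x-y plane solve per wavenumber, i.e. per u_hat[:, :, k] ...
u_new = np.fft.irfft(u_hat, n=nz, axis=2)  # back to physical space, again no communication
```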
Thanks,
Brandt


On Tue, Nov 15, 2011 at 10:57 AM, Jed Brown <jedbrown@mcs.anl.gov> wrote:
> On Tue, Nov 15, 2011 at 09:43, Brandt Belson <bbelson@princeton.edu> wrote:
>> I'm writing a 3D incompressible fluids solver for transitional and turbulent boundary layers, and would like to make use of PETSc if possible. At each time step I'll need to solve matrix equations arising from finite differences in two dimensions (x and y) on a structured grid. The matrix is block tri/penta-diagonal, depending on the stencil, and the blocks are also tri/penta-diagonal. Correct me if I'm wrong, but I believe these types of matrix equations can be solved directly and cheaply on one node.
>>
>> The two options I'm comparing are:
>> 1. Distribute the data in z and solve the x-y plane matrices with LAPACK or other serial or shared-memory libraries. Then do an MPI all-to-all to redistribute the data in x and/or y, and do all computations in z. This method allows every calculation to be done with all of the necessary data local to one node, so serial or shared-memory algorithms can be used. The disadvantages are that the MPI all-to-all can be expensive and that the number of nodes is limited by the number of points in the z direction.
>>
>> 2. Distribute the data in x-y only and use PETSc to do the matrix solves in the x-y plane across nodes. The data would always be contiguous in z. The possible disadvantage is that the x-y plane solves could be slower. However, there is no need for an all-to-all, and the number of nodes is roughly limited by nx*ny instead of nz.
>>
>> The size of the grid will be about 1000 x 100 x 500 in x, y, and z, so the matrices would be about 100,000 x 100,000, but the grid size could vary.
>>
>> For anyone interested, the derivatives in x and y are done with compact finite differences, and in z with discrete Fourier transforms. I also hope to make use of petsc4py and Python.
>
> Are the problems in the x-y planes linear or nonlinear? Are the coefficients the same at each level? What equations are being solved in this direction? (You can do much better than "banded" solvers, a la LAPACK, for these plane problems, but the best methods will depend on the problem. Fortunately, you don't have to change the code to change the method.)
>
> About how many processes would you like to run this problem size on? Since the Fourier direction is densely coupled, it would be convenient to keep it local, but it's probably worth partitioning it if your desired subdomain sizes get very small.