# [petsc-users] Using petsc for banded matrices and 2D finite differences

Brandt Belson bbelson at princeton.edu
Tue Nov 15 14:49:21 CST 2011

```Thanks for the replies.
Sorry, I was mistaken about the dt/Re coefficient. I use an implicit time
step on the linear part of incompressible Navier-Stokes, so roughly
discretizing du/dt = 1/Re * Lu with implicit Euler for simplicity gives:

(u^{n+1} - u^n) / dt = 1/Re L u^{n+1}
rearranging:
(I - dt/Re L) u^{n+1} = u^n

For tri-diagonal matrices, I believe the direct solve is O(N), and similar
methods exist for block tri-diagonal matrices. I know multigrid is also
O(N), but I've heard it tends to be slower.

I think the bandwidth (b) of the matrices is small. For example for a
5-point stencil (center, left, right, below, above) the bandwidth is 5 and
for a 9-point stencil it is 9.

The other limiting factor of doing option 1, with the all-to-all, is the
number of nodes is limited by nz. That might be ok, but I would like the
code to scale to larger grids in the future.

Thanks,
Brandt

On Tue, Nov 15, 2011 at 1:12 PM, Jack Poulson <jack.poulson at gmail.com>wrote:

> Brandt,
>
> On Tue, Nov 15, 2011 at 11:12 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
>> Jack Poulson (Cc'd) may have some advice because he has been doing this
>> for high frequency Helmholtz.
>>
>
> > The matrix is block tri/penta-diagonal, depending on the stencil, and
> the blocks are also
> > tri/penta-diagonal. Correct me if I'm wrong, but I believe these types
> of matrix equations can be
> > solved directly and cheaply on one node.
>
> These solves should be cheap, but since the bandwidth is not small, it
> makes more sense to use a sparse-direct solver (e.g., MUMPS or SuperLU).
> The complexity of banded factorizations for matrices of size N x N with
> bandwidth of size b is O(b^2 N), and the memory and solve complexity is
> O(bN). For 2d sparse-direct the factorization complexity is O(N^{3/2}) and
> the memory and solve complexities are O(N log(N)). Thus, when the bandwidth
> is larger than O(N^{1/4}), it makes sense to consider sparse direct in
> order to make the factorization cheaper. I suspect that in your case,
> b=sqrt(N).
>
> > The two options I'm comparing are:
> > 1. Distribute the data in z and solve x-y plane matrices with LAPACK or
> other serial or shared-
> > memory libraries. Then do an MPI all-to-all to distribute the data in x
> and/or y, and do all
> > computations in z. This method allows all calculations to be done with
> all necessary data
> > available to one node, so serial or shared-memory algorithms can be
> > are that the MPI all-to-all can be expensive and the number of nodes is
> limited by the number of
> > points in the z direction.
> >
> > 2. Distribute the data in x-y only and use petsc to do matrix solves in
> the x-y plane across
> > nodes. The data would always be contiguous in z. The possible
> disadvantage is that the x-y
> > plane matrix solves could be slower. However, there is no need for an
> all-to-all and the number
> >  of nodes is roughly limited by nx*ny instead of nz.
> >
> > The size of the grid will be about 1000 x 100 x 500 in x, y, and z, so
> > 100,000 x 100,000, but the grid size could vary.
>
> I would look into a modification of approach 1, where, if you have p
> processes and an nx x ny x nz grid, you use p/nz processes for each xy
> plane solve simultaneously, and then to perform the AllToAll communication
> to rearrange the solutions. I think that you are overestimating the cost of
> the AllToAll communication.
>
> Jack
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20111115/8ec7ae30/attachment-0001.htm>
```