[petsc-users] Using petsc for banded matrices and 2D finite differences

Jack Poulson jack.poulson at gmail.com
Tue Nov 15 12:12:06 CST 2011


Brandt,

On Tue, Nov 15, 2011 at 11:12 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Jack Poulson (Cc'd) may have some advice because he has been doing this
> for high frequency Helmholtz.
>

> The matrix is block tri/penta-diagonal, depending on the stencil, and the
> blocks are also tri/penta-diagonal. Correct me if I'm wrong, but I believe
> these types of matrix equations can be solved directly and cheaply on one
> node.

These solves should be cheap, but since the bandwidth is not small, it
makes more sense to use a sparse-direct solver (e.g., MUMPS or SuperLU).
The factorization complexity for an N x N banded matrix with bandwidth b
is O(b^2 N), and the memory and solve complexities are O(bN). For 2D
sparse-direct solvers, the factorization complexity is O(N^{3/2}) and the
memory and solve complexities are O(N log N). Thus, once the bandwidth
exceeds O(N^{1/4}) (the point at which b^2 N surpasses N^{3/2}), it makes
sense to consider sparse-direct in order to make the factorization
cheaper. I suspect that in your case b = sqrt(N).
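
For reference, here is a minimal sketch (not from the original message) of
how the sparse-direct solve could be set up in PETSc, assuming the plane
matrix A and vectors b, x already exist. The call names have shifted a bit
across PETSc releases (e.g., PCFactorSetMatSolverType() was formerly
PCFactorSetMatSolverPackage(), and KSPSetOperators() used to take an extra
MatStructure argument), so adjust to your version; the equivalent run-time
options are -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps.

  #include <petscksp.h>

  /* Hypothetical helper: factor and solve A x = b with a sparse-direct
     package (error checking omitted for brevity). */
  static void SolvePlaneDirect(Mat A, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);       /* use A as operator and preconditioning matrix */
    KSPSetType(ksp, KSPPREONLY);      /* no Krylov iteration: apply the factorization directly */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCLU);              /* complete sparse LU factorization */
    PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);  /* or MATSOLVERSUPERLU_DIST */
    KSPSetFromOptions(ksp);           /* allow command-line overrides */
    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
  }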

> The two options I'm comparing are:
> 1. Distribute the data in z and solve x-y plane matrices with LAPACK or
> other serial or shared-memory libraries. Then do an MPI all-to-all to
> distribute the data in x and/or y, and do all computations in z. This
> method allows all calculations to be done with all necessary data
> available to one node, so serial or shared-memory algorithms can be used.
> The disadvantages are that the MPI all-to-all can be expensive and the
> number of nodes is limited by the number of points in the z direction.
>
> 2. Distribute the data in x-y only and use petsc to do matrix solves in
> the x-y plane across nodes. The data would always be contiguous in z. The
> possible disadvantage is that the x-y plane matrix solves could be slower.
> However, there is no need for an all-to-all and the number of nodes is
> roughly limited by nx*ny instead of nz.
>
> The size of the grid will be about 1000 x 100 x 500 in x, y, and z, so
> matrices would be about 100,000 x 100,000, but the grid size could vary.

I would look into a modification of approach 1 where, if you have p
processes and an nx x ny x nz grid, you use p/nz processes for each
xy-plane solve simultaneously, and then perform the AllToAll communication
to rearrange the solutions. I think you are overestimating the cost of the
AllToAll communication.
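
To make the grouping concrete, here is a minimal sketch (my notation, not
from the thread) of splitting MPI_COMM_WORLD into nz subcommunicators of
p/nz ranks each, one per xy plane. The resulting plane_comm would be passed
to KSPCreate() (or whatever plane solver you use) instead of
PETSC_COMM_WORLD, while the AllToAll that rearranges the data in z still
runs on MPI_COMM_WORLD. It assumes p is a multiple of nz.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
    int       rank, size;
    const int nz = 500;                 /* number of xy planes (illustrative) */
    MPI_Comm  plane_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumes size >= nz and size % nz == 0, so each plane gets size/nz ranks. */
    int group_size = size / nz;
    int color      = rank / group_size; /* which xy plane this rank helps solve */
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &plane_comm);

    /* ... build the plane matrix and solve on plane_comm, then use an
       MPI_Alltoall/MPI_Alltoallv on MPI_COMM_WORLD to redistribute in z ... */

    MPI_Comm_free(&plane_comm);
    MPI_Finalize();
    return 0;
  }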

Jack

