[petsc-users] Storage space for symmetric (SBAIJ) matrix

Jed Brown jed at 59A2.org
Fri Sep 17 11:51:15 CDT 2010


On Fri, Sep 17, 2010 at 13:15, Daniel Langr <daniel.langr at gmail.com> wrote:
> Matrix Object:
>  type=mpisbaij, rows=10000, cols=10000
>  total: nonzeros=19999, allocated nonzeros=69990
>    [0] Local rows 10000 nz 10000 nz alloced 10000 bs 1 mem 254496
>    [0] on-diagonal part: nz 5000
>    [0] off-diagonal part: nz 5000
>    [1] Local rows 10000 nz 9999 nz alloced 59990 bs 1 mem 264494
>    [1] on-diagonal part: nz 9999
>    [1] off-diagonal part: nz 0

Here is what this looks like now with petsc-dev; see src/mat/examples/tests/ex135.c.

$ mpiexec -n 2  ./ex135 -n 10000 -mat_view_info_detailed
Matrix Object:
  type=mpisbaij, rows=10000, cols=10000
  total: nonzeros=19999, allocated nonzeros=20000
  total number of mallocs used during MatSetValues calls =0
    [0] Local rows 5000 nz 10000 nz alloced 10000 bs 1 mem 254856
    [0] on-diagonal part: nz 5000
    [0] off-diagonal part: nz 5000
    [1] Local rows 5000 nz 9999 nz alloced 10000 bs 1 mem 264854
    [1] on-diagonal part: nz 9999
    [1] off-diagonal part: nz 0
    Information on VecScatter used in matrix-vector product:
    [0] Number sends = 0; Number to self = 0
    [0] Number receives = 1; Number from self = 0
    [0] 0 length 1 from whom 1
    Now the indices for all remote receives (in order by process received from)
    [0] 0
    [1] Number sends = 1; Number to self = 0
    [1]   0 length = 1 to whom 0
    Now the indices for all remote sends (in order by process sent to)
    [1] 4999
    [1] Number receives = 0; Number from self = 0

> 1. My problem is with the amount of memory used. For the 10000 nonzeros of the
> first process I would expect the memory needed for the CSR storage format to be
> approximately:
>
> nz * sizeof(PetscScalar) + nz * sizeof(PetscInt) + n_local_rows *
> sizeof(PetscInt)
> = 10000 * 8 + 10000 * 4 + 5000 * 4
> = 140000 bytes

The dynamic assembly process involves two more arrays of length equal
to the number of local rows.  I don't think there is currently an
interface for MPISBAIJ to avoid these arrays.
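
To put rough numbers on it (only an estimate; the exact accounting depends
on the PETSc version), adding those two PetscInt arrays to your formula
gives

  nz * sizeof(PetscScalar) + nz * sizeof(PetscInt) + 3 * n_local_rows * sizeof(PetscInt)
  = 10000 * 8 + 10000 * 4 + 3 * 5000 * 4
  = 180000 bytes

The rest of the reported "mem" presumably comes from object overhead and
from the fact that the MPI format stores separate on-diagonal and
off-diagonal sequential blocks (each with its own row-pointer array), as
the -mat_view_info_detailed output above shows.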

> and the matrix info gives 254496 bytes. Similarly for the second process. I
> would understand some additional space being needed for efficiency, but this
> is more than 180 percent of the space really needed to store a CSR matrix,
> which is quite unacceptable for large problems.

Most matrices have more than two nonzeros per row, in which case
2*local_rows*sizeof(PetscInt) is lost in the noise (it costs the same
as one extra vector).  Note that there are two additional private
vectors needed for the parallel multiply (not included in the matrix
"mem" field).

If the matrices you are really interested in working with have
structure Diagonal + Fringe, then you could define your own optimized
format (e.g. only storing two vectors).
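
As a rough illustration (this is not PETSc code; ArrowCtx and ArrowMult
are made-up names, and the sketch is serial with real scalars), such a
format could be a MatShell that keeps the diagonal and the fringe as two
Vecs and supplies its own multiply:

  /* Sketch of a shell "diagonal + fringe" (arrowhead) format:
   *   A = diag(d) plus a dense last row/column f; store f with f[n-1] = 0
   *   so the corner entry is not counted twice.
   * Serial, real scalars; a parallel version would also need to broadcast
   * x[n-1] and reduce the contribution to y[n-1]. */
  #include <petscmat.h>

  typedef struct { Vec d, f; } ArrowCtx;

  PetscErrorCode ArrowMult(Mat A, Vec x, Vec y)
  {
    ArrowCtx          *ctx;
    const PetscScalar *xa;
    PetscScalar        xn, dot;
    PetscInt           n, last;
    PetscErrorCode     ierr;

    PetscFunctionBegin;
    ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
    ierr = VecGetSize(x, &n);CHKERRQ(ierr);
    ierr = VecPointwiseMult(y, ctx->d, x);CHKERRQ(ierr);  /* y = d .* x      */
    ierr = VecGetArrayRead(x, &xa);CHKERRQ(ierr);
    xn   = xa[n-1];                                       /* last entry of x */
    ierr = VecRestoreArrayRead(x, &xa);CHKERRQ(ierr);
    ierr = VecAXPY(y, xn, ctx->f);CHKERRQ(ierr);          /* y += x[n-1]*f   */
    ierr = VecDot(ctx->f, x, &dot);CHKERRQ(ierr);         /* fringe row sum  */
    last = n - 1;
    ierr = VecSetValues(y, 1, &last, &dot, ADD_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(y);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(y);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

You would hook it up with something like

  MatCreateShell(PETSC_COMM_SELF, n, n, n, n, &actx, &A);
  MatShellSetOperation(A, MATOP_MULT, (void (*)(void))ArrowMult);

which stores essentially two vectors' worth of data instead of a full
sparse structure.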

> 2. Why is there "Local rows 10000"? Shouldn't this be 5000 for every
> process?

Yes, thanks.  This is fixed now.

> 3. Why is there "alloced 59990 bs" for the second process? Why is there
> "total: nonzeros=19999, allocated nonzeros=69990"?

You did not preallocate correctly.
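
For what it's worth, here is roughly what per-row preallocation could
look like for the diagonal + fringe structure above (a sketch, not the
actual ex135.c; it assumes the fringe is the last column, that the number
of processes divides N, and a reasonably current PETSc):

  /* Preallocate an SBAIJ matrix for A = diagonal + last-column fringe.
   * For SBAIJ, count only the upper-triangular part of each row; d_nnz
   * counts columns inside this process's diagonal block, o_nnz the rest. */
  #include <petscmat.h>

  PetscErrorCode CreateArrowhead(MPI_Comm comm, PetscInt N, Mat *A)
  {
    PetscMPIInt    rank, size;
    PetscInt       m, rstart, cend, i, *d_nnz, *o_nnz;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
    ierr = MPI_Comm_size(comm, &size);CHKERRQ(ierr);
    m      = N / size;          /* local rows (assumes size divides N)  */
    rstart = rank * m;          /* first local row                      */
    cend   = rstart + m;        /* end of the local diagonal block      */
    ierr = PetscMalloc1(m, &d_nnz);CHKERRQ(ierr);
    ierr = PetscMalloc1(m, &o_nnz);CHKERRQ(ierr);
    for (i = 0; i < m; i++) {
      PetscInt row = rstart + i;
      d_nnz[i] = 1;             /* the diagonal entry                   */
      o_nnz[i] = 0;
      if (row < N-1) {          /* upper-triangular fringe entry (row, N-1) */
        if (N-1 >= rstart && N-1 < cend) d_nnz[i]++;
        else                             o_nnz[i]++;
      }
    }
    ierr = MatCreate(comm, A);CHKERRQ(ierr);
    ierr = MatSetSizes(*A, m, m, N, N);CHKERRQ(ierr);
    ierr = MatSetType(*A, MATSBAIJ);CHKERRQ(ierr);
    ierr = MatSeqSBAIJSetPreallocation(*A, 1, 0, d_nnz);CHKERRQ(ierr);
    ierr = MatMPISBAIJSetPreallocation(*A, 1, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
    ierr = PetscFree(d_nnz);CHKERRQ(ierr);
    ierr = PetscFree(o_nnz);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

With counts like these, the allocated and used nonzero counts should
essentially agree, and no mallocs should occur during MatSetValues.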

> 4. Why are there 9999 on-diagonal and 0 off-diagonal nonzeros for the second
> process? This is not true for my matrix.

That is referring to the diagonal block, not the diagonal itself.  The
last process holds both the (scalar) diagonal and the fringe in its
diagonal block (its action is local).

> There is no information about symmetric matrices in the PETSc Users Manual. I
> would really welcome some hints on how to work with them, for example, how to
> construct such matrices efficiently. When I have to set values only for the
> upper triangular part and expect approximately similar fill for every row,
> then giving every process the same number of rows (as MatGetOwnershipRange
> indicates) would lead to terrible load balancing, at least for the matrix
> construction process.

It looks like this info is only in the man pages at the moment; we'll
put it on the list to add to the manual.  An example using (by
default) block size 2 MPISBAIJ matrices can be found at
src/snes/examples/tutorials/ex48.c.  Note that the same assembly code
works with AIJ, BAIJ, and SBAIJ (parallel and serial).  This (finite
element) example computes the upper triangular part of the element
stiffness matrix (which may not be upper triangular in the global
ordering) and mirrors it (cheap) before calling
MatSetValuesBlockedStencil.
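
If you are assembling pointwise rather than by blocks, the same idea
looks something like this (a sketch, not taken from ex48.c; a plain
tridiagonal stencil, with AssembleTridiag a made-up name). Each row is
generated with its full symmetric stencil; AIJ/BAIJ keep all of it while
SBAIJ keeps only the upper triangle. If I remember right, SBAIJ objects
to insertions below the diagonal unless MAT_IGNORE_LOWER_TRIANGULAR is
set, so the sketch sets it:

  /* Format-agnostic assembly of a symmetric tridiagonal matrix.
   * Assumes A has already been created and preallocated as AIJ, BAIJ,
   * or SBAIJ; the full symmetric row is inserted regardless of format. */
  PetscErrorCode AssembleTridiag(Mat A, PetscInt N)
  {
    PetscInt       rstart, rend, i;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatSetOption(A, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {
      PetscInt    cols[3];
      PetscScalar vals[3];
      PetscInt    nc = 0;
      if (i > 0)   { cols[nc] = i-1; vals[nc] = -1.0; nc++; } /* lower: dropped by SBAIJ */
      cols[nc] = i; vals[nc] = 2.0; nc++;                     /* diagonal                */
      if (i < N-1) { cols[nc] = i+1; vals[nc] = -1.0; nc++; } /* upper                   */
      ierr = MatSetValues(A, 1, &i, nc, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

That's the same pattern as the element-matrix mirroring in ex48.c, just
at the level of individual entries.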

Most sparse matrices involve some spatial locality and have the
majority of entries in the diagonal block.  For example, consider a
structural mechanics problem with some spatial domain decomposition.
Only the nodes lying on subdomain boundaries involve any entries in
the off-diagonal block.  Symmetric formats involve an asymmetric
decision of which side "owns" these interface values, but the entries
can still be produced in a symmetric manner.  Symmetric formats
usually have somewhat slower throughput, although the memory savings
are significant.  You will have to decide which is better for your
application.

I hope this helps.

Jed

