[petsc-users] MatCreateSBAIJ

Tue Mar 22 15:30:37 CDT 2022

On Tue, Mar 22, 2022 at 4:26 PM Sam Guo <sam.guo at cd-adapco.com> wrote:

> The matrix only requires 175MB (upper triangular part). Not sure where the
> other extra memory comes from for np > 1
>

This is likely MUMPS. However, we could be sure by using a heap profiler,
like massif.

  Thanks,

     Matt

> On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Tue, Mar 22, 2022 at 4:16 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>
>>> Here is one memory comparison (memory in MB)
>>>  np=1np=2np=4np=8np=16
>>> shell 1614 1720 1874 1673 1248
>>> PETSc(using full matrix) 2108 2260 2364 2215 1734
>>> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the
>>> total water mark memory added.
>>>
>>
>> You should be able to directly read off the memory in the PETSc Mat
>> structures (it is at the end of the log).
>> With a tool like massif you could also directly measure the MUMPS memory.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>   Sam,
>>>>
>>>>   MUMPS is a direct solver, as such, it requires much more memory than
>>>> the original matrix (stored as a PETSc matrix) to store the factored
>>>> matrix. The savings you will get by not having a PETSc copy of the matrix
>>>> and a MUMPS copy of the matrix at the same time is unlikely to be
>>>> significant. Do you have memory footprint measurements indicating that not
>>>> having the PETSc copy of the matrix in memory will allow you to run
>>>> measurably larger simulations?
>>>>
>>>>  Barry
>>>>
>>>>
>>>>
>>>>
>>>> On Mar 22, 2022, at 3:58 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>> The reason I want to use shell matrix is to reduce memory footprint. If
>>>> I create a PETSc matrix and use MUMPS, I understand PETSc will create
>>>> another copy of the matrix for MUMPS. Is there any way to avoid the extra
>>>> copy of MUMPS?
>>>>
>>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>>> Barry,
>>>>>    Thanks for the illustration. Is there an easy way to mimic the
>>>>> implementation using shell matrix? I have been studying how the sMvctx is
>>>>> created and it seems pretty involved.
>>>>>
>>>>> Thanks,
>>>>> Sam
>>>>>
>>>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>
>>>>>> Barry,
>>>>>>    Thanks. Could you elaborate? I try to implement the matrix-vector
>>>>>> multiplication for a symmetric matrix using shell matrix.
>>>>>>
>>>>>>
>>>>>>      Consider with three ranks
>>>>>>
>>>>>>    (a)  =   ( A    B  D) (x)
>>>>>>    (b)       ( B'   C  E) (y)
>>>>>>    (c)       ( D'   E'  F) (w)
>>>>>>
>>>>>>       Only the ones without the ' are stored on the rank. So for
>>>>>> example B is stored on rank 0.
>>>>>>
>>>>>>        Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and
>>>>>> keeps it in b Rank 2 computes Fw and keeps it in c
>>>>>>
>>>>>>        Rank 0 computes B'x and D'x. It puts the nonzero entries of
>>>>>> these values as well as the values of x into slvec0
>>>>>>
>>>>>>        Rank 1 computes E'y and puts the nonzero entries as well as
>>>>>> the values into slvec0
>>>>>>
>>>>>>        Rank 2 puts the values of we needed by the other ranks into
>>>>>> slvec0
>>>>>>
>>>>>>        Rank 0 does B y_h + D z_h where it gets the y_h and z_h values
>>>>>> from slvec1 and adds it to a
>>>>>>
>>>>>>        Rank 1 takes the B'x from slvec1 and adds it to b it then
>>>>>> takes the E y_h values where the y_h are pulled from slvec1 and adds them b
>>>>>>
>>>>>>        Rank 2 takes the B'x and E'y from slvec0 and adds them to c.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <bsmith at petsc.dev>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>   The "trick" is that though "more" communication is needed to
>>>>>>> complete the product the communication can still be done in a single
>>>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both
>>>>>>> pieces of information that needs to be sent into a single vector.
>>>>>>>
>>>>>>>  /* copy x into the vec slvec0 */1111:   VecGetArray <https://petsc.org/main/docs/manualpages/Vec/VecGetArray.html#VecGetArray>(a->slvec0,&from);1112:   VecGetArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecGetArrayRead.html#VecGetArrayRead>(xx,&x);
>>>>>>> 1114:   PetscArraycpy <https://petsc.org/main/docs/manualpages/Sys/PetscArraycpy.html#PetscArraycpy>(from,x,bs*mbs);1115:   VecRestoreArray <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArray.html#VecRestoreArray>(a->slvec0,&from);1116:   VecRestoreArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArrayRead.html#VecRestoreArrayRead>(xx,&x);
>>>>>>> 1118:   VecScatterBegin <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterBegin.html#VecScatterBegin>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1119:   VecScatterEnd <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterEnd.html#VecScatterEnd>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);
>>>>>>>
>>>>>>> If you create two symmetric matrices, one with SBAIJ and one with
>>>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is
>>>>>>> not significantly slower but does save memory.
>>>>>>>
>>>>>>>
>>>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>
>>>>>>> Using following example from the  MatCreateSBAIJ documentation
>>>>>>>
>>>>>>>           0 1 2 3 4 5 6 7 8 9 10 11
>>>>>>>           --------------------------
>>>>>>>    row 3  |. . . d d d o o o o  o  o
>>>>>>>    row 4  |. . . d d d o o o o  o  o
>>>>>>>    row 5  |. . . d d d o o o o  o  o
>>>>>>>           --------------------------
>>>>>>>
>>>>>>>
>>>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result
>>>>>>>
>>>>>>> to the processor that owns 3-5?
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>> PETSc stores parallel matrices as two serial matrices. One for the
>>>>>>>> diagonal (d or A) block and one for the rest (o or B).
>>>>>>>> I would guess that for symmetric matrices it has a symmetric matrix
>>>>>>>> for the diagonal and a full AIJ matrix for the (upper) off-diagonal.
>>>>>>>> So the multtranspose is applying B symmetrically. This lower
>>>>>>>> off-diagonal and the diagonal block can be done without communication.
>>>>>>>> Then the off processor values are collected, and the upper
>>>>>>>> off-diagonal is applied.
>>>>>>>>
>>>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am most interested in how the lower triangular part is
>>>>>>>>> redistributed. It seems that SBAJI saves memory but requires more
>>>>>>>>> communication than BAIJ.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Mark, thanks for the quick response. I am more interested in
>>>>>>>>>> parallel implementation of MatMult for SBAIJ. I found following
>>>>>>>>>>
>>>>>>>>>> 1094: *PetscErrorCode <https://petsc.org/main/docs/manualpages/Sys/PetscErrorCode.html#PetscErrorCode> MatMult_MPISBAIJ(Mat <https://petsc.org/main/docs/manualpages/Mat/Mat.html#Mat> A,Vec <https://petsc.org/main/docs/manualpages/Vec/Vec.html#Vec> xx,Vec <https://petsc.org/main/docs/manualpages/Vec/Vec.html#Vec> yy)*1095: {1096:   Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;1097:   PetscErrorCode <https://petsc.org/main/docs/manualpages/Sys/PetscErrorCode.html#PetscErrorCode>    ierr;1098:   PetscInt <https://petsc.org/main/docs/manualpages/Sys/PetscInt.html#PetscInt>          mbs=a->mbs,bs=A->rmap->bs;1099:   PetscScalar <https://petsc.org/main/docs/manualpages/Sys/PetscScalar.html#PetscScalar>       *from;1100:   const PetscScalar <https://petsc.org/main/docs/manualpages/Sys/PetscScalar.html#PetscScalar> *x;
>>>>>>>>>> 1103:   /* diagonal part */1104:   (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105:   VecSet <https://petsc.org/main/docs/manualpages/Vec/VecSet.html#VecSet>(a->slvec1b,0.0);
>>>>>>>>>> 1107:   /* subdiagonal part */1108:   (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);
>>>>>>>>>> 1110:   /* copy x into the vec slvec0 */1111:   VecGetArray <https://petsc.org/main/docs/manualpages/Vec/VecGetArray.html#VecGetArray>(a->slvec0,&from);1112:   VecGetArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecGetArrayRead.html#VecGetArrayRead>(xx,&x);
>>>>>>>>>> 1114:   PetscArraycpy <https://petsc.org/main/docs/manualpages/Sys/PetscArraycpy.html#PetscArraycpy>(from,x,bs*mbs);1115:   VecRestoreArray <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArray.html#VecRestoreArray>(a->slvec0,&from);1116:   VecRestoreArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArrayRead.html#VecRestoreArrayRead>(xx,&x);
>>>>>>>>>> 1118:   VecScatterBegin <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterBegin.html#VecScatterBegin>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1119:   VecScatterEnd <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterEnd.html#VecScatterEnd>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1120:   /* supperdiagonal part */1121:   (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122:   return(0);1123: }
>>>>>>>>>>
>>>>>>>>>>   I try to understand the algorithm.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> This code looks fine to me and the code is
>>>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear PETSc dev team,
>>>>>>>>>>>>    The documentation about MatCreateSBAIJ has following
>>>>>>>>>>>> "It is recommended that one use the MatCreate
>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatCreate.html#MatCreate>
>>>>>>>>>>>> (), MatSetType
>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSetType.html#MatSetType>()
>>>>>>>>>>>> and/or MatSetFromOptions
>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSetFromOptions.html#MatSetFromOptions>(),
>>>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly.
>>>>>>>>>>>> [MatXXXXSetPreallocation() is, for example,
>>>>>>>>>>>> MatSeqAIJSetPreallocation
>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSeqAIJSetPreallocation.html#MatSeqAIJSetPreallocation>
>>>>>>>>>>>> ]"
>>>>>>>>>>>>    I currently call MatCreateSBAIJ directly as follows:
>>>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz)
>>>>>>>>>>>> MatSetValues (to add row by row)
>>>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>>>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>>>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);
>>>>>>>>>>>>
>>>>>>>>>>>>    Two questions:
>>>>>>>>>>>>    (1) I am wondering whether what I am doing is the
>>>>>>>>>>>> most efficient.
>>>>>>>>>>>>
>>>>>>>>>>>>    (2) I try to find out how the matrix
>>>>>>>>>>>> vector multiplication is implemented in PETSc for SBAIJ storage.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sam
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220322/9a8f1bd5/attachment-0001.html>