[petsc-users] MatCreateSBAIJ

Tue Mar 22 15:34:49 CDT 2022

Hi Matt,
   I refer to the extra memory comparing shell vs PETSc. I am using the
same MUMPS for both shell and PETSc matrix and expect the memory increase
by MUMPS to be the same for both.

Thanks,
Sam

On Tue, Mar 22, 2022 at 1:30 PM Matthew Knepley <knepley at gmail.com> wrote:

> On Tue, Mar 22, 2022 at 4:26 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>
>> The matrix only requires 175MB (upper triangular part). Not sure where
>> the other extra memory comes from for np > 1
>>
>
> This is likely MUMPS. However, we could be sure by using a heap profiler,
> like massif.
>
>   Thanks,
>
>      Matt
>
>
>> On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Tue, Mar 22, 2022 at 4:16 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>
>>>> Here is one memory comparison (memory in MB)
>>>>  np=1np=2np=4np=8np=16
>>>> shell 1614 1720 1874 1673 1248
>>>> PETSc(using full matrix) 2108 2260 2364 2215 1734
>>>> PETSc(using symmetric matrix) 1750 2100 2189 2094 1727Those are the
>>>> total water mark memory added.
>>>>
>>>
>>> You should be able to directly read off the memory in the PETSc Mat
>>> structures (it is at the end of the log).
>>> With a tool like massif you could also directly measure the MUMPS memory.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>
>>>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>   Sam,
>>>>>
>>>>>   MUMPS is a direct solver, as such, it requires much more memory than
>>>>> the original matrix (stored as a PETSc matrix) to store the factored
>>>>> matrix. The savings you will get by not having a PETSc copy of the matrix
>>>>> and a MUMPS copy of the matrix at the same time is unlikely to be
>>>>> significant. Do you have memory footprint measurements indicating that not
>>>>> having the PETSc copy of the matrix in memory will allow you to run
>>>>> measurably larger simulations?
>>>>>
>>>>>  Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mar 22, 2022, at 3:58 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>
>>>>> The reason I want to use shell matrix is to reduce memory footprint.
>>>>> If I create a PETSc matrix and use MUMPS, I understand PETSc will create
>>>>> another copy of the matrix for MUMPS. Is there any way to avoid the extra
>>>>> copy of MUMPS?
>>>>>
>>>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>> wrote:
>>>>>
>>>>>> Barry,
>>>>>>    Thanks for the illustration. Is there an easy way to mimic the
>>>>>> implementation using shell matrix? I have been studying how the sMvctx is
>>>>>> created and it seems pretty involved.
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>
>>>>>>> Barry,
>>>>>>>    Thanks. Could you elaborate? I try to implement the matrix-vector
>>>>>>> multiplication for a symmetric matrix using shell matrix.
>>>>>>>
>>>>>>>
>>>>>>>      Consider with three ranks
>>>>>>>
>>>>>>>    (a)  =   ( A    B  D) (x)
>>>>>>>    (b)       ( B'   C  E) (y)
>>>>>>>    (c)       ( D'   E'  F) (w)
>>>>>>>
>>>>>>>       Only the ones without the ' are stored on the rank. So for
>>>>>>> example B is stored on rank 0.
>>>>>>>
>>>>>>>        Rank 0 computes A x and keeps it in a. Rank 1 computes Cy and
>>>>>>> keeps it in b Rank 2 computes Fw and keeps it in c
>>>>>>>
>>>>>>>        Rank 0 computes B'x and D'x. It puts the nonzero entries of
>>>>>>> these values as well as the values of x into slvec0
>>>>>>>
>>>>>>>        Rank 1 computes E'y and puts the nonzero entries as well as
>>>>>>> the values into slvec0
>>>>>>>
>>>>>>>        Rank 2 puts the values of we needed by the other ranks into
>>>>>>> slvec0
>>>>>>>
>>>>>>>        Rank 0 does B y_h + D z_h where it gets the y_h and z_h
>>>>>>> values from slvec1 and adds it to a
>>>>>>>
>>>>>>>        Rank 1 takes the B'x from slvec1 and adds it to b it then
>>>>>>> takes the E y_h values where the y_h are pulled from slvec1 and adds them b
>>>>>>>
>>>>>>>        Rank 2 takes the B'x and E'y from slvec0 and adds them to c.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <bsmith at petsc.dev>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>   The "trick" is that though "more" communication is needed to
>>>>>>>> complete the product the communication can still be done in a single
>>>>>>>> VecScatter instead of two separate calls to VecScatter. We simply pack both
>>>>>>>> pieces of information that needs to be sent into a single vector.
>>>>>>>>
>>>>>>>>  /* copy x into the vec slvec0 */1111:   VecGetArray <https://petsc.org/main/docs/manualpages/Vec/VecGetArray.html#VecGetArray>(a->slvec0,&from);1112:   VecGetArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecGetArrayRead.html#VecGetArrayRead>(xx,&x);
>>>>>>>> 1114:   PetscArraycpy <https://petsc.org/main/docs/manualpages/Sys/PetscArraycpy.html#PetscArraycpy>(from,x,bs*mbs);1115:   VecRestoreArray <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArray.html#VecRestoreArray>(a->slvec0,&from);1116:   VecRestoreArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArrayRead.html#VecRestoreArrayRead>(xx,&x);
>>>>>>>> 1118:   VecScatterBegin <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterBegin.html#VecScatterBegin>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1119:   VecScatterEnd <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterEnd.html#VecScatterEnd>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);
>>>>>>>>
>>>>>>>> If you create two symmetric matrices, one with SBAIJ and one with
>>>>>>>> BAIJ and compare the time to do the product you will find that the SBAIJ is
>>>>>>>> not significantly slower but does save memory.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>>
>>>>>>>> Using following example from the  MatCreateSBAIJ documentation
>>>>>>>>
>>>>>>>>           0 1 2 3 4 5 6 7 8 9 10 11
>>>>>>>>           --------------------------
>>>>>>>>    row 3  |. . . d d d o o o o  o  o
>>>>>>>>    row 4  |. . . d d d o o o o  o  o
>>>>>>>>    row 5  |. . . d d d o o o o  o  o
>>>>>>>>           --------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> On a processor that owns rows 3, 4 and 5, rows 0-2 info are still needed. Is is that the processor that owns rows 0-2 will apply B symmetrical and then send the result
>>>>>>>>
>>>>>>>> to the processor that owns 3-5?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <mfadams at lbl.gov>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> PETSc stores parallel matrices as two serial matrices. One for the
>>>>>>>>> diagonal (d or A) block and one for the rest (o or B).
>>>>>>>>> I would guess that for symmetric matrices it has a symmetric
>>>>>>>>> matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal.
>>>>>>>>> So the multtranspose is applying B symmetrically. This lower
>>>>>>>>> off-diagonal and the diagonal block can be done without communication.
>>>>>>>>> Then the off processor values are collected, and the upper
>>>>>>>>> off-diagonal is applied.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am most interested in how the lower triangular part is
>>>>>>>>>> redistributed. It seems that SBAJI saves memory but requires more
>>>>>>>>>> communication than BAIJ.
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Mark, thanks for the quick response. I am more interested in
>>>>>>>>>>> parallel implementation of MatMult for SBAIJ. I found following
>>>>>>>>>>>
>>>>>>>>>>> 1094: *PetscErrorCode <https://petsc.org/main/docs/manualpages/Sys/PetscErrorCode.html#PetscErrorCode> MatMult_MPISBAIJ(Mat <https://petsc.org/main/docs/manualpages/Mat/Mat.html#Mat> A,Vec <https://petsc.org/main/docs/manualpages/Vec/Vec.html#Vec> xx,Vec <https://petsc.org/main/docs/manualpages/Vec/Vec.html#Vec> yy)*1095: {1096:   Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;1097:   PetscErrorCode <https://petsc.org/main/docs/manualpages/Sys/PetscErrorCode.html#PetscErrorCode>    ierr;1098:   PetscInt <https://petsc.org/main/docs/manualpages/Sys/PetscInt.html#PetscInt>          mbs=a->mbs,bs=A->rmap->bs;1099:   PetscScalar <https://petsc.org/main/docs/manualpages/Sys/PetscScalar.html#PetscScalar>       *from;1100:   const PetscScalar <https://petsc.org/main/docs/manualpages/Sys/PetscScalar.html#PetscScalar> *x;
>>>>>>>>>>> 1103:   /* diagonal part */1104:   (*a->A->ops->mult)(a->A,xx,a->slvec1a);1105:   VecSet <https://petsc.org/main/docs/manualpages/Vec/VecSet.html#VecSet>(a->slvec1b,0.0);
>>>>>>>>>>> 1107:   /* subdiagonal part */1108:   (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);
>>>>>>>>>>> 1110:   /* copy x into the vec slvec0 */1111:   VecGetArray <https://petsc.org/main/docs/manualpages/Vec/VecGetArray.html#VecGetArray>(a->slvec0,&from);1112:   VecGetArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecGetArrayRead.html#VecGetArrayRead>(xx,&x);
>>>>>>>>>>> 1114:   PetscArraycpy <https://petsc.org/main/docs/manualpages/Sys/PetscArraycpy.html#PetscArraycpy>(from,x,bs*mbs);1115:   VecRestoreArray <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArray.html#VecRestoreArray>(a->slvec0,&from);1116:   VecRestoreArrayRead <https://petsc.org/main/docs/manualpages/Vec/VecRestoreArrayRead.html#VecRestoreArrayRead>(xx,&x);
>>>>>>>>>>> 1118:   VecScatterBegin <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterBegin.html#VecScatterBegin>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1119:   VecScatterEnd <https://petsc.org/main/docs/manualpages/PetscSF/VecScatterEnd.html#VecScatterEnd>(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES <https://petsc.org/main/docs/manualpages/Sys/ADD_VALUES.html#ADD_VALUES>,SCATTER_FORWARD <https://petsc.org/main/docs/manualpages/Vec/SCATTER_FORWARD.html#SCATTER_FORWARD>);1120:   /* supperdiagonal part */1121:   (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);1122:   return(0);1123: }
>>>>>>>>>>>
>>>>>>>>>>>   I try to understand the algorithm.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <mfadams at lbl.gov>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This code looks fine to me and the code is
>>>>>>>>>>>> in src/mat/impls/sbaij/seq/sbaij2.c
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear PETSc dev team,
>>>>>>>>>>>>>    The documentation about MatCreateSBAIJ has following
>>>>>>>>>>>>> "It is recommended that one use the MatCreate
>>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatCreate.html#MatCreate>
>>>>>>>>>>>>> (), MatSetType
>>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSetType.html#MatSetType>()
>>>>>>>>>>>>> and/or MatSetFromOptions
>>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSetFromOptions.html#MatSetFromOptions>(),
>>>>>>>>>>>>> MatXXXXSetPreallocation() paradigm instead of this routine directly.
>>>>>>>>>>>>> [MatXXXXSetPreallocation() is, for example,
>>>>>>>>>>>>> MatSeqAIJSetPreallocation
>>>>>>>>>>>>> <https://petsc.org/main/docs/manualpages/Mat/MatSeqAIJSetPreallocation.html#MatSeqAIJSetPreallocation>
>>>>>>>>>>>>> ]"
>>>>>>>>>>>>>    I currently call MatCreateSBAIJ directly as follows:
>>>>>>>>>>>>> MatCreateSBAIJ (with d_nnz and o_nnz)
>>>>>>>>>>>>> MatSetValues (to add row by row)
>>>>>>>>>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>>>>>>>>>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>>>>>>>>>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Two questions:
>>>>>>>>>>>>>    (1) I am wondering whether what I am doing is the
>>>>>>>>>>>>> most efficient.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    (2) I try to find out how the matrix
>>>>>>>>>>>>> vector multiplication is implemented in PETSc for SBAIJ storage.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20220322/750dcd0c/attachment-0001.html>