[petsc-users] Various Questions Regarding PETSC

Mark Adams mfadams at lbl.gov
Sat Jul 13 13:57:20 CDT 2019


On Sat, Jul 13, 2019 at 2:39 PM Mohammed Mostafa via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> I am generating the matrix using the finite volume method
> I basically loop over the face list instead of looping over the cells to
> avoid double evaluation of the fluxes of cell faces
> So I figured I would store the coefficients in a temp container ( in this
> case a csr sparse matrix) and then loop over rows to set in the petsc
> matrix
> I know it looks like a waste of memory and copy overhead but for now I
> can’t think of a better way.
>

You can use PETSc's setValues, just use ADD_VALUES instead of INSERT. Our
matrix class is pretty much the same as your "temp container" so I would
use PETSc directly.

And I see "Maximum nonzeros in any row is 4" and "Maximum nonzeros in any
row is 1". I guess the "1"s are from the off diagonal block matrix, but
should the 4 be 2*D + 1 ?


> For now I will try the two routines from the master branch
> MatCreateMPIAIJWithArrays() MatUpdateMPIAIJWithArrays()
> and send the logs
>
> Thanks
> Kamra
>
> On Sun, Jul 14, 2019 at 1:51 AM Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
>
>>
>>   How are you generating entries in your matrix? Finite differences,
>> finite element, finite volume, something else?
>>
>>   If you are using finite differences you generally generate an entire
>> row at a time and call MatSetValues() once per row. With finite elements
>> you generates an element at a time and ADD_VALUES for a block of rows and
>> columns.
>>
>>   I don't know why generating directly in CSR format would be faster than
>> than calling MatSetValues() once per row but anyways if you have the matrix
>> in CSR format you can use
>>
>>   MatCreateMPIAIJWithArrays() (and in the master branch of the
>> repository) MatUpdateMPIAIJWithArrays().
>>
>> to build the matrix the first time, and then "refill" it with numerical
>> values each new time. There are a few other optimizations related to matrix
>> insertion in the master branch you might also benefit from.
>>
>>   Generally for problems with multiple "times" or "linear solve steps" we
>> use two stages, the first to track the initial set up and first time step
>> and the other to capture all the other steps (since the extra overhead is
>> only in the first step.) You could make a new stage for each time step but
>> I don't think that is needed.
>>
>>   After you have this going send us the new log summary.
>>
>>   Barry
>>
>>
>>
>> > On Jul 13, 2019, at 11:20 AM, Mohammed Mostafa via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> >
>> > I am sorry but I don’t see what you mean by small times
>> > Although mat assembly is relatively smaller
>> > The cost of mat set values is still significant
>> > The same can be said for vec assembly
>> > Combined vec/mat assembly  and matsetvalues constitute about 50% of the
>> total cost of matrix construction
>> >
>> > So is this problem of my matrix setup/ preallocation
>> >
>> > Or is this a hardware  issue, for whatever reason the copy is overly
>> slow
>> > The code was run on a single node
>> >
>> > Or is this function call overhead since matsetvalues is being called 1M
>> times inside the for loop ( 170k times in each process)
>> >
>> > Thanks, Kamra
>> >
>> > On Sun, Jul 14, 2019 at 12:41 AM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>> > Hello Matt,
>> >
>> > I revised my code and changed the way I create the rhs vector,
>> > previosly I was using vecCreateGhost just in case I need the ghost
>> values, but for now I changed that to
>> > vecCreateMPI(.......)
>> > So maybe that was the cause of the scatter
>> > I am attaching with this email a new log output
>> >
>> > Okay, the times are now very small. How does it scale up?
>> >
>> >   Thanks,
>> >
>> >      Matt
>> >
>> > Also regarding how I fill my petsc matrix,
>> > In my code I fill a temp CSR format matrix becasue otherwise I would
>> need "MatSetValue" to fill the petsc mat element by element
>> > which is not recommmeded in the petsc manual and probably very
>> expensive due to function call overhead
>> > So after I create my matrix in CSR format, I fill the PETSC mat A as
>> follows
>> > for (i = 0; i < nMatRows; i++) {
>> >  cffset = CSR_iptr[i];
>> > row_index = row_gIndex[i];
>> > nj = Eqn_nj[i];
>> > MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset,
>> CSR_vptr +  offset, INSERT_VALUES);
>> > }
>> > After That
>> > VecAssemblyBegin(RHS);
>> > VecAssemblyEnd(RHS);
>> >
>> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>> >
>> > I don't believe , I am doing anything special, if possible I would like
>> to set the whole csr matrix at once in one command.
>> > I took a look at the code for MatSetValues, if I am understanding it
>> correctly(hopefully) I think I could do it, maybe modify it or create a new
>> routine entirely for this pupose.
>> > i.e. MatSetValuesFromCSR(.....)
>> > Or is there a particular reason why it has to be this way
>> >
>> > I also tried ksp ex3 but I slightly tweaked it to add a logging stage
>> around the assembly and MatSetValues and I am attaching the modified
>> example here as well.
>> > Although in this example the matrix stash is not empty ( means
>> off-processor values are being set ) but the timing values for roughly the
>> same matrix size , the command I used is
>> > mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info
>> >
>> >
>> > Regards,
>> > Kamra
>> >
>> > On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>> > Hello Matt,
>> > Attached is the dumped entire log output using -log_view and -info.
>> >
>> > In matrix construction, it looks like you have a mixture of load
>> imbalance (see the imbalance in the Begin events)
>> > and lots of Scatter messages in your assembly. We turn off
>> MatSetValues() logging by default since it is usually
>> > called many times, but you can explicitly turn it back on if you want.
>> I don't think that is the problem here. Its easy
>> > to see from examples (say SNES ex5) that it is not the major time sink.
>> What is the Scatter doing?
>> >
>> >   Thanks,
>> >
>> >      Matt
>> >
>> > Thanks,
>> > Kamra
>> >
>> > On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>> > Hello all,
>> > I have a few question regarding Petsc,
>> >
>> > Please send the entire output of a run with all the logging turned on,
>> using -log_view and -info.
>> >
>> >   Thanks,
>> >
>> >     Matt
>> >
>> > Question 1:
>> > For the profiling , is it possible to only show the user defined log
>> events in the breakdown of each stage in Log-view.
>> > I tried deactivating all ClassIDs, MAT,VEC, PC, KSP,PC,
>> >  PetscLogEventExcludeClass(MAT_CLASSID);
>> > PetscLogEventExcludeClass(VEC_CLASSID);
>> > PetscLogEventExcludeClass(KSP_CLASSID);
>> > PetscLogEventExcludeClass(PC_CLASSID);
>> > which should "Deactivates event logging for a PETSc object class in
>> every stage" according to the manual.
>> > however I still see them in the stage breakdown
>> > --- Event Stage 1: Matrix Construction
>> >
>> > BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>> > VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> > VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>> > VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> > VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01
>> 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>> > VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> > MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> > MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01
>> 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>> > AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01
>> 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>> > myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>> > setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>> > setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>> > callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>> >
>> > Also is possible to clear the logs so that I can write a  separate
>> profiling output file for each timestep ( since I am solving a transient
>> problem and I want to know the change in performance as time goes by )
>> >
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> > Question 2:
>> > Regarding MatSetValues
>> > Right now, I writing a finite volume code, due to algorithm requirement
>> I have to write the matrix into local native format ( array of arrays) and
>> then loop through rows and use MatSetValues to set the elements in "Mat A"
>> > MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>> > but it is very slow and it is killing my performance
>> > although the matrix was properly set using
>> > MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>> PETSC_DETERMINE,
>> > PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>> > with d_nnz,and  o_nnz properly assigned so no mallocs occur during
>> matsetvalues and all inserted values are local so no off-processor values
>> > So my question is it possible to set multiple rows at once hopefully
>> all, I checked the manual and MatSetValues can only set dense matrix block
>> because it seems that row by row is expensive
>> > Or perhaps is it possible to copy all rows to the underlying matrix
>> data, as I mentioned all values are local and no off-processor values (
>> stash is 0 )
>> > [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>> > [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>> mallocs.
>> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>> space: 0 unneeded,743028 used
>> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>> space: 0 unneeded,742972 used
>> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>> space: 0 unneeded,743093 used
>> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>> space: 0 unneeded,743036 used
>> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>> space: 0 unneeded,742938 used
>> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>> space: 0 unneeded,743049 used
>> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space:
>> 0 unneeded,685 used
>> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space:
>> 0 unneeded,649 used
>> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space:
>> 0 unneeded,1011 used
>> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space:
>> 0 unneeded,1137 used
>> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space:
>> 0 unneeded,658 used
>> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space:
>> 0 unneeded,648 used
>> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>> 0
>> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>> >
>> >
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> > Question 3:
>> > If all matrix and vector inserted data are local, what part of the
>> vec/mat assembly consumes time because matsetvalues and matassembly consume
>> more time than matrix builder
>> > Also this is not just for the first time MAT_FINAL_ASSEMBLY
>> >
>> >
>> > For context the matrix in the above is nearly 1Mx1M partitioned over
>> six processes and it was NOT built using DM
>> >
>> > Finally the configure options are:
>> >
>> > Configure options:
>> > PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>> --with-fc=mpif90 --download-metis --download-hypre
>> >
>> > Sorry for such long question and thanks in advance
>> > Thanks
>> > M. Kamra
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190713/f386b43a/attachment-0001.html>


More information about the petsc-users mailing list