[petsc-users] Various Questions Regarding PETSC

Mohammed Mostafa mo7ammedmostafa at gmail.com
Sat Jul 13 13:37:20 CDT 2019


I am generating the matrix using the finite volume method.
I basically loop over the face list instead of looping over the cells, to
avoid evaluating the flux of each cell face twice.
So I figured I would store the coefficients in a temporary container (in this
case a CSR sparse matrix) and then loop over the rows to set them in the PETSc
matrix.
I know it looks like a waste of memory and copy overhead, but for now I
can’t think of a better way.
For now I will try the two routines from the master branch,
MatCreateMPIAIJWithArrays() and MatUpdateMPIAIJWithArrays(),
and send the logs.
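
To make the face-loop idea concrete, here is a minimal sketch in plain C (no PETSc calls) of assembling a temporary CSR matrix by visiting each face once. The Face and Csr types, the function names, and the simple diffusion-like coefficient are hypothetical and only for illustration; the sketch assumes every face is an interior face and that any two cells share at most one face. The column indices within a row come out with the diagonal first and are otherwise unsorted, so they may need sorting if the routine consuming the arrays expects sorted columns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interior face: shared by an owner cell and a neighbour cell. */
typedef struct { int owner, neigh; double coef; } Face;

/* Temporary CSR container; field names echo CSR_iptr / CSR_jptr / CSR_vptr. */
typedef struct { int nrows; int *iptr; int *jptr; double *vptr; } Csr;

/* Build the CSR matrix by looping over the faces, so that each face
 * coefficient is evaluated once and scattered to both adjacent rows. */
static Csr assemble_from_faces(int ncells, const Face *faces, int nfaces)
{
    Csr A;
    assert(ncells > 0 && nfaces >= 0);
    A.nrows = ncells;
    A.iptr = calloc((size_t)ncells + 1, sizeof *A.iptr);
    assert(A.iptr);

    /* Pass 1: row length = 1 diagonal + 1 off-diagonal per adjacent face. */
    for (int i = 0; i < ncells; i++) A.iptr[i + 1] = 1;
    for (int f = 0; f < nfaces; f++) {
        assert(faces[f].owner != faces[f].neigh);
        A.iptr[faces[f].owner + 1]++;
        A.iptr[faces[f].neigh + 1]++;
    }
    /* Prefix sum turns row lengths into row offsets. */
    for (int i = 0; i < ncells; i++) A.iptr[i + 1] += A.iptr[i];

    size_t nnz = (size_t)A.iptr[ncells];
    A.jptr = malloc(nnz * sizeof *A.jptr);
    A.vptr = calloc(nnz, sizeof *A.vptr);
    int *next = malloc((size_t)ncells * sizeof *next);
    assert(A.jptr && A.vptr && next);

    /* The diagonal occupies the first slot of every row. */
    for (int i = 0; i < ncells; i++) {
        A.jptr[A.iptr[i]] = i;
        next[i] = A.iptr[i] + 1;
    }

    /* Pass 2: one visit per face updates four matrix entries. */
    for (int f = 0; f < nfaces; f++) {
        int o = faces[f].owner, n = faces[f].neigh;
        double c = faces[f].coef;
        A.vptr[A.iptr[o]] += c;                       /* (o, o) */
        A.vptr[A.iptr[n]] += c;                       /* (n, n) */
        A.jptr[next[o]] = n; A.vptr[next[o]++] = -c;  /* (o, n) */
        A.jptr[next[n]] = o; A.vptr[next[n]++] = -c;  /* (n, o) */
    }
    free(next);
    return A;
}

/* Release the arrays owned by the container. */
static void csr_free(Csr *A)
{
    free(A->iptr); free(A->jptr); free(A->vptr);
    A->iptr = NULL; A->jptr = NULL; A->vptr = NULL;
}
```

The three resulting arrays have the same layout as the CSR_iptr, CSR_jptr and CSR_vptr arrays used in the row loop quoted further down, with local row indices; mapping them to global indices is left out of this sketch.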

Thanks
Kamra

On Sun, Jul 14, 2019 at 1:51 AM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   How are you generating entries in your matrix? Finite differences,
> finite element, finite volume, something else?
>
>   If you are using finite differences you generally generate an entire row
> at a time and call MatSetValues() once per row. With finite elements you
> generate an element at a time and use ADD_VALUES for a block of rows and
> columns.
>
>   I don't know why generating directly in CSR format would be faster than
> calling MatSetValues() once per row but anyways if you have the matrix
> in CSR format you can use
>
>   MatCreateMPIAIJWithArrays() (and in the master branch of the repository)
> MatUpdateMPIAIJWithArrays().
>
> to build the matrix the first time, and then "refill" it with numerical
> values each new time. There are a few other optimizations related to matrix
> insertion in the master branch you might also benefit from.
>
>   Generally for problems with multiple "times" or "linear solve steps" we
> use two stages, the first to track the initial set up and first time step
> and the other to capture all the other steps (since the extra overhead is
> only in the first step.) You could make a new stage for each time step but
> I don't think that is needed.
>
>   After you have this going send us the new log summary.
>
>   Barry
>
>
>
> > On Jul 13, 2019, at 11:20 AM, Mohammed Mostafa via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> >
> > I am sorry, but I don’t see what you mean by small times.
> > Although mat assembly is relatively small,
> > the cost of MatSetValues() is still significant.
> > The same can be said for vec assembly.
> > Combined, vec/mat assembly and MatSetValues() constitute about 50% of the
> > total cost of matrix construction.
> >
> > So is this a problem with my matrix setup/preallocation?
> >
> > Or is this a hardware issue, where for whatever reason the copy is overly slow?
> > The code was run on a single node.
> >
> > Or is this function call overhead, since MatSetValues() is being called 1M
> > times inside the for loop (170k times in each process)?
> >
> > Thanks, Kamra
> >
> > On Sun, Jul 14, 2019 at 12:41 AM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <
> mo7ammedmostafa at gmail.com> wrote:
> > Hello Matt,
> >
> > I revised my code and changed the way I create the RHS vector.
> > Previously I was using VecCreateGhost just in case I needed the ghost
> > values, but for now I changed that to
> > VecCreateMPI(.......)
> > So maybe that was the cause of the scatter.
> > I am attaching a new log output to this email.
> >
> > Okay, the times are now very small. How does it scale up?
> >
> >   Thanks,
> >
> >      Matt
> >
> > Also, regarding how I fill my PETSc matrix:
> > In my code I fill a temporary CSR-format matrix because otherwise I would
> > need "MatSetValue" to fill the PETSc Mat element by element,
> > which is not recommended in the PETSc manual and is probably very expensive
> > due to function call overhead.
> > So after I create my matrix in CSR format, I fill the PETSc Mat A as
> > follows:
> > for (i = 0; i < nMatRows; i++) {
> >   offset = CSR_iptr[i];        /* start of row i in the CSR arrays */
> >   row_index = row_gIndex[i];   /* global index of local row i */
> >   nj = Eqn_nj[i];              /* number of entries in row i */
> >   MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset,
> >                CSR_vptr + offset, INSERT_VALUES);
> > }
> > After That
> > VecAssemblyBegin(RHS);
> > VecAssemblyEnd(RHS);
> >
> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> >
> > I don't believe I am doing anything special. If possible, I would like
> > to set the whole CSR matrix at once in one command.
> > I took a look at the code for MatSetValues. If I am understanding it
> > correctly (hopefully), I think I could do it, maybe by modifying it or by
> > creating a new routine entirely for this purpose,
> > i.e. MatSetValuesFromCSR(.....)
> > Or is there a particular reason why it has to be this way?
> >
> > I also tried KSP ex3, slightly tweaked to add a logging stage
> > around the assembly and MatSetValues, and I am attaching the modified
> > example here as well.
> > Although in this example the matrix stash is not empty (meaning
> > off-processor values are being set), but the timing values for roughly the
> > same matrix size, the command I used is
> > mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info
> >
> >
> > Regards,
> > Kamra
> >
> > On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <
> mo7ammedmostafa at gmail.com> wrote:
> > Hello Matt,
> > Attached is the dumped entire log output using -log_view and -info.
> >
> > In matrix construction, it looks like you have a mixture of load
> > imbalance (see the imbalance in the Begin events)
> > and lots of Scatter messages in your assembly. We turn off
> > MatSetValues() logging by default since it is usually
> > called many times, but you can explicitly turn it back on if you want. I
> > don't think that is the problem here. It's easy
> > to see from examples (say SNES ex5) that it is not the major time sink.
> > What is the Scatter doing?
> >
> >   Thanks,
> >
> >      Matt
> >
> > Thanks,
> > Kamra
> >
> > On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
> wrote:
> > On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
> > Hello all,
> > I have a few questions regarding PETSc.
> >
> > Please send the entire output of a run with all the logging turned on,
> using -log_view and -info.
> >
> >   Thanks,
> >
> >     Matt
> >
> > Question 1:
> > For the profiling, is it possible to show only the user-defined log
> > events in the breakdown of each stage in -log_view?
> > I tried deactivating the class IDs for MAT, VEC, KSP and PC:
> > PetscLogEventExcludeClass(MAT_CLASSID);
> > PetscLogEventExcludeClass(VEC_CLASSID);
> > PetscLogEventExcludeClass(KSP_CLASSID);
> > PetscLogEventExcludeClass(PC_CLASSID);
> > which, according to the manual, "Deactivates event logging for a PETSc
> > object class in every stage".
> > However, I still see them in the stage breakdown:
> > --- Event Stage 1: Matrix Construction
> >
> > BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
> > VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
> > VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
> > VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
> > AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
> > myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
> > setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
> > setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
> > callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
> >
> > Also, is it possible to clear the logs so that I can write a separate
> > profiling output file for each timestep (since I am solving a transient
> > problem and I want to know the change in performance as time goes by)?
> >
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Question 2:
> > Regarding MatSetValues:
> > Right now I am writing a finite volume code. Due to an algorithm requirement,
> > I have to write the matrix into a local native format (array of arrays) and
> > then loop through the rows and use MatSetValues to set the elements in "Mat A":
> > MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
> > but it is very slow and it is killing my performance,
> > although the matrix was properly set up using
> > MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
> >              PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
> > with d_nnz and o_nnz properly assigned, so no mallocs occur during
> > MatSetValues, and all inserted values are local, so there are no
> > off-processor values.
> > So my question is: is it possible to set multiple rows at once, hopefully
> > all of them? I checked the manual and MatSetValues can only set a dense
> > matrix block, and setting row by row seems to be expensive.
> > Or is it perhaps possible to copy all rows to the underlying matrix
> > data? As I mentioned, all values are local and there are no off-processor
> > values (stash is 0):
> > [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
> > [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used
> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used
> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used
> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used
> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used
> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used
> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used
> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
> >
> >
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Question 3:
> > If all inserted matrix and vector data are local, what part of the
> > vec/mat assembly consumes time? MatSetValues and MatAssembly consume
> > more time than the matrix builder.
> > Also, this is not just for the first MAT_FINAL_ASSEMBLY.
> >
> >
> > For context, the matrix above is nearly 1M x 1M, partitioned over six
> > processes, and it was NOT built using DM.
> >
> > Finally the configure options are:
> >
> > Configure options:
> > PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
> --with-fc=mpif90 --download-metis --download-hypre
> >
> > Sorry for such a long question, and thanks in advance.
> > Thanks
> > M. Kamra
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
>
>