[petsc-users] Various Questions Regarding PETSC
Mohammed Mostafa
mo7ammedmostafa at gmail.com
Sat Jul 13 14:02:19 CDT 2019
This is a 2D case on triangular mesh so 3+1
For quad it becomes 4+1 and in 3D more and more nnz depending on the cell
type
On Sun, Jul 14, 2019 at 3:55 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>
> On Sat, Jul 13, 2019 at 2:39 PM Mohammed Mostafa via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> I am generating the matrix using the finite volume method
>> I basically loop over the face list instead of looping over the cells to
>> avoid double evaluation of the fluxes of cell faces
>> So I figured I would store the coefficients in a temp container ( in this
>> case a csr sparse matrix) and then loop over rows to set in the petsc
>> matrix
>> I know it looks like a waste of memory and copy overhead but for now I
>> can’t think of a better way.
>>
>
> You can use PETSc's setValues, just use ADD_VALUES instead of INSERT. Our
> matrix class is pretty much the same as your "temp container" so I would
> use PETSc directly.
>
> And I see "Maximum nonzeros in any row is 4" and "Maximum nonzeros in any
> row is 1". I guess the "1"s are from the off diagonal block matrix, but
> should the 4 be 2*D + 1 ?
>
>
>> For now I will try the two routines from the master branch
>> MatCreateMPIAIJWithArrays() MatUpdateMPIAIJWithArrays()
>> and send the logs
>>
>> Thanks
>> Kamra
>>
>> On Sun, Jul 14, 2019 at 1:51 AM Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>>
>>>
>>> How are you generating entries in your matrix? Finite differences,
>>> finite element, finite volume, something else?
>>>
>>> If you are using finite differences you generally generate an entire
>>> row at a time and call MatSetValues() once per row. With finite elements
>>> you generates an element at a time and ADD_VALUES for a block of rows and
>>> columns.
>>>
>>> I don't know why generating directly in CSR format would be faster
>>> than than calling MatSetValues() once per row but anyways if you have the
>>> matrix in CSR format you can use
>>>
>>> MatCreateMPIAIJWithArrays() (and in the master branch of the
>>> repository) MatUpdateMPIAIJWithArrays().
>>>
>>> to build the matrix the first time, and then "refill" it with numerical
>>> values each new time. There are a few other optimizations related to matrix
>>> insertion in the master branch you might also benefit from.
>>>
>>> Generally for problems with multiple "times" or "linear solve steps"
>>> we use two stages, the first to track the initial set up and first time
>>> step and the other to capture all the other steps (since the extra overhead
>>> is only in the first step.) You could make a new stage for each time step
>>> but I don't think that is needed.
>>>
>>> After you have this going send us the new log summary.
>>>
>>> Barry
>>>
>>>
>>>
>>> > On Jul 13, 2019, at 11:20 AM, Mohammed Mostafa via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>> >
>>> > I am sorry but I don’t see what you mean by small times
>>> > Although mat assembly is relatively smaller
>>> > The cost of mat set values is still significant
>>> > The same can be said for vec assembly
>>> > Combined vec/mat assembly and matsetvalues constitute about 50% of
>>> the total cost of matrix construction
>>> >
>>> > So is this problem of my matrix setup/ preallocation
>>> >
>>> > Or is this a hardware issue, for whatever reason the copy is overly
>>> slow
>>> > The code was run on a single node
>>> >
>>> > Or is this function call overhead since matsetvalues is being called
>>> 1M times inside the for loop ( 170k times in each process)
>>> >
>>> > Thanks, Kamra
>>> >
>>> > On Sun, Jul 14, 2019 at 12:41 AM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>> > On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <
>>> mo7ammedmostafa at gmail.com> wrote:
>>> > Hello Matt,
>>> >
>>> > I revised my code and changed the way I create the rhs vector,
>>> > previosly I was using vecCreateGhost just in case I need the ghost
>>> values, but for now I changed that to
>>> > vecCreateMPI(.......)
>>> > So maybe that was the cause of the scatter
>>> > I am attaching with this email a new log output
>>> >
>>> > Okay, the times are now very small. How does it scale up?
>>> >
>>> > Thanks,
>>> >
>>> > Matt
>>> >
>>> > Also regarding how I fill my petsc matrix,
>>> > In my code I fill a temp CSR format matrix becasue otherwise I would
>>> need "MatSetValue" to fill the petsc mat element by element
>>> > which is not recommmeded in the petsc manual and probably very
>>> expensive due to function call overhead
>>> > So after I create my matrix in CSR format, I fill the PETSC mat A as
>>> follows
>>> > for (i = 0; i < nMatRows; i++) {
>>> > cffset = CSR_iptr[i];
>>> > row_index = row_gIndex[i];
>>> > nj = Eqn_nj[i];
>>> > MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset,
>>> CSR_vptr + offset, INSERT_VALUES);
>>> > }
>>> > After That
>>> > VecAssemblyBegin(RHS);
>>> > VecAssemblyEnd(RHS);
>>> >
>>> > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>> > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>> >
>>> > I don't believe , I am doing anything special, if possible I would
>>> like to set the whole csr matrix at once in one command.
>>> > I took a look at the code for MatSetValues, if I am understanding it
>>> correctly(hopefully) I think I could do it, maybe modify it or create a new
>>> routine entirely for this pupose.
>>> > i.e. MatSetValuesFromCSR(.....)
>>> > Or is there a particular reason why it has to be this way
>>> >
>>> > I also tried ksp ex3 but I slightly tweaked it to add a logging stage
>>> around the assembly and MatSetValues and I am attaching the modified
>>> example here as well.
>>> > Although in this example the matrix stash is not empty ( means
>>> off-processor values are being set ) but the timing values for roughly the
>>> same matrix size , the command I used is
>>> > mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info
>>> >
>>> >
>>> > Regards,
>>> > Kamra
>>> >
>>> > On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>> > On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <
>>> mo7ammedmostafa at gmail.com> wrote:
>>> > Hello Matt,
>>> > Attached is the dumped entire log output using -log_view and -info.
>>> >
>>> > In matrix construction, it looks like you have a mixture of load
>>> imbalance (see the imbalance in the Begin events)
>>> > and lots of Scatter messages in your assembly. We turn off
>>> MatSetValues() logging by default since it is usually
>>> > called many times, but you can explicitly turn it back on if you want.
>>> I don't think that is the problem here. Its easy
>>> > to see from examples (say SNES ex5) that it is not the major time
>>> sink. What is the Scatter doing?
>>> >
>>> > Thanks,
>>> >
>>> > Matt
>>> >
>>> > Thanks,
>>> > Kamra
>>> >
>>> > On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>> > On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>> > Hello all,
>>> > I have a few question regarding Petsc,
>>> >
>>> > Please send the entire output of a run with all the logging turned on,
>>> using -log_view and -info.
>>> >
>>> > Thanks,
>>> >
>>> > Matt
>>> >
>>> > Question 1:
>>> > For the profiling , is it possible to only show the user defined log
>>> events in the breakdown of each stage in Log-view.
>>> > I tried deactivating all ClassIDs, MAT,VEC, PC, KSP,PC,
>>> > PetscLogEventExcludeClass(MAT_CLASSID);
>>> > PetscLogEventExcludeClass(VEC_CLASSID);
>>> > PetscLogEventExcludeClass(KSP_CLASSID);
>>> > PetscLogEventExcludeClass(PC_CLASSID);
>>> > which should "Deactivates event logging for a PETSc object class in
>>> every stage" according to the manual.
>>> > however I still see them in the stage breakdown
>>> > --- Event Stage 1: Matrix Construction
>>> >
>>> > BuildTwoSidedF 4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0
>>> > VecSet 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> > VecAssemblyBegin 2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0
>>> > VecAssemblyEnd 2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> > VecScatterBegin 2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01
>>> 2.1e+03 0.0e+00 0 0 3 0 0 0 0 50 80 0 0
>>> > VecScatterEnd 2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> > MatAssemblyBegin 2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> > MatAssemblyEnd 2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01
>>> 5.3e+02 8.0e+00 0 0 3 0 6 10 0 50 20100 0
>>> > AssembleMats 2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01
>>> 1.3e+03 8.0e+00 0 0 7 0 6 28 0100100100 0 # USER EVENT
>>> > myMatSetValues 2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 19 0 0 0 0 0 # USER EVENT
>>> > setNativeMat 1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 24 0 0 0 0 0 # USER EVENT
>>> > setNativeMatII 1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 28 0 0 0 0 0 # USER EVENT
>>> > callScheme 1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 # USER EVENT
>>> >
>>> > Also is possible to clear the logs so that I can write a separate
>>> profiling output file for each timestep ( since I am solving a transient
>>> problem and I want to know the change in performance as time goes by )
>>> >
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> > Question 2:
>>> > Regarding MatSetValues
>>> > Right now, I writing a finite volume code, due to algorithm
>>> requirement I have to write the matrix into local native format ( array of
>>> arrays) and then loop through rows and use MatSetValues to set the elements
>>> in "Mat A"
>>> > MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>> > but it is very slow and it is killing my performance
>>> > although the matrix was properly set using
>>> > MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>> PETSC_DETERMINE,
>>> > PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>> > with d_nnz,and o_nnz properly assigned so no mallocs occur during
>>> matsetvalues and all inserted values are local so no off-processor values
>>> > So my question is it possible to set multiple rows at once hopefully
>>> all, I checked the manual and MatSetValues can only set dense matrix block
>>> because it seems that row by row is expensive
>>> > Or perhaps is it possible to copy all rows to the underlying matrix
>>> data, as I mentioned all values are local and no off-processor values (
>>> stash is 0 )
>>> > [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>> > [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>> mallocs.
>>> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>> space: 0 unneeded,743028 used
>>> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742972 used
>>> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743093 used
>>> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,743036 used
>>> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742938 used
>>> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743049 used
>>> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space:
>>> 0 unneeded,685 used
>>> > [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space:
>>> 0 unneeded,649 used
>>> > [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage
>>> space: 0 unneeded,1011 used
>>> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage
>>> space: 0 unneeded,1137 used
>>> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space:
>>> 0 unneeded,658 used
>>> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space:
>>> 0 unneeded,648 used
>>> > [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> > [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>> is 0
>>> > [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> > [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>> >
>>> >
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> > Question 3:
>>> > If all matrix and vector inserted data are local, what part of the
>>> vec/mat assembly consumes time because matsetvalues and matassembly consume
>>> more time than matrix builder
>>> > Also this is not just for the first time MAT_FINAL_ASSEMBLY
>>> >
>>> >
>>> > For context the matrix in the above is nearly 1Mx1M partitioned over
>>> six processes and it was NOT built using DM
>>> >
>>> > Finally the configure options are:
>>> >
>>> > Configure options:
>>> > PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>> --with-fc=mpif90 --download-metis --download-hypre
>>> >
>>> > Sorry for such long question and thanks in advance
>>> > Thanks
>>> > M. Kamra
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>> >
>>> > https://www.cse.buffalo.edu/~knepley/
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190714/b62a65f4/attachment-0001.html>
More information about the petsc-users
mailing list