[petsc-users] Various Questions Regarding PETSC

Matthew Knepley knepley at gmail.com
Sat Jul 13 10:41:44 CDT 2019


On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:

> Hello Matt,
>
> I revised my code and changed the way I create the rhs vector,
> previosly I was using vecCreateGhost just in case I need the ghost values,
> but for now I changed that to
> vecCreateMPI(.......)
> So maybe that was the cause of the scatter
> I am attaching with this email a new log output
>

Okay, the times are now very small. How does it scale up?

  Thanks,

     Matt


> Also regarding how I fill my petsc matrix,
> In my code I fill a temp CSR format matrix becasue otherwise I would need
> "MatSetValue" to fill the petsc mat element by element
> which is not recommmeded in the petsc manual and probably very expensive
> due to function call overhead
> *So after I create my matrix in CSR format, I fill the PETSC mat A as
> follows*
>
>> for (i = 0; i < nMatRows; i++) {
>>  cffset = CSR_iptr[i];
>> row_index = row_gIndex[i];
>> nj = Eqn_nj[i];
>> MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset,
>> CSR_vptr +  offset, INSERT_VALUES);
>> }
>>
> *After That*
>
>> VecAssemblyBegin(RHS);
>> VecAssemblyEnd(RHS);
>>
>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>
>
> *I don't believe , I am doing anything special, if possible I would like
> to set the whole csr matrix at once in one command.*
> *I took a look at the code for MatSetValues, if I am understanding it
> correctly(hopefully) I think I could do it, maybe modify it or create a new
> routine entirely for this pupose.*
>
> *i.e. MatSetValuesFromCSR(.....)*
> *Or is there a particular reason why it has to be this way*
>
> I also tried ksp ex3 but I slightly tweaked it to add a logging stage
> around the assembly and MatSetValues and I am attaching the modified
> example here as well.
> Although in this example the matrix stash is not empty ( means
> off-processor values are being set ) but the timing values for roughly the
> same matrix size , the command I used is
> mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info
>
>
> Regards,
> Kamra
>
> On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>>
>>> Hello Matt,
>>> Attached is the dumped entire log output using -log_view and -info.
>>>
>>
>> In matrix construction, it looks like you have a mixture of load
>> imbalance (see the imbalance in the Begin events)
>> and lots of Scatter messages in your assembly. We turn off MatSetValues()
>> logging by default since it is usually
>> called many times, but you can explicitly turn it back on if you want. I
>> don't think that is the problem here. Its easy
>> to see from examples (say SNES ex5) that it is not the major time sink.
>> What is the Scatter doing?
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Thanks,
>>> Kamra
>>>
>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>> Hello all,
>>>>> I have a few question regarding Petsc,
>>>>>
>>>>
>>>> Please send the entire output of a run with all the logging turned on,
>>>> using -log_view and -info.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>
>>>>> Question 1:
>>>>> For the profiling , is it possible to only show the user defined log
>>>>> events in the breakdown of each stage in Log-view.
>>>>> I tried deactivating all ClassIDs, MAT,VEC, PC, KSP,PC,
>>>>>  PetscLogEventExcludeClass(MAT_CLASSID);
>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>> which should "Deactivates event logging for a PETSc object class in
>>>>> every stage" according to the manual.
>>>>> however I still see them in the stage breakdown
>>>>> --- Event Stage 1: Matrix Construction
>>>>>
>>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01
>>>>> 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01
>>>>> 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01
>>>>> 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>>>
>>>>> Also is possible to clear the logs so that I can write a  separate
>>>>> profiling output file for each timestep ( since I am solving a transient
>>>>> problem and I want to know the change in performance as time goes by )
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> Question 2:
>>>>> Regarding MatSetValues
>>>>> Right now, I writing a finite volume code, due to algorithm
>>>>> requirement I have to write the matrix into local native format ( array of
>>>>> arrays) and then loop through rows and use MatSetValues to set the elements
>>>>> in "Mat A"
>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>> but it is very slow and it is killing my performance
>>>>> although the matrix was properly set using
>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>> PETSC_DETERMINE,
>>>>> PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>> with d_nnz,and  o_nnz properly assigned so no mallocs occur during
>>>>> matsetvalues and all inserted values are local so no off-processor values
>>>>> So my question is it possible to set multiple rows at once hopefully
>>>>> all, I checked the manual and MatSetValues can only set dense matrix block
>>>>> because it seems that row by row is expensive
>>>>> Or perhaps is it possible to copy all rows to the underlying matrix
>>>>> data, as I mentioned all values are local and no off-processor values (
>>>>> stash is 0 )
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>>>> mallocs.
>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>>>> space: 0 unneeded,743028 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,742972 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>> space: 0 unneeded,743093 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,743036 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,742938 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>> space: 0 unneeded,743049 used
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space:
>>>>> 0 unneeded,685 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space:
>>>>> 0 unneeded,649 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage
>>>>> space: 0 unneeded,1011 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage
>>>>> space: 0 unneeded,1137 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space:
>>>>> 0 unneeded,658 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space:
>>>>> 0 unneeded,648 used
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> Question 3:
>>>>> If all matrix and vector inserted data are local, what part of the
>>>>> vec/mat assembly consumes time because matsetvalues and matassembly consume
>>>>> more time than matrix builder
>>>>> Also this is not just for the first time MAT_FINAL_ASSEMBLY
>>>>>
>>>>>
>>>>> For context the matrix in the above is nearly 1Mx1M partitioned over
>>>>> six processes and it was NOT built using DM
>>>>>
>>>>> Finally the configure options are:
>>>>>
>>>>> Configure options:
>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>>
>>>>> Sorry for such long question and thanks in advance
>>>>> Thanks
>>>>> M. Kamra
>>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190713/34258d46/attachment.html>


More information about the petsc-users mailing list