[petsc-users] Various Questions Regarding PETSC
Matthew Knepley
knepley at gmail.com
Sat Jul 13 11:51:55 CDT 2019
On Sat, Jul 13, 2019 at 11:20 AM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:
> I am sorry but I don’t see what you mean by small times
> Although mat assembly is relatively smaller
> The cost of mat set values is still significant
> The same can be said for vec assembly
> Combined vec/mat assembly and matsetvalues constitute about 50% of the
> total cost of matrix construction
>
This is why I asked you about scaling, since it is difficult to disentangle
overheads from scalable work at 10^-2 seconds.
Second, if you look at the times for the PETSc example you ran, setvalues
and assembly is clearly not 50% of the
time for construction. I cannot see your code, so we do not know exactly
what the custom events are timing. I
would think about incrementally changing the example until you get what you
want.
Thanks,
Matt
> So is this problem of my matrix setup/ preallocation
>
> Or is this a hardware issue, for whatever reason the copy is overly slow
> The code was run on a single node
>
> Or is this function call overhead since matsetvalues is being called 1M
> times inside the for loop ( 170k times in each process)
>
> Thanks, Kamra
>
> On Sun, Jul 14, 2019 at 12:41 AM Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Sat, Jul 13, 2019 at 9:56 AM Mohammed Mostafa <
>> mo7ammedmostafa at gmail.com> wrote:
>>
>>> Hello Matt,
>>>
>>> I revised my code and changed the way I create the rhs vector,
>>> previosly I was using vecCreateGhost just in case I need the ghost
>>> values, but for now I changed that to
>>> vecCreateMPI(.......)
>>> So maybe that was the cause of the scatter
>>> I am attaching with this email a new log output
>>>
>>
>> Okay, the times are now very small. How does it scale up?
>>
>> Thanks,
>>
>> Matt
>>
>>
>>> Also regarding how I fill my petsc matrix,
>>> In my code I fill a temp CSR format matrix becasue otherwise I would
>>> need "MatSetValue" to fill the petsc mat element by element
>>> which is not recommmeded in the petsc manual and probably very expensive
>>> due to function call overhead
>>> *So after I create my matrix in CSR format, I fill the PETSC mat A as
>>> follows*
>>>
>>>> for (i = 0; i < nMatRows; i++) {
>>>> cffset = CSR_iptr[i];
>>>> row_index = row_gIndex[i];
>>>> nj = Eqn_nj[i];
>>>> MatSetValues(PhiEqnSolver.A, 1, &row_index, nj, CSR_jptr + offset,
>>>> CSR_vptr + offset, INSERT_VALUES);
>>>> }
>>>>
>>> *After That*
>>>
>>>> VecAssemblyBegin(RHS);
>>>> VecAssemblyEnd(RHS);
>>>>
>>>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>>
>>>
>>> *I don't believe , I am doing anything special, if possible I would like
>>> to set the whole csr matrix at once in one command.*
>>> *I took a look at the code for MatSetValues, if I am understanding it
>>> correctly(hopefully) I think I could do it, maybe modify it or create a new
>>> routine entirely for this pupose.*
>>>
>>> *i.e. MatSetValuesFromCSR(.....)*
>>> *Or is there a particular reason why it has to be this way*
>>>
>>> I also tried ksp ex3 but I slightly tweaked it to add a logging stage
>>> around the assembly and MatSetValues and I am attaching the modified
>>> example here as well.
>>> Although in this example the matrix stash is not empty ( means
>>> off-processor values are being set ) but the timing values for roughly the
>>> same matrix size , the command I used is
>>> mpirun -np 6 ./mod_ksp_ex3 -m 1000 -log_view -info
>>>
>>>
>>> Regards,
>>> Kamra
>>>
>>> On Sat, Jul 13, 2019 at 1:43 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <
>>>> mo7ammedmostafa at gmail.com> wrote:
>>>>
>>>>> Hello Matt,
>>>>> Attached is the dumped entire log output using -log_view and -info.
>>>>>
>>>>
>>>> In matrix construction, it looks like you have a mixture of load
>>>> imbalance (see the imbalance in the Begin events)
>>>> and lots of Scatter messages in your assembly. We turn off
>>>> MatSetValues() logging by default since it is usually
>>>> called many times, but you can explicitly turn it back on if you want.
>>>> I don't think that is the problem here. Its easy
>>>> to see from examples (say SNES ex5) that it is not the major time sink.
>>>> What is the Scatter doing?
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>>> Thanks,
>>>>> Kamra
>>>>>
>>>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>> I have a few question regarding Petsc,
>>>>>>>
>>>>>>
>>>>>> Please send the entire output of a run with all the logging turned
>>>>>> on, using -log_view and -info.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>> Question 1:
>>>>>>> For the profiling , is it possible to only show the user defined log
>>>>>>> events in the breakdown of each stage in Log-view.
>>>>>>> I tried deactivating all ClassIDs, MAT,VEC, PC, KSP,PC,
>>>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>>>> which should "Deactivates event logging for a PETSc object class in
>>>>>>> every stage" according to the manual.
>>>>>>> however I still see them in the stage breakdown
>>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>>
>>>>>>> BuildTwoSidedF 4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0
>>>>>>> VecSet 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>>>>> VecAssemblyBegin 2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 18 0 0 0 0 0
>>>>>>> VecAssemblyEnd 2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>>>>> VecScatterBegin 2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01
>>>>>>> 2.1e+03 0.0e+00 0 0 3 0 0 0 0 50 80 0 0
>>>>>>> VecScatterEnd 2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>>>>> MatAssemblyBegin 2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>>>>> MatAssemblyEnd 2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01
>>>>>>> 5.3e+02 8.0e+00 0 0 3 0 6 10 0 50 20100 0
>>>>>>> AssembleMats 2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01
>>>>>>> 1.3e+03 8.0e+00 0 0 7 0 6 28 0100100100 0 # USER EVENT
>>>>>>> myMatSetValues 2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 19 0 0 0 0 0 # USER EVENT
>>>>>>> setNativeMat 1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 24 0 0 0 0 0 # USER EVENT
>>>>>>> setNativeMatII 1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 28 0 0 0 0 0 # USER EVENT
>>>>>>> callScheme 1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00
>>>>>>> 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0 # USER EVENT
>>>>>>>
>>>>>>> Also is possible to clear the logs so that I can write a separate
>>>>>>> profiling output file for each timestep ( since I am solving a transient
>>>>>>> problem and I want to know the change in performance as time goes by )
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>> Question 2:
>>>>>>> Regarding MatSetValues
>>>>>>> Right now, I writing a finite volume code, due to algorithm
>>>>>>> requirement I have to write the matrix into local native format ( array of
>>>>>>> arrays) and then loop through rows and use MatSetValues to set the elements
>>>>>>> in "Mat A"
>>>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>>>> but it is very slow and it is killing my performance
>>>>>>> although the matrix was properly set using
>>>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>>>> PETSC_DETERMINE,
>>>>>>> PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>>>> with d_nnz,and o_nnz properly assigned so no mallocs occur during
>>>>>>> matsetvalues and all inserted values are local so no off-processor values
>>>>>>> So my question is it possible to set multiple rows at once hopefully
>>>>>>> all, I checked the manual and MatSetValues can only set dense matrix block
>>>>>>> because it seems that row by row is expensive
>>>>>>> Or perhaps is it possible to copy all rows to the underlying matrix
>>>>>>> data, as I mentioned all values are local and no off-processor values (
>>>>>>> stash is 0 )
>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>>>>>> mallocs.
>>>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>>>>>> space: 0 unneeded,743028 used
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>>>> space: 0 unneeded,742972 used
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>>>> space: 0 unneeded,743093 used
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>>>> space: 0 unneeded,743036 used
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>>>> space: 0 unneeded,742938 used
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>>>> space: 0 unneeded,743049 used
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage
>>>>>>> space: 0 unneeded,685 used
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage
>>>>>>> space: 0 unneeded,649 used
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage
>>>>>>> space: 0 unneeded,1011 used
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage
>>>>>>> space: 0 unneeded,1137 used
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage
>>>>>>> space: 0 unneeded,658 used
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage
>>>>>>> space: 0 unneeded,648 used
>>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>>>> is 0
>>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>> Question 3:
>>>>>>> If all matrix and vector inserted data are local, what part of the
>>>>>>> vec/mat assembly consumes time because matsetvalues and matassembly consume
>>>>>>> more time than matrix builder
>>>>>>> Also this is not just for the first time MAT_FINAL_ASSEMBLY
>>>>>>>
>>>>>>>
>>>>>>> For context the matrix in the above is nearly 1Mx1M partitioned over
>>>>>>> six processes and it was NOT built using DM
>>>>>>>
>>>>>>> Finally the configure options are:
>>>>>>>
>>>>>>> Configure options:
>>>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>>>>
>>>>>>> Sorry for such long question and thanks in advance
>>>>>>> Thanks
>>>>>>> M. Kamra
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20190713/dd3b8923/attachment-0001.html>
More information about the petsc-users
mailing list