[petsc-users] Various Questions Regarding PETSC

Mohammed Mostafa mo7ammedmostafa at gmail.com
Sat Jul 13 13:43:08 CDT 2019


Sorry about that.
I wanted to see whether the assembly cost would drop with subsequent time
steps, but it was taking too long to run, so I set it to solve only once,
since I was only interested in profiling the matrix builder.
Again, sorry for that.
Kamra

On Sun, Jul 14, 2019 at 3:33 AM Mark Adams <mfadams at lbl.gov> wrote:

> Ok, I only see one call to KSPSolve.
>
> On Sat, Jul 13, 2019 at 2:08 PM Mohammed Mostafa <
> mo7ammedmostafa at gmail.com> wrote:
>
>> This log is for 100 time-steps, not a single time step
>>
>>
>> On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> You call the assembly routines a lot (200 times). BuildTwoSidedF is a global
>>> operation and is taking a lot of time. You should call these just once per
>>> time step (it looks like you are doing only one time step).
>>>
>>>
>>> --- Event Stage 1: Matrix Construction
>>>
>>> BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>> VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>>> VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
>>> VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
>>> AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
>>> myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
>>> setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
>>> setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
>>> callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>>>
>>>
>>>
>>> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> Hello Matt,
>>>> Attached is the dumped entire log output using -log_view and -info.
>>>>
>>>> Thanks,
>>>> Kamra
>>>>
>>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>>> petsc-users at mcs.anl.gov> wrote:
>>>>>
>>>>>> Hello all,
>>>>>> I have a few questions regarding PETSc.
>>>>>>
>>>>>
>>>>> Please send the entire output of a run with all the logging turned on,
>>>>> using -log_view and -info.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>> Question 1:
>>>>>> For the profiling, is it possible to show only the user-defined log
>>>>>> events in the breakdown of each stage in -log_view?
>>>>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>>> which should "Deactivates event logging for a PETSc object class in
>>>>>> every stage" according to the manual.
>>>>>> However, I still see them in the stage breakdown:
>>>>>> --- Event Stage 1: Matrix Construction
>>>>>>
>>>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0   # USER EVENT
>>>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>>>>
>>>>>> Also, is it possible to clear the logs so that I can write a separate
>>>>>> profiling output file for each time step? (I am solving a transient
>>>>>> problem and want to know how performance changes over time.)
>>>>>>
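[Editor's note: one possible approach to the per-time-step question, as a hedged sketch only. PETSc's PetscLogView() can be called more than once with a fresh ASCII viewer, giving one cumulative report per step; per-step costs would then come from differencing successive reports, since resetting all counters mid-run is not a documented operation. The function name `DumpLogForStep` and the filename pattern are hypothetical; the PetscCall() macro assumes PETSc >= 3.18 (older versions use `ierr = ...; CHKERRQ(ierr);`).]

```c
/* Sketch: dump the accumulated PETSc log to profile_<step>.log once per
 * time step.  Counters are cumulative across the run, so per-step numbers
 * are the differences between consecutive files.  Assumes PETSc >= 3.18
 * for PetscCall(); verify the API against your installed version. */
#include <petscsys.h>

static PetscErrorCode DumpLogForStep(PetscInt step)
{
  char        fname[PETSC_MAX_PATH_LEN];
  PetscViewer viewer;

  PetscFunctionBeginUser;
  /* hypothetical per-step filename, e.g. profile_0003.log */
  PetscCall(PetscSNPrintf(fname, sizeof(fname), "profile_%04d.log", (int)step));
  PetscCall(PetscViewerASCIIOpen(PETSC_COMM_WORLD, fname, &viewer));
  PetscCall(PetscLogView(viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```

Pairing this with PetscLogStagePush/PetscLogStagePop around each phase keeps the per-stage breakdown readable in every dumped file.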
>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>> Question 2:
>>>>>> Regarding MatSetValues:
>>>>>> Right now I am writing a finite volume code. Due to an algorithm
>>>>>> requirement, I have to write the matrix into a local native format
>>>>>> (array of arrays) and then loop through the rows, using MatSetValues to
>>>>>> set the elements in "Mat A":
>>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>>> but it is very slow and it is killing my performance, although the
>>>>>> matrix was properly preallocated using
>>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>>> PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>>>>> MatSetValues, and all inserted values are local, so there are no
>>>>>> off-processor values.
>>>>>> So my question is: is it possible to set multiple rows at once,
>>>>>> hopefully all of them? I checked the manual, and MatSetValues can only
>>>>>> set a dense matrix block; setting the matrix row by row seems expensive.
>>>>>> Or perhaps is it possible to copy all rows directly into the underlying
>>>>>> matrix data? As I mentioned, all values are local and there are no
>>>>>> off-processor values (the stash is 0):
[0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage space: 0 unneeded,743028 used
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742972 used
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743093 used
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,743036 used
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage space: 0 unneeded,742938 used
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage space: 0 unneeded,743049 used
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0 unneeded,685 used
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0 unneeded,649 used
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space: 0 unneeded,1011 used
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space: 0 unneeded,1137 used
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0 unneeded,658 used
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0 unneeded,648 used
>>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
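[Editor's note: one possible answer to "can I set all rows at once" is a hedged sketch, not necessarily the recommended PETSc idiom: flatten the native array-of-arrays format into CSR arrays and hand them to PETSc in a single call such as MatCreateMPIAIJWithArrays() (or, in sufficiently recent PETSc versions, MatUpdateMPIAIJWithArrays() on later time steps); check availability and exact signatures in your installed version. The `NativeMat` type and `native_to_csr` helper below are hypothetical names for illustration; only the flattening itself is shown, in plain C.]

```c
/* Hedged sketch: flatten a hypothetical array-of-arrays ("native") matrix
 * into CSR arrays (row pointers i[], column indices j[], values a[]),
 * which is the layout MatCreateMPIAIJWithArrays() expects. */
#include <assert.h>

typedef struct {
  int      nrows;  /* number of local rows                      */
  int     *nj;     /* nonzeros per row                          */
  int    **cols;   /* cols[r][k]: column index of k-th nonzero  */
  double **vals;   /* vals[r][k]: value of k-th nonzero         */
} NativeMat;

/* Fill caller-allocated CSR arrays: i needs nrows+1 slots, j and a need
 * (total nonzeros) slots.  Returns the total number of nonzeros. */
int native_to_csr(const NativeMat *m, int *i, int *j, double *a)
{
  int nnz = 0;
  i[0] = 0;
  for (int r = 0; r < m->nrows; ++r) {
    for (int k = 0; k < m->nj[r]; ++k) {
      j[nnz] = m->cols[r][k];
      a[nnz] = m->vals[r][k];
      ++nnz;
    }
    i[r + 1] = nnz; /* row r ends where row r+1 begins */
  }
  /* Hypothetical PETSc usage (verify against your version's man page):
   *   MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, nrows, nrows,
   *                             PETSC_DETERMINE, PETSC_DETERMINE,
   *                             i, j, a, &A);
   */
  return nnz;
}
```

This replaces N calls to MatSetValues with one bulk hand-off; whether it actually beats well-preallocated row-by-row insertion would need to be measured.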
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>> Question 3:
>>>>>> If all inserted matrix and vector data are local, what part of the
>>>>>> vec/mat assembly consumes the time? MatSetValues and MatAssembly consume
>>>>>> more time than building the matrix itself, and this is not just for the
>>>>>> first MAT_FINAL_ASSEMBLY.
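[Editor's note: the BuildTwoSidedF time in the logs above comes from the global handshake assembly uses to discover off-process entries, which runs even when the stash is empty. A hedged two-line fragment worth trying (check that these options exist in your PETSc version; `A` and `b` stand for the assembled Mat and Vec) is to promise PETSc that no off-process entries will ever be generated:]

```c
/* Fragment: assert that this rank inserts only locally owned entries, so
 * MatAssemblyBegin/VecAssemblyBegin can skip the global communication.
 * If the promise is violated, behavior is undefined -- use with care. */
MatSetOption(A, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);
VecSetOption(b, VEC_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);
```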
>>>>>>
>>>>>>
>>>>>> For context, the matrix in the above is nearly 1M x 1M, partitioned
>>>>>> over six processes, and it was NOT built using DM.
>>>>>>
>>>>>> Finally the configure options are:
>>>>>>
>>>>>> Configure options:
>>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>>>
>>>>>> Sorry for such a long question, and thanks in advance.
>>>>>> Thanks,
>>>>>> M. Kamra
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>

