[petsc-users] Various Questions Regarding PETSC

Mohammed Mostafa mo7ammedmostafa at gmail.com
Sat Jul 13 13:07:42 CDT 2019


This log is for 100 time-steps, not a single time step
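
As a quick check of that statement, the event counts in the log quoted below can be divided by the number of time steps. This is only a sketch of the arithmetic; the helper names `calls_per_step` and `seconds_per_step` are made up for illustration and are not part of PETSc.

```c
#include <assert.h>

/* Average number of calls per time step, given a total count
   accumulated over `steps` time steps. */
double calls_per_step(int total_calls, int steps)
{
    assert(steps > 0);
    return (double)total_calls / (double)steps;
}

/* Average wall-clock time per time step, in seconds. */
double seconds_per_step(double total_seconds, int steps)
{
    assert(steps > 0);
    return total_seconds / (double)steps;
}
```

With the quoted numbers, `calls_per_step(400, 100)` gives 4 BuildTwoSidedF calls per step and `calls_per_step(200, 100)` gives 2 for each assembly event, the same counts as in the single-step log further down the thread. MatAssemblyEnd at 9.0972e-01 s over 100 steps averages about 9.1e-03 s per step, compared with 1.3566e-02 s in the single-step run.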


On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfadams at lbl.gov> wrote:

> You call the assembly stuff a lot (200). BuildTwoSidedF is a global thing
> and is taking a lot of time. You should just call these once per time step
> (it looks like you are just doing one time step).
>
>
> --- Event Stage 1: Matrix Construction
>
> BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
> VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
> VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
> VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
> AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
> myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
> setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
> setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
> callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>
>
>
> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hello Matt,
>> Attached is the entire log output, dumped using -log_view and -info.
>>
>> Thanks,
>> Kamra
>>
>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>> wrote:
>>
>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>> petsc-users at mcs.anl.gov> wrote:
>>>
>>>> Hello all,
>>>> I have a few questions regarding PETSc.
>>>>
>>>
>>> Please send the entire output of a run with all the logging turned on,
>>> using -log_view and -info.
>>>
>>>   Thanks,
>>>
>>>     Matt
>>>
>>>
>>>> Question 1:
>>>> For profiling, is it possible to show only the user-defined log
>>>> events in the per-stage breakdown printed by -log_view?
>>>> I tried deactivating the MAT, VEC, KSP, and PC class IDs:
>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>> which, according to the manual, "Deactivates event logging for a PETSc
>>>> object class in every stage".
>>>> However, I still see them in the stage breakdown:
>>>> --- Event Stage 1: Matrix Construction
>>>>
>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0   # USER EVENT
>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>>
>>>> Also, is it possible to clear the logs so that I can write a separate
>>>> profiling output file for each time step? I am solving a transient
>>>> problem and want to see how performance changes as time goes by.
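
Independently of whether PETSc's own log can be cleared, one fallback for per-time-step numbers is to keep a small hand-rolled accumulator that is written out and reset at the end of every step. The sketch below is a swapped-in technique, not PETSc logging: `StepLog` and the `steplog_*` functions are hypothetical names, and the caller is assumed to measure elapsed time itself (for example with `MPI_Wtime`).

```c
#include <assert.h>
#include <stdio.h>

#define STEPLOG_MAX_EVENTS 16

/* Hypothetical hand-rolled per-step log; not a PETSc data structure.
   Initialize with `StepLog log = {0};` before use. */
typedef struct {
    const char *name[STEPLOG_MAX_EVENTS];
    double      seconds[STEPLOG_MAX_EVENTS];
    int         calls[STEPLOG_MAX_EVENTS];
    int         n;
} StepLog;

/* Register an event name and return its id. */
int steplog_register(StepLog *log, const char *name)
{
    assert(log->n < STEPLOG_MAX_EVENTS);
    log->name[log->n]    = name;
    log->seconds[log->n] = 0.0;
    log->calls[log->n]   = 0;
    return log->n++;
}

/* Record one timed call of event `id`; `seconds` is measured by the caller. */
void steplog_add(StepLog *log, int id, double seconds)
{
    assert(id >= 0 && id < log->n);
    log->seconds[id] += seconds;
    log->calls[id]   += 1;
}

/* Write one line per event for this step, then clear the counters so the
   next step starts from zero. Returns the number of lines written. */
int steplog_flush(StepLog *log, FILE *out, int step)
{
    int lines = 0;
    for (int i = 0; i < log->n; ++i) {
        fprintf(out, "%d %s %d %.6e\n", step, log->name[i],
                log->calls[i], log->seconds[i]);
        log->seconds[i] = 0.0;
        log->calls[i]   = 0;
        ++lines;
    }
    return lines;
}
```

Calling `steplog_flush` with a different `FILE *` each step would give one output file per time step; the event names stay registered across flushes, only the counters are cleared.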
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> Question 2:
>>>> Regarding MatSetValues:
>>>> I am writing a finite volume code. Due to an algorithm requirement,
>>>> I have to build the matrix in a local native format (an array of arrays)
>>>> and then loop through the rows, using MatSetValues to set the elements of
>>>> "Mat A":
>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>> This is very slow and is killing my performance, even though the matrix
>>>> was preallocated using
>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>> PETSC_DETERMINE,
>>>> PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>>> MatSetValues, and all inserted values are local, so there are no
>>>> off-processor values.
>>>> So my question is: is it possible to set multiple rows at once, ideally
>>>> all of them? I checked the manual, and MatSetValues can only set a dense
>>>> block of the matrix, and row-by-row insertion seems expensive.
>>>> Or is it possible to copy all rows directly into the underlying matrix
>>>> data? As I mentioned, all values are local and there are no off-processor
>>>> values (the stash is 0):
>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>>> mallocs.
>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>>> space: 0 unneeded,743028 used
>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>> space: 0 unneeded,742972 used
>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>> space: 0 unneeded,743093 used
>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>> space: 0 unneeded,743036 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>> space: 0 unneeded,742938 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>> space: 0 unneeded,743049 used
>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space:
>>>> 0 unneeded,685 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space:
>>>> 0 unneeded,649 used
>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space:
>>>> 0 unneeded,1011 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space:
>>>> 0 unneeded,1137 used
>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space:
>>>> 0 unneeded,658 used
>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space:
>>>> 0 unneeded,648 used
>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
>>>> 0
>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
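
One way to avoid a MatSetValues call per row is to flatten the array-of-arrays into compressed sparse row (CSR) arrays and hand all local rows over at once. PETSc has routines that accept local rows in CSR form, for example MatCreateMPIAIJWithArrays; whether that is actually faster for this case has not been measured here, and its manual page should be checked for requirements such as column ordering within a row. The sketch below only shows the flattening step in plain C. `NativeRow` and `native_to_csr` are hypothetical names; the fields `nj`, `j_index`, and `coefvalues` mirror the arguments of the MatSetValues call quoted above, and `int`/`double` stand in for PetscInt/PetscScalar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical native row: nj entries with global column indices
   and the matching coefficients. */
typedef struct {
    int           nj;
    const int    *j_index;
    const double *coefvalues;
} NativeRow;

/* Flatten `nrows` native rows into CSR arrays.
   row_ptr must hold nrows + 1 entries; col_idx and vals must hold the
   total number of nonzeros. Returns the total number of nonzeros. */
int native_to_csr(const NativeRow *rows, int nrows,
                  int *row_ptr, int *col_idx, double *vals)
{
    assert(rows != NULL || nrows == 0);
    int nnz = 0;
    row_ptr[0] = 0;
    for (int r = 0; r < nrows; ++r) {
        for (int k = 0; k < rows[r].nj; ++k) {
            col_idx[nnz] = rows[r].j_index[k];
            vals[nnz]    = rows[r].coefvalues[k];
            ++nnz;
        }
        row_ptr[r + 1] = nnz; /* one past the last entry of row r */
    }
    return nnz;
}
```

If the sparsity pattern does not change between time steps, `row_ptr` and `col_idx` only need to be built once, and only `vals` has to be refilled each step.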
>>>>
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> Question 3:
>>>> If all inserted matrix and vector data are local, what part of the
>>>> vec/mat assembly consumes time? MatSetValues and MatAssembly take
>>>> more time than the matrix builder.
>>>> Also, this happens on every assembly, not just the first MAT_FINAL_ASSEMBLY.
>>>>
>>>>
>>>> For context, the matrix above is nearly 1M x 1M, partitioned over
>>>> six processes, and it was NOT built using DM.
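
For a matrix partitioned this way, the d_nnz and o_nnz arrays from Question 2 are per-row counts of entries whose columns fall inside or outside the range owned by the local process; the -info output above shows the two corresponding local blocks on each rank (for example 186062 X 186062 and 186062 X 648 on rank 0). A minimal sketch of that count for one row follows. It assumes the column ownership range equals the row ownership range, as for a square matrix created with the same local size for rows and columns; `count_row_nnz` is a hypothetical helper, not a PETSc function.

```c
#include <assert.h>

/* For one local row, count how many column indices fall inside the
   ownership range [cstart, cend) (diagonal block) and how many fall
   outside it (off-diagonal block). */
void count_row_nnz(const int *cols, int ncols, int cstart, int cend,
                   int *d_count, int *o_count)
{
    assert(cstart <= cend);
    *d_count = 0;
    *o_count = 0;
    for (int k = 0; k < ncols; ++k) {
        if (cols[k] >= cstart && cols[k] < cend) {
            ++(*d_count);
        } else {
            ++(*o_count);
        }
    }
}
```

Running this over every local row and storing the results in two arrays gives exactly the shape of input that the d_nnz and o_nnz arguments expect.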
>>>>
>>>> Finally the configure options are:
>>>>
>>>> Configure options:
>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>
>>>> Sorry for such a long question, and thanks in advance.
>>>> M. Kamra
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>