[petsc-users] Various Questions Regarding PETSC

Mark Adams mfadams at lbl.gov
Sat Jul 13 13:03:02 CDT 2019


You call the assembly routines a lot (200 times). BuildTwoSidedF is a global
(collective) operation and is taking a lot of time. You should call these once
per time step (it looks like you are doing just one time step).


--- Event Stage 1: Matrix Construction

BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03
2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01
5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03
2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
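In code, "assemble once per time step" could look like the following sketch. The loop bounds and the `nsteps`, `rstart`, `rend`, `nj`, `j_index`, `coefvalues` names are stand-ins for the application's own data, not part of the original thread:

```c
/* Sketch: batch all insertions for a step, then assemble ONCE.
   nsteps, rstart, rend, nj, j_index, coefvalues are stand-ins for
   the application's own data structures. */
for (PetscInt step = 0; step < nsteps; ++step) {
  for (PetscInt row = rstart; row < rend; ++row) {
    /* gather this row's column indices and coefficients, then insert */
    MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
  }
  /* One MatAssemblyBegin/End pair per time step: each pair is a
     collective operation (this is where BuildTwoSidedF shows up). */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  /* ... solve and update vectors for this step ... */
}
```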



On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hello Matt,
>
> Thanks,
> Kamra
>
> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> Hello all,
>>> I have a few questions regarding PETSc.
>>>
>>
>> Please send the entire output of a run with all the logging turned on,
>> using -log_view and -info.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> Question 1:
>>> For the profiling, is it possible to show only the user-defined log
>>> events in the breakdown of each stage in -log_view?
>>> I tried deactivating all the class IDs (MAT, VEC, KSP, PC):
>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>> PetscLogEventExcludeClass(PC_CLASSID);
>>> which should "Deactivates event logging for a PETSc object class in
>>> every stage" according to the manual.
>>> However, I still see them in the stage breakdown:
>>> --- Event Stage 1: Matrix Construction
>>>
>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03
>>> 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02
>>> 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03
>>> 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>
>>> Also, is it possible to clear the logs so that I can write a separate
>>> profiling output file for each time step? (I am solving a transient
>>> problem and want to know how the performance changes over time.)
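One way to get a per-time-step profile without clearing anything (a sketch; it assumes logging is active, e.g. via -log_view or a PetscLogDefaultBegin() call, and `step` is the application's own time-step counter) is to dump the cumulative log after each step and diff successive files:

```c
/* Sketch: dump the (cumulative) performance log after each time step.
   PetscLogView reports totals since logging started, so per-step
   numbers are the difference between successive files. */
char        fname[PETSC_MAX_PATH_LEN];
PetscViewer viewer;

PetscSNPrintf(fname, sizeof(fname), "profile_step_%d.txt", (int)step);
PetscViewerASCIIOpen(PETSC_COMM_WORLD, fname, &viewer);
PetscLogView(viewer);
PetscViewerDestroy(&viewer);
```

Alternatively, registering one logging stage per step (PetscLogStageRegister / PetscLogStagePush / PetscLogStagePop) makes -log_view break the totals out stage by stage in a single report.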
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Question 2:
>>> Regarding MatSetValues:
>>> I am writing a finite volume code. Due to an algorithm requirement, I
>>> first build the matrix in a local native format (array of arrays) and
>>> then loop over the rows, using MatSetValues to set the elements of Mat A:
>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>> However, this is very slow and is killing my performance, even though
>>> the matrix was preallocated properly with
>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>              PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>> with d_nnz and o_nnz assigned so that no mallocs occur during
>>> MatSetValues, and all inserted values are local (no off-processor values).
>>> So my question: is it possible to set multiple rows at once (hopefully
>>> all of them)? Setting the matrix row by row seems expensive, and from the
>>> manual MatSetValues can only set a dense block of values.
>>> Or is it perhaps possible to copy all rows directly into the underlying
>>> matrix data? As mentioned, all values are local and there are no
>>> off-processor values (the stash is 0):
>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>> mallocs.
>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>> space: 0 unneeded,743028 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742972 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743093 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,743036 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742938 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743049 used
>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0
>>> unneeded,685 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0
>>> unneeded,649 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space:
>>> 0 unneeded,1011 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space:
>>> 0 unneeded,1137 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0
>>> unneeded,658 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0
>>> unneeded,648 used
>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
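On copying all rows at once: if the native array-of-arrays can be flattened into CSR arrays, PETSc can ingest them in a single call via MatCreateMPIAIJWithArrays. This is a sketch, not advice from the original thread; `local_size`, `ia`, `ja`, `va` are stand-ins, and whether it beats well-preallocated MatSetValues should be measured:

```c
/* Sketch: hand local CSR arrays to PETSc in one call.
   ia: row offsets, length local_size+1, with ia[0] = 0
   ja: GLOBAL column indices, length ia[local_size]
   va: values, same length as ja
   PETSc copies these arrays, so they may be freed afterwards. */
PetscInt    *ia, *ja;
PetscScalar *va;
Mat          A;

/* ... flatten the array-of-arrays representation into ia/ja/va ... */
MatCreateMPIAIJWithArrays(PETSC_COMM_WORLD, local_size, local_size,
                          PETSC_DETERMINE, PETSC_DETERMINE,
                          ia, ja, va, &A);
```

Because the arrays are copied, recreating the matrix every time step has its own cost; recent PETSc versions also provide MatUpdateMPIAIJWithArrays for refilling an existing matrix from the same CSR layout, which is worth checking against the installed version.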
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Question 3:
>>> If all inserted matrix and vector data are local, what part of the
>>> vec/mat assembly consumes time? MatSetValues and the matrix assembly
>>> consume more time than building the matrix in the native format, and
>>> this is not just for the first MAT_FINAL_ASSEMBLY.
>>>
>>>
>>> For context, the matrix above is nearly 1M x 1M, partitioned over six
>>> processes, and it was NOT built using a DM.
>>>
>>> Finally, the configure options are:
>>>
>>> Configure options:
>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>> --with-fc=mpif90 --download-metis --download-hypre
>>>
>>> Sorry for such a long question, and thanks in advance.
>>> Thanks
>>> M. Kamra
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
