[petsc-users] Various Questions Regarding PETSC

Mark Adams mfadams at lbl.gov
Sat Jul 13 13:35:27 CDT 2019


Ok, I only see one call to KSPSolve.

On Sat, Jul 13, 2019 at 2:08 PM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:

> This log is for 100 time-steps, not a single time step
>
>
> On Sun, Jul 14, 2019 at 3:01 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> You call the assembly stuff a lot (200). BuildTwoSidedF is a global thing
>> and is taking a lot of time. You should just call these once per time step
>> (it looks like you are just doing one time step).
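>>
>> I.e., schematically, each time step would do all of its insertion first and
>> then a single assembly (a sketch, not your code; A and b stand for your
>> matrix and right-hand side):
>>
>>   /* ... all MatSetValues()/VecSetValues() calls for this step ... */
>>   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>   VecAssemblyBegin(b);
>>   VecAssemblyEnd(b);
>>   /* ... then the solve for this step ... */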
>>
>>
>> --- Event Stage 1: Matrix Construction
>>
>> BuildTwoSidedF       400 1.0 6.5222e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>> VecSet                 1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAssemblyBegin     200 1.0 6.2633e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   5  0  0  0  0     0
>> VecAssemblyEnd       200 1.0 6.7163e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecScatterBegin      200 1.0 5.9373e-03 2.2 0.00e+00 0.0 3.6e+03 2.1e+03 0.0e+00  0  0 79  2  0   0  0 99100  0     0
>> VecScatterEnd        200 1.0 2.7236e-0223.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyBegin     200 1.0 3.2747e-02 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd       200 1.0 9.0972e-01 1.0 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  4  0  1  0  6   9  0  1  0100     0
>> AssembleMats         200 1.0 1.5568e+00 1.2 0.00e+00 0.0 3.6e+03 2.1e+03 8.0e+00  6  0 79  2  6  14  0100100100     0
>> myMatSetValues       200 1.0 2.5367e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  25  0  0  0  0     0
>> setNativeMat         100 1.0 2.8223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  28  0  0  0  0     0
>> setNativeMatII       100 1.0 3.2174e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  31  0  0  0  0     0
>> callScheme           100 1.0 2.0700e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
>>
>>
>>
>> On Fri, Jul 12, 2019 at 11:56 PM Mohammed Mostafa via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> Hello Matt,
>>> Attached is the dumped entire log output using -log_view and -info.
>>>
>>> Thanks,
>>> Kamra
>>>
>>> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>>>> petsc-users at mcs.anl.gov> wrote:
>>>>
>>>>> Hello all,
>>>>> I have a few questions regarding PETSc.
>>>>>
>>>>
>>>> Please send the entire output of a run with all the logging turned on,
>>>> using -log_view and -info.
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>
>>>>
>>>>> Question 1:
>>>>> For profiling, is it possible to show only the user-defined log events
>>>>> in the per-stage breakdown of -log_view?
>>>>> I tried deactivating all the built-in class IDs (MAT, VEC, KSP, PC):
>>>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>>>> PetscLogEventExcludeClass(PC_CLASSID);
>>>>> which, according to the manual, "Deactivates event logging for a PETSc
>>>>> object class in every stage".
>>>>> However, I still see those events in the stage breakdown:
>>>>> --- Event Stage 1: Matrix Construction
>>>>>
>>>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01
>>>>> 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01
>>>>> 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01
>>>>> 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>>>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00
>>>>> 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
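>>>>>
>>>>> For reference, the user events marked above are registered in my code
>>>>> roughly like this (a simplified sketch; the class name "FVSolver" is just
>>>>> an example):
>>>>>
>>>>> PetscClassId  userClassId;
>>>>> PetscLogEvent assembleMatsEvent;
>>>>> PetscClassIdRegister("FVSolver", &userClassId);
>>>>> PetscLogEventRegister("AssembleMats", userClassId, &assembleMatsEvent);
>>>>> PetscLogEventBegin(assembleMatsEvent, 0, 0, 0, 0);
>>>>> /* ... assembly work being timed ... */
>>>>> PetscLogEventEnd(assembleMatsEvent, 0, 0, 0, 0);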
>>>>>
>>>>> Also, is it possible to clear the logs so that I can write a separate
>>>>> profiling output file for each time step? (I am solving a transient
>>>>> problem and I want to see how the performance changes over time.)
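>>>>> What I have in mind is something like the following after every step
>>>>> (just a sketch of the intent; the file-name pattern and the step counter
>>>>> are only examples, and as far as I can tell PetscLogView only reports
>>>>> cumulative counters):
>>>>>
>>>>> char        fname[PETSC_MAX_PATH_LEN];
>>>>> PetscViewer viewer;
>>>>> PetscSNPrintf(fname, sizeof(fname), "profile_step_%d.txt", (int)step);
>>>>> PetscViewerASCIIOpen(PETSC_COMM_WORLD, fname, &viewer);
>>>>> PetscLogView(viewer);            /* cumulative log up to this step */
>>>>> PetscViewerDestroy(&viewer);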
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> Question 2:
>>>>> Regarding MatSetValues:
>>>>> Right now I am writing a finite volume code. Due to an algorithm
>>>>> requirement I have to build the matrix in a local native format (an array
>>>>> of arrays) and then loop through the rows and use MatSetValues to set the
>>>>> elements in "Mat A":
>>>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>> but it is very slow and it is killing my performance.
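>>>>> The insertion loop is essentially the following (a simplified sketch;
>>>>> rstart/rend and the row_nnz/row_cols/row_coefs arrays stand in for my
>>>>> native storage):
>>>>>
>>>>> for (PetscInt row = rstart; row < rend; ++row) {
>>>>>   PetscInt     nj         = row_nnz[row - rstart];   /* nonzeros in this local row */
>>>>>   PetscInt    *j_index    = row_cols[row - rstart];  /* global column indices      */
>>>>>   PetscScalar *coefvalues = row_coefs[row - rstart]; /* values from native format  */
>>>>>   MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>>>> }
>>>>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>>>>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>>>>>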
>>>>> This is the case even though the matrix was preallocated with
>>>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>>>> PETSC_DETERMINE, PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>>>> MatSetValues, and all inserted values are local, so there are no
>>>>> off-processor values.
>>>>> Since setting the matrix row by row seems to be expensive, my question
>>>>> is: is it possible to set multiple rows at once, hopefully all of them?
>>>>> I checked the manual and MatSetValues can only set a dense block of
>>>>> values. Or perhaps is it possible to copy all rows directly into the
>>>>> underlying matrix data? As I mentioned, all values are local and there
>>>>> are no off-processor values (the stash is 0):
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>>>> mallocs.
>>>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>>>> space: 0 unneeded,743028 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,742972 used
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>> space: 0 unneeded,743093 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,743036 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>>>> space: 0 unneeded,742938 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>>>> space: 0 unneeded,743049 used
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space:
>>>>> 0 unneeded,685 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space:
>>>>> 0 unneeded,649 used
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage
>>>>> space: 0 unneeded,1011 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage
>>>>> space: 0 unneeded,1137 used
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space:
>>>>> 0 unneeded,658 used
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space:
>>>>> 0 unneeded,648 used
>>>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
>>>>> is 0
>>>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>> Question 3:
>>>>> If all inserted matrix and vector data are local, what part of the
>>>>> Vec/Mat assembly consumes the time? MatSetValues and the matrix assembly
>>>>> take more time than building the matrix in my native format, and this is
>>>>> not just for the first MAT_FINAL_ASSEMBLY.
>>>>>
>>>>>
>>>>> For context, the matrix above is nearly 1M x 1M, partitioned over six
>>>>> processes, and it was NOT built using a DM.
>>>>>
>>>>> Finally the configure options are:
>>>>>
>>>>> Configure options:
>>>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>>>> --with-fc=mpif90 --download-metis --download-hypre
>>>>>
>>>>> Sorry for such a long question and thanks in advance.
>>>>> Thanks,
>>>>> M. Kamra
>>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>