[petsc-users] Various Questions Regarding PETSC

Matthew Knepley knepley at gmail.com
Fri Jul 12 23:42:48 CDT 2019


On Fri, Jul 12, 2019 at 10:51 PM Mohammed Mostafa <mo7ammedmostafa at gmail.com>
wrote:

> Hello Matt,
> Attached is the entire log output, dumped using -log_view and -info.
>

In matrix construction, it looks like you have a mixture of load imbalance
(see the imbalance in the Begin events) and lots of Scatter messages in your
assembly. We turn off MatSetValues() logging by default since it is usually
called many times, but you can explicitly turn it back on if you want. I
don't think that is the problem here. It's easy to see from examples (say
SNES ex5) that it is not the major time sink. What is the Scatter doing?
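
For reading the imbalance: the Ratio column that follows each Max time in -log_view is the maximum over ranks divided by the minimum, so the 2.4 next to BuildTwoSidedF and VecAssemblyBegin is what points to load imbalance. A minimal sketch of that arithmetic in plain C (no PETSc calls; the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Max-over-min ratio of the per-rank times of one logged event.
   A value near 1.0 means the ranks are balanced; larger values mean
   that some ranks spend much longer in the event than others. */
double imbalance_ratio(const double *t, size_t n)
{
    assert(t != NULL && n > 0);
    double tmin = t[0], tmax = t[0];
    for (size_t i = 1; i < n; ++i) {
        if (t[i] < tmin) tmin = t[i];
        if (t[i] > tmax) tmax = t[i];
    }
    assert(tmin > 0.0); /* avoid dividing by zero */
    return tmax / tmin;
}
```

With six hypothetical per-rank times whose maximum is 2.7364e-02 s and whose minimum is 1.14e-02 s, this returns about 2.4. Only the maximum and the ratio come from the posted log; the individual per-rank times are not in it.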

  Thanks,

     Matt


> Thanks,
> Kamra
>
> On Fri, Jul 12, 2019 at 9:23 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Fri, Jul 12, 2019 at 5:19 AM Mohammed Mostafa via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>>> Hello all,
>>> I have a few questions regarding PETSc.
>>>
>>
>> Please send the entire output of a run with all the logging turned on,
>> using -log_view and -info.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>> Question 1:
>>> For profiling, is it possible to show only the user-defined log events in
>>> the breakdown of each stage in -log_view?
>>> I tried deactivating the MAT, VEC, KSP, and PC class IDs:
>>> PetscLogEventExcludeClass(MAT_CLASSID);
>>> PetscLogEventExcludeClass(VEC_CLASSID);
>>> PetscLogEventExcludeClass(KSP_CLASSID);
>>> PetscLogEventExcludeClass(PC_CLASSID);
>>> which, according to the manual, "Deactivates event logging for a PETSc
>>> object class in every stage".
>>> However, I still see them in the stage breakdown:
>>> --- Event Stage 1: Matrix Construction
>>>
>>> BuildTwoSidedF         4 1.0 2.7364e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>> VecSet                 1 1.0 4.5300e-06 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyBegin       2 1.0 2.7344e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18  0  0  0  0     0
>>> VecAssemblyEnd         2 1.0 8.3447e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin        2 1.0 7.5102e-05 1.7 0.00e+00 0.0 3.6e+01 2.1e+03 0.0e+00  0  0  3  0  0   0  0 50 80  0     0
>>> VecScatterEnd          2 1.0 3.5286e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyBegin       2 1.0 8.8930e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd         2 1.0 1.3566e-02 1.1 0.00e+00 0.0 3.6e+01 5.3e+02 8.0e+00  0  0  3  0  6  10  0 50 20100     0
>>> AssembleMats           2 1.0 3.9774e-02 1.7 0.00e+00 0.0 7.2e+01 1.3e+03 8.0e+00  0  0  7  0  6  28  0100100100     0  # USER EVENT
>>> myMatSetValues         2 1.0 2.6931e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19  0  0  0  0     0   # USER EVENT
>>> setNativeMat           1 1.0 3.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  24  0  0  0  0     0   # USER EVENT
>>> setNativeMatII         1 1.0 4.7023e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  28  0  0  0  0     0   # USER EVENT
>>> callScheme             1 1.0 2.2333e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0   # USER EVENT
>>>
>>> Also, is it possible to clear the logs so that I can write a separate
>>> profiling output file for each timestep? (I am solving a transient
>>> problem and want to track how performance changes over time.)
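
On the first part of Question 1, one workaround that does not depend on any PETSc option is to post-process a saved copy of the -log_view text and keep only the stage headers and the lines carrying the "# USER EVENT" marker seen in the output above. A minimal sketch in plain C follows; the function names are hypothetical, and it assumes each event occupies a single line of fewer than 1024 characters, as in the unwrapped log:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line carries the "# USER EVENT" marker that appears
   at the end of user-registered events in the posted output. */
int is_user_event_line(const char *line)
{
    return line != NULL && strstr(line, "# USER EVENT") != NULL;
}

/* Copies stage headers and user-event lines from `in` to `out`.
   Returns the number of user-event lines that were kept. */
size_t filter_user_events(FILE *in, FILE *out)
{
    char buf[1024];
    size_t kept = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (strncmp(buf, "--- Event Stage", 15) == 0) {
            fputs(buf, out); /* keep the header so events stay grouped by stage */
        } else if (is_user_event_line(buf)) {
            fputs(buf, out);
            ++kept;
        }
    }
    return kept;
}
```

Calling filter_user_events(stdin, stdout) from a small main() turns this into a command-line filter. It only hides the built-in events in the report; it does not stop PETSc from logging them.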
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Question 2:
>>> Regarding MatSetValues:
>>> I am writing a finite volume code. Due to an algorithm requirement, I
>>> have to build the matrix in a local native format (array of arrays) and
>>> then loop over the rows, calling MatSetValues to set the entries of "Mat A":
>>> MatSetValues(A, 1, &row, nj, j_index, coefvalues, INSERT_VALUES);
>>> This is very slow and is killing my performance, even though the matrix
>>> was preallocated using
>>> MatCreateAIJ(PETSC_COMM_WORLD, this->local_size, this->local_size,
>>> PETSC_DETERMINE,
>>> PETSC_DETERMINE, -1, d_nnz, -1, o_nnz, &A);
>>> with d_nnz and o_nnz properly assigned, so no mallocs occur during
>>> MatSetValues, and all inserted values are local, so there are no
>>> off-processor values.
>>> So my question is: is it possible to set multiple rows at once, ideally
>>> all of them? I checked the manual, and MatSetValues can only set a dense
>>> block of the matrix, and setting values row by row seems expensive.
>>> Or is it possible to copy all rows directly into the underlying matrix
>>> data? As I mentioned, all values are local and there are no off-processor
>>> values (the stash is 0).
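
On handing over all rows at once: PETSc has MatCreateMPIAIJWithArrays(), which takes the local rows in compressed row (CSR) form, so an array-of-arrays layout would first need to be flattened. The sketch below shows only that flattening step in plain C. The function name and argument layout are assumptions for illustration, and it uses int and double where PETSc code would use PetscInt and PetscScalar. The manual page is the place to confirm the index conventions before relying on this.

```c
#include <assert.h>
#include <stddef.h>

/* Flattens a row-wise "array of arrays" matrix into CSR arrays.
   cols[i] and vals[i] hold the column indices and coefficients of
   local row i, and nnz[i] is the length of that row.
   ia needs nrows + 1 entries; ja and a need room for all nonzeros.
   Returns the total number of nonzeros written. */
size_t flatten_to_csr(size_t nrows,
                      const int *const *cols, const double *const *vals,
                      const int *nnz, int *ia, int *ja, double *a)
{
    size_t k = 0;
    ia[0] = 0;
    for (size_t i = 0; i < nrows; ++i) {
        assert(nnz[i] >= 0);
        for (int j = 0; j < nnz[i]; ++j) {
            ja[k] = cols[i][j]; /* column index of this entry */
            a[k]  = vals[i][j]; /* matching coefficient */
            ++k;
        }
        ia[i + 1] = (int)k;     /* offset where the next row starts */
    }
    return k;
}
```

For three rows of lengths 2, 3 and 1, the row offsets come out as 0, 2, 5, 6. Whether this is faster than row-by-row MatSetValues for this particular code would have to be measured.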
>>> [0] VecAssemblyBegin_MPI_BTS(): Stash has 0 entries, uses 0 mallocs.
>>> [0] VecAssemblyBegin_MPI_BTS(): Block-Stash has 0 entries, uses 0
>>> mallocs.
>>> [0] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [1] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [2] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [3] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [4] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [5] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 186064; storage
>>> space: 0 unneeded,743028 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742972 used
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186064) < 0.6. Do not use CompressedRow routines.
>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743093 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,743036 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 186062; storage
>>> space: 0 unneeded,742938 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186062) < 0.6. Do not use CompressedRow routines.
>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 186063; storage
>>> space: 0 unneeded,743049 used
>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 4
>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 0)/(num_localrows 186063) < 0.6. Do not use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 186064 X 685; storage space: 0
>>> unneeded,685 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 649; storage space: 0
>>> unneeded,649 used
>>> [4] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [4] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [4] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
>>> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [2] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185379)/(num_localrows 186064) > 0.6. Use CompressedRow routines.
>>> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1011; storage space:
>>> 0 unneeded,1011 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 1137; storage space:
>>> 0 unneeded,1137 used
>>> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [5] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 184925)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [3] MatAssemblyEnd_SeqAIJ(): Matrix size: 186063 X 658; storage space: 0
>>> unneeded,658 used
>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 186062 X 648; storage space: 0
>>> unneeded,648 used
>>> [1] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185051)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185414)/(num_localrows 186062) > 0.6. Use CompressedRow routines.
>>> [3] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
>>> [3] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
>>> [3] MatCheckCompressedRow(): Found the ratio (num_zerorows
>>> 185405)/(num_localrows 186063) > 0.6. Use CompressedRow routines.
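
The -info lines above report two SeqAIJ blocks per rank: a square one (for example 186062 X 186062) and a narrow one (for example 186062 X 648), both with 0 unneeded storage. That is what exact d_nnz and o_nnz counts should produce. As a sketch of how such counts can be derived, assuming the local rows are available in compressed row form and that this rank owns the columns in [cstart, cend), something like the following would do. The function name is hypothetical, and plain int stands in for PetscInt.

```c
#include <assert.h>
#include <stddef.h>

/* For each local row, counts the entries whose column is owned by this
   rank (diagonal block) and the entries owned by other ranks
   (off-diagonal block). ia holds nrows + 1 row offsets into ja. */
void count_diag_offdiag(size_t nrows, const int *ia, const int *ja,
                        int cstart, int cend, int *d_nnz, int *o_nnz)
{
    assert(cstart <= cend);
    for (size_t i = 0; i < nrows; ++i) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = ia[i]; k < ia[i + 1]; ++k) {
            if (ja[k] >= cstart && ja[k] < cend)
                ++d_nnz[i]; /* column lives on this rank */
            else
                ++o_nnz[i]; /* column lives on another rank */
        }
    }
}
```

The resulting arrays have one entry per local row, which is the shape MatCreateAIJ expects for its d_nnz and o_nnz arguments.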
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Question 3:
>>> If all inserted matrix and vector data are local, what part of the
>>> Vec/Mat assembly consumes time? MatSetValues and MatAssembly take more
>>> time than the matrix builder.
>>> Also, this is not only the case for the first MAT_FINAL_ASSEMBLY.
>>>
>>>
>>> For context, the matrix above is nearly 1M x 1M, partitioned over six
>>> processes, and it was NOT built using DM.
>>>
>>> Finally the configure options are:
>>>
>>> Configure options:
>>> PETSC_ARCH=release3 -with-debugging=0 COPTFLAGS="-O3 -march=native
>>> -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3
>>> -march=native -mtune=native" --with-cc=mpicc --with-cxx=mpicxx
>>> --with-fc=mpif90 --download-metis --download-hypre
>>>
>>> Sorry for such a long question, and thanks in advance.
>>> M. Kamra
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>
